darg: Generate Priors for GP correlation

Description

Generate empirical Bayes regularization (priors) and choose initial values and ranges for the (isotropic) lengthscale and nugget parameters of a Gaussian correlation function for a GP regression model.

Usage

darg(d, X, samp.size = 1000)
garg(g, y)

Value

Both functions return a list containing the following entries. If the input object (d or g) already specifies one of these values, that value is copied to the same list entry on output. See the Details section for how the remaining values are calculated.

mle: by default, TRUE for darg and FALSE for garg
start: starting value chosen from the quantiles of distance(X) or (y - mean(y))^2
min: minimum value in allowable range for the parameter - for future inference purposes
max: maximum value in allowable range for the parameter - for future inference purposes
ab: shape and rate parameters specifying a Gamma prior for the parameter
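
The copy-through behavior described above can be illustrated with a small base R sketch. The helper name normalize_arg and the default values are hypothetical placeholders chosen for illustration; this is not the laGP source, only one way such merging could work under the rules stated in this section.

```r
# Hypothetical helper (not part of laGP): merge a user-supplied argument
# with computed defaults, keeping any entry the user already specified.
normalize_arg <- function(arg, defaults) {
  if (is.null(arg)) {
    arg <- list()                 # nothing supplied: use every default
  } else if (is.numeric(arg)) {
    arg <- list(start = arg)      # a scalar is treated as the initial value
  }
  if (!is.list(arg)) stop("arg must be NULL, a scalar, or a list")
  for (nm in names(defaults)) {
    # Fill in only the entries the user left unspecified.
    if (is.null(arg[[nm]])) arg[[nm]] <- defaults[[nm]]
  }
  arg[names(defaults)]            # return the documented entries, in order
}

# Placeholder defaults, for illustration only.
defaults <- list(mle = TRUE, start = 0.1, min = 0.01, max = 10, ab = c(1.5, 0.4))

out_null   <- normalize_arg(NULL, defaults)
out_scalar <- normalize_arg(2, defaults)
out_list   <- normalize_arg(list(mle = FALSE, max = 5), defaults)
```

Here out_scalar keeps the supplied start of 2, and out_list keeps mle = FALSE and max = 5 while taking every other entry from the defaults.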

Arguments

d: can be NULL, a scalar indicating an initial value, or a partial list whose format matches the one described in the Value section
g: can be NULL, a scalar indicating an initial value, or a partial list whose format matches the one described in the Value section
X: a matrix or data.frame containing the full (large) design matrix of input locations
y: a vector of responses/dependent values
samp.size: a scalar integer indicating a subset size of X to use for calculations; this is important for very large X matrices since the calculations are quadratic in nrow(X)

Author

Robert B. Gramacy rbg@vt.edu

Details

These functions use aspects of the data, either X or y, to form weakly informative default priors and choose initial values for a lengthscale and nugget parameter. This is useful since the likelihood can sometimes be very flat, and even with proper priors inference can be sensitive to the specification of those priors and any initial search values. The focus here is on avoiding pathologies while otherwise remaining true to the spirit of MLE calculation.

darg output specifies MLE inference (out$mle = TRUE) by default, whereas garg instead fixes the nugget at the starting value (out$mle = FALSE), which may be sensible for emulating deterministic computer simulation data. When out$mle = FALSE, the calculated range outputs c(out$min, out$max) are set to dummy values that are ignored in other parts of the laGP package.

darg calculates a Gaussian distance matrix between all pairs of X rows, or between a subsample of rows of size samp.size. From the (non-zero) distances it takes the allowable range from their minimum and maximum, and the starting value from their 0.1 quantile. The Gamma prior stored in out$ab has a shape of 3/2 and a rate chosen by the incomplete Gamma inverse function to put 0.95 probability below out$max.
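
The calculation described above can be approximated in base R. This is a minimal sketch, not the laGP implementation: darg_sketch is a hypothetical name, "Gaussian distance" is assumed to mean squared Euclidean distance, and the exact range endpoints used by the package may differ from the plain minimum and maximum taken here.

```r
# Sketch only; darg_sketch is a hypothetical name, not the laGP source.
darg_sketch <- function(X, samp.size = 1000) {
  X <- as.matrix(X)
  # Subsample rows when X is large, since pairwise distances cost O(n^2).
  if (nrow(X) > samp.size) {
    X <- X[sample(nrow(X), samp.size), , drop = FALSE]
  }
  # Assumption: "Gaussian distance" means squared Euclidean distance.
  D <- as.numeric(dist(X))^2
  D <- D[D > 0]                      # keep non-zero distances only
  dmax <- max(D)
  shape <- 3/2
  # If d ~ Gamma(shape, rate) then rate * d ~ Gamma(shape, 1), so
  # P(d < dmax) = 0.95 when rate = qgamma(0.95, shape) / dmax.
  rate <- qgamma(0.95, shape = shape) / dmax
  list(mle = TRUE,
       start = unname(quantile(D, 0.1)),
       min = min(D), max = dmax,
       ab = c(shape, rate))
}

set.seed(1)
X <- matrix(runif(200), ncol = 2)
d <- darg_sketch(X)
# Prior mass below the maximum; 0.95 by construction.
pgamma(d$max, shape = d$ab[1], rate = d$ab[2])
```

Using qgamma here plays the role of the incomplete Gamma inverse: it solves for the rate that places the requested prior mass below the largest observed distance.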

garg is similar, except that it works with (y - mean(y))^2 instead of the pairwise distances used by darg. The only other difference is that the starting value is chosen as the 2.5% quantile.
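
A corresponding base R sketch for the nugget follows. As before, garg_sketch is a hypothetical name and this is not the laGP source. It assumes the same shape of 3/2 and the same 0.95 rule for the rate as described for darg, and it omits the minimum because the range is documented as a dummy value when mle = FALSE.

```r
# Sketch only; garg_sketch is a hypothetical name, not the laGP source.
garg_sketch <- function(y) {
  r2 <- (y - mean(y))^2              # squared deviations from the mean
  gmax <- max(r2)
  shape <- 3/2
  # Same rule as for the lengthscale: 0.95 prior probability below gmax.
  rate <- qgamma(0.95, shape = shape) / gmax
  list(mle = FALSE,                  # nugget fixed at its starting value
       start = unname(quantile(r2, 0.025)),
       max = gmax,
       ab = c(shape, rate))
}

set.seed(2)
y <- sin(seq(0, 2 * pi, length.out = 50)) + rnorm(50, sd = 0.1)
g <- garg_sketch(y)
```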

Examples

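library(laGP)

## motorcycle data (mcycle ships with the MASS package)
if(require("MASS")) {
  X <- matrix(mcycle[,1], ncol=1)  ## inputs: time after impact
  Z <- mcycle[,2]                  ## responses: head acceleration

  ## get darg and garg: defaults for the lengthscale, and
  ## nugget defaults with MLE inference switched on
  darg(NULL, X)
  garg(list(mle=TRUE), Z)
}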
