The seven functions described below implement Bayesian regression models of varying complexity: linear model, linear CART, Gaussian process (GP), GP with jumps to the limiting linear model (LLM), treed GP, and treed GP LLM.
blm(X, Z, XX = NULL, meanfn = "linear", bprior = "bflat",
BTE = c(1000, 4000, 3), R = 1, m0r1 = TRUE, itemps = NULL,
pred.n = TRUE, krige = TRUE, zcov = FALSE, Ds2x = FALSE,
improv = FALSE, sens.p = NULL, trace = FALSE, verb = 1, ...)
btlm(X, Z, XX = NULL, meanfn = "linear", bprior = "bflat",
tree = c(0.5, 2), BTE = c(2000, 7000, 2), R = 1, m0r1 = TRUE,
itemps = NULL, pred.n = TRUE, krige = TRUE, zcov = FALSE,
Ds2x = FALSE, improv = FALSE, sens.p = NULL, trace = FALSE,
verb = 1, ...)
bcart(X, Z, XX = NULL, bprior = "bflat", tree = c(0.5, 2),
BTE = c(2000, 7000, 2), R = 1, m0r1 = TRUE, itemps = NULL,
pred.n = TRUE, krige = TRUE, zcov = FALSE, Ds2x = FALSE,
improv=FALSE, sens.p = NULL, trace = FALSE, verb = 1, ...)
bgp(X, Z, XX = NULL, meanfn = "linear", bprior = "bflat",
corr = "expsep", BTE = c(1000, 4000, 2), R = 1, m0r1 = TRUE,
itemps = NULL, pred.n = TRUE, krige = TRUE, zcov = FALSE,
Ds2x = FALSE, improv = FALSE, sens.p = NULL, nu = 1.5,
trace = FALSE, verb = 1, ...)
bgpllm(X, Z, XX = NULL, meanfn = "linear", bprior = "bflat",
corr = "expsep", gamma=c(10,0.2,0.7), BTE = c(1000, 4000, 2),
R = 1, m0r1 = TRUE, itemps = NULL, pred.n = TRUE,
krige = TRUE, zcov = FALSE, Ds2x = FALSE, improv = FALSE,
sens.p = NULL, nu = 1.5, trace = FALSE, verb = 1, ...)
btgp(X, Z, XX = NULL, meanfn = "linear", bprior = "bflat",
corr = "expsep", tree = c(0.5, 2), BTE = c(2000, 7000, 2),
R = 1, m0r1 = TRUE, linburn = FALSE, itemps = NULL,
pred.n = TRUE, krige = TRUE, zcov = FALSE, Ds2x = FALSE,
improv = FALSE, sens.p = NULL, nu = 1.5, trace = FALSE,
verb = 1, ...)
btgpllm(X, Z, XX = NULL, meanfn = "linear", bprior = "bflat",
corr = "expsep", tree = c(0.5, 2), gamma=c(10,0.2,0.7),
BTE = c(2000, 7000, 2), R = 1, m0r1 = TRUE, linburn = FALSE,
itemps = NULL, pred.n = TRUE, krige = TRUE, zcov = FALSE,
Ds2x = FALSE, improv = FALSE, sens.p = NULL, nu = 1.5,
trace = FALSE, verb = 1, ...)
bgp
returns an object of class "tgp"
.
The function plot.tgp
can be used to help visualize results.
An object of class "tgp"
is a list containing at least the
following components... The parts
output is unique to the T
(tree) category functions. Tree viewing is supported by
tgp.trees
Input argument: data.frame
of inputs X
Number of rows in X
, i.e., nrow(X)
Number of cols in X
, i.e., ncol(X)
Vector of output responses Z
Input argument: data.frame
of predictive locations XX
Number of rows in XX
, i.e., nrow(XX)
Input argument: Monte-carlo parameters
Input argument: restarts
Input argument: initialize MCMC with linear CART
list
of model parameters generated by
tgp.default.params
and subsequently modified according
to the calling b*
function and its arguments
Double-representation of model input parameters used by the C-code
data.frame
containing the importance tempering
ladders and pseudo-prior: $k
has inverse
inverse temperatures (from the input argument), $k
has an
updated pseudo-prior based on observation
counts and (possibly) stochastic approximation during burn-in
and (input) stochastic approximation parameters \(c_0\) and
\(n_0\). See default.itemps
for more info
Vector of mean predictive estimates at X
locations
Vector of 5% predictive quantiles at X
locations
Vector of 95% predictive quantiles at X
locations
Vector of quantile norms Zp.q2-Zp.q1
If input zcov = TRUE
, then this is a predictive
covariance matrix for the inputs at locations X
; otherwise
then this is a vector of predictive variances at the X
locations (diagonal of the predictive covariance matrix). Only
appears when input pred.n = TRUE
Vector of (expected) kriging means at X
locations
Vector of posterior variance for kriging surface (no additive noise) at X
locations
Vector of (expected) predictive kriging variances at X
locations
Vector of mean predictive estimates at XX
locations
Vector of 5% predictive quantiles at XX
locations
Vector of 95% predictive quantiles at XX
locations
Vector of quantile norms ZZ.q2-ZZ.q1
, used by the
ALM adaptive sampling algorithm
If input zcov = TRUE
, then this is a predictive
covariance matrix for predictive locations XX
; otherwise
then this is a vector of predictive variances at the XX
locations (diagonal of the predictive covariance matrix). Only
appears when input XX != NULL
If input zcov = TRUE
, then this is a predictive
n * nn
covariance matrix between locations in X
and
XX
; Only appears when zcov = TRUE
and both
pred.n = TRUE
and XX != NULL
Vector of (expected) kriging means at XX
locations
Vector of posterior variance for kriging surface (no additive noise) at XX
locations
Vector of (expected) predictive kriging variances at XX
locations
If argument Ds2x=TRUE
, this vector contains ALC
statistics for XX
locations
If argument improv
is TRUE
or a
positive integer, this is a 'matrix' with first column set to the expected
improvement statistics for XX
locations, and the second
column set to a ranking in the order that they should be sampled to
minimize the expected multivariate improvement raised to a power
determined by the argument improv
Name of response Z
if supplied by data.frame
in argument, or "z" if none provided
Internal representation of the regions depicted by partitions of the maximum a' posteriori (MAP) tree
list
of trees (maptree representation) which
were MAP as a function
of each tree height sampled between MCMC rounds B
and
T
If trace==TRUE
, this list
contains traces of most of the model parameters and posterior
predictive distributions at input locations
XX
. Otherwise the entry is FALSE
. See note below
Importance tempering effective sample size (ESS). If
itemps==NULL
this corresponds to the total number of
samples collected, i.e..
R*(BTE[2]-BTE[1])/BTE[3]
.
Otherwise the ESS will be lower due to a non-zero coefficient of variation of the calculated importance tempering weights
See sens
documentation for more details
Each of the above functions takes some subset of the following arguments...
data.frame
, matrix
, or vector of inputs X
Vector of output responses Z
of length equal to the
leading dimension (rows) of X
, i.e., length(Z) == nrow(X)
Optional data.frame
, matrix
,
or vector of predictive input locations
with the same number of columns as X
, i.e.,
ncol(XX) == ncol(X)
A choice of mean function for the process. When
meanfn = "linear"
(default), then we have the process
$$Z = (\mathbf{1} \;\; \mathbf{X}) \beta + W(\mathbf{X})$$
where \(W(\mathbf{X})\) represents the Gaussian process
part of the model (if present). Otherwise, when
meanfn = "constant"
, then $$Z = \beta_0 + W(\mathbf{X})$$
Linear (beta) prior, default is "bflat"
;
alternates include "b0"
hierarchical Normal prior,
"bmle"
empirical Bayes Normal prior, "b0not"
Bayesian
treed LM-style prior from Chipman et al. (same as "b0"
but
without tau2
), "bmzt"
a independent Normal
prior (mean zero) with inverse-gamma variance (tau2
),
and "bmznot"
is the same as "bmznot"
without tau2
.
The default "bflat"
gives
an “improper” prior which can perform badly when the
signal-to-noise ratio is low. In these cases the “proper” hierarchical
specification "b0"
or independent "bmzt"
or "bmznot"
priors may perform better
a 2-vector containing the tree process prior parameterization
c(alpha, beta)
specifying
$$p_{\mbox{\tiny split}}(\eta, \mathcal{T}) =
\alpha*(1+\eta)^\beta$$
automatically giving zero probability to trees
with partitions containing less than min(c(10,nrow(X)+1))
data points. You may also specify a longer vector, writing over
more of the components of the $tree
output from tgp.default.params
Limiting linear model parameters c(g, t1, t2)
,
with growth parameter g > 0
minimum parameter t1 >= 0
and maximum parameter t1 >= 0
, where
t1 + t2 <= 1
specifies
$$p(b|d)=t_1 +\exp\left\{\frac{-g(t_2-t_1)}{d-0.5}\right\}$$
Gaussian process correlation model. Choose between the isotropic
power exponential family ("exp"
) or the separable power exponential
family ("expsep"
, default); the current version also supports
the isotropic Matern ("matern"
) and single-index Model ("sim"
)
as “beta” functionality.
3-vector of Monte-carlo parameters (B)urn in, (T)otal, and (E)very. Predictive samples are saved every E MCMC rounds starting at round B, stopping at T.
Number of repeats or restarts of BTE
MCMC rounds,
default R=1
is no restarts
If TRUE
(default) the responses Z
will be
scaled to have a mean of zero and a range of 1
If TRUE
initializes MCMC with B
(additional)
rounds of Bayesian Linear CART (btlm
); default is FALSE
Importance tempering (IT) inverse temperature ladder,
or powers to improve mixing. See default.itemps
.
The default is no IT itemps = NULL
TRUE
(default) value results in prediction at
the inputs X
; FALSE
skips prediction at X
resulting in a faster
implementation
TRUE
(default) value results in collection of kriging
means and variances at predictive (and/or data) locations; FALSE
skips the gathering of kriging statistics giving a savings in
storage
If TRUE
then the predictive covariance matrix is
calculated-- can be computationally (and memory) intensive if
X
or XX
is large. Otherwise only the variances
(diagonal of covariance matrices) are calculated (default). See
outputs Zp.s2
, ZZ.s2
, etc., below
TRUE
results in ALC (Active Learning--Cohn)
computation of expected reduction in uncertainty calculations at the
XX
locations, which can be used for adaptive sampling;
FALSE
(default) skips this computation, resulting in
a faster implementation
TRUE
results in samples from the
improvement at locations XX
with respect to the observed
data minimum. These samples are used to calculate the expected
improvement over XX
, as well as to rank all of the points in
XX
in the order that they should be sampled to minimize the
expected multivariate improvement (refer to Schonlau et al, 1998).
Alternatively, improv
can be set to any positive integer 'g',
in which case the ranking is performed with respect to the expectation
for improvement raised to the power 'g'. Increasing 'g' leads to
rankings that are more oriented towards a global optimization.
The option FALSE
(default) skips these computations,
resulting in a faster implementation. Optionally, a two-vector
can be supplied where improv[2]
is interpreted as the
(maximum) number of points to rank by improvement. See the note below.
If not specified, the entire XX
matrix is ranked.
Either NULL
or a vector of parameters for
sensitivity analysis, built by the function sens
.
Refer there for details
“beta” functionality: fixed smoothness parameter for
the Matern correlation function; nu + 0.5
times differentiable
predictive surfaces result
TRUE
results in a saving of samples from the
posterior distribution for most of the parameters in the model. The
default is FALSE
for speed/storage reasons. See note below
Level of verbosity of R-console print statements: from 0 (none); 1 (default) which shows the “progress meter”; 2 includes an echo of initialization parameters; up to 3 and 4 (max) with more info about successful tree operations
These ellipses arguments are interpreted as augmentations to the prior specification generated by
params <- tgp.default.params(ncol(X)+1)
.
You may use these to specify a custom setting of any of default
parameters in the output list params
except those for which a specific argument is already provided
(e.g., params$corr
or params$bprior
) or those which contradict
the type of b*
function being called (e.g.,
params$tree
or params$gamma
); these redundant or
possibly conflicting specifications will be ignored. Refer to
tgp.default.params
for details on the prior specification
Robert B. Gramacy, rbg@vt.edu, and Matt Taddy, mataddy@amazon.com
The functions and their arguments can be categorized by whether or not they use treed partitioning (T), GP models, and jumps to the LLM (or LM)
blm | LM | Linear Model |
btlm | T, LM | Treed Linear Model |
bcart | T | Treed Constant Model |
bgp | GP | GP Regression |
bgpllm | GP, LLM | GP with jumps to the LLM |
btgp | T, GP | treed GP Regression |
btgpllm | T, GP, LLM | treed GP with jumps to the LLM |
Each function implements a special case of the generic function
tgp
which is an interface to C/C++ code for treed Gaussian process
modeling of varying parameterization. Documentation for tgp
has been declared redundant, and has subsequently been removed. To see
how the b*
functions use tgp
simply examine the
function. In the latest version, with the addition of the ellipses
“...” argument, there is nothing that can be done
with the direct tgp
function that cannot also be done with a
b*
function
Only functions in the T (tree) category take the tree
argument;
GP category functions take the corr
argument; and LLM category
functions take the gamma
argument. Non-tree class functions omit
the parts
output, see below
bcart
is the same as btlm
except that only the
intercept term in the LM is estimated; the others are zero, thereby
implementing a Bayesian version of the original CART model
The sens.p
argument contains a vector of parameters for
sensitivity analysis. It should be NULL
unless created by the
sens
function. Refer to help(sens)
for details.
If itemps =! NULL
then importance tempering (IT) is performed
to get better mixing. After each restart (when R > 1
) the
observation counts are used to update the pseudo-prior. Stochastic
approximation is performed in the first burn-in rounds (for B-T
rounds, not B
) when c0
and n0
are positive.
Every subsequent burn-in after the first restart is for B
rounds in order to settle-in after using the observation counts. See
default.itemps
for more details and an example
Please see vignette("tgp")
for a detailed illustration
Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 9.) https://bobby.gramacy.com/surrogates/
Gramacy, R. B. (2007). tgp: An R Package for Bayesian Nonstationary, Semiparametric Nonlinear Regression and Design by Treed Gaussian Process Models. Journal of Statistical Software, 19(9). https://www.jstatsoft.org/v19/i09 tools:::Rd_expr_doi("10.18637/jss.v019.i09")
Robert B. Gramacy, Matthew Taddy (2010). Categorical Inputs, Sensitivity Analysis, Optimization and Importance Tempering with tgp Version 2, an R Package for Treed Gaussian Process Models. Journal of Statistical Software, 33(6), 1--48. https://www.jstatsoft.org/v33/i06/ tools:::Rd_expr_doi("10.18637/jss.v033.i06")
Gramacy, R. B., Lee, H. K. H. (2007). Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association, 103(483), pp. 1119-1130. Also available as ArXiv article 0710.4536 https://arxiv.org/abs/0710.4536
Gramacy, R. B. and Lee, K.H. (2008). Gaussian Processes and Limiting Linear Models. Computational Statistics and Data Analysis, 53, pp. 123-136. Also available as ArXiv article 0804.4685 https://arxiv.org/abs/0804.4685
Gramacy, R. B., Lee, H. K. H. (2009). Adaptive design and analysis of supercomputer experiments. Technometrics, 51(2), pp. 130-145. Also avaliable on ArXiv article 0805.4359 https://arxiv.org/abs/0805.4359
Robert B. Gramacy, Heng Lian (2011). Gaussian process single-index models as emulators for computer experiments. Available as ArXiv article 1009.4241 https://arxiv.org/abs/1009.4241
Chipman, H., George, E., & McCulloch, R. (1998). Bayesian CART model search (with discussion). Journal of the American Statistical Association, 93, 935--960.
Chipman, H., George, E., & McCulloch, R. (2002). Bayesian treed models. Machine Learning, 48, 303--324.
M. Schonlau and Jones, D.R. and Welch, W.J. (1998). Global versus local search in constrained optimization of computer models. In "New Developments and applications in experimental design", IMS Lecture Notes - Monograph Series 34. 11--25.
plot.tgp
, tgp.trees
,
predict.tgp
, sens
, default.itemps
# \donttest{
##
## Many of the examples below illustrate the above
## function(s) on random data. Thus it can be fun
## (and informative) to run them several times.
##
#
# simple linear response
#
# input and predictive data
X <- seq(0,1,length=50)
XX <- seq(0,1,length=99)
Z <- 1 + 2*X + rnorm(length(X),sd=0.25)
out <- blm(X=X, Z=Z, XX=XX) # try Linear Model
plot(out) # plot the surface
#
# 1-d Example
#
# construct some 1-d nonstationary data
X <- seq(0,20,length=100)
XX <- seq(0,20,length=99)
Z <- (sin(pi*X/5) + 0.2*cos(4*pi*X/5)) * (X <= 9.6)
lin <- X>9.6;
Z[lin] <- -1 + X[lin]/10
Z <- Z + rnorm(length(Z), sd=0.1)
out <- btlm(X=X, Z=Z, XX=XX) # try Linear CART
plot(out) # plot the surface
tgp.trees(out) # plot the MAP trees
out <- btgp(X=X, Z=Z, XX=XX) # use a treed GP
plot(out) # plot the surface
tgp.trees(out) # plot the MAP trees
#
# 2-d example
# (using the isotropic correlation function)
#
# construct some 2-d nonstationary data
exp2d.data <- exp2d.rand()
X <- exp2d.data$X; Z <- exp2d.data$Z
XX <- exp2d.data$XX
# try a GP
out <- bgp(X=X, Z=Z, XX=XX, corr="exp")
plot(out) # plot the surface
# try a treed GP LLM
out <- btgpllm(X=X, Z=Z, XX=XX, corr="exp")
plot(out) # plot the surface
tgp.trees(out) # plot the MAP trees
#
# Motorcycle Accident Data
#
# get the data
require(MASS)
# try a GP
out <- bgp(X=mcycle[,1], Z=mcycle[,2])
plot(out) # plot the surface
# try a treed GP LLM
# best to use the "b0" beta linear prior to capture common
# common linear process throughout all regions (using the
# ellipses "...")
out <- btgpllm(X=mcycle[,1], Z=mcycle[,2], bprior="b0")
plot(out) # plot the surface
tgp.trees(out) # plot the MAP trees
# }
Run the code above in your browser using DataLab