gbm (Ridgeway, 2013; Friedman, 2001) to fit a univariate tree model for each outcome, selecting predictors at each iteration that explain (co)variance in the outcomes. The number of trees included in the model can be chosen by minimizing the multivariate mean squared error using cross-validation or a test set.
mvtb(Y, X, n.trees = 100, shrinkage = 0.01, interaction.depth = 1,
  distribution = "gaussian", train.fraction = 1, bag.fraction = 1,
  cv.folds = 1, s = NULL, seednum = NULL, compress = FALSE,
  save.cv = FALSE, iter.details = TRUE, verbose = FALSE, mc.cores = 1, ...)

mvtb.fit(Y, X, n.trees = 100, shrinkage = 0.01, interaction.depth = 1,
  bag.fraction = 1, s = 1:nrow(X), seednum = NULL, ...)
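As a minimal sketch of both entry points (the data preparation mirrors the Examples below; default arguments are used, not tuned values):

library(mvtboost)
data(wellbeing)
Ys <- scale(wellbeing[, 21:26])                  # standardized outcomes
X  <- wellbeing[, 1:20]
Xs <- scale(X[, unlist(lapply(X, is.numeric))])  # standardized continuous predictors

res <- mvtb(Y = Ys, X = Xs)       # full interface, with tuning and bookkeeping
fit <- mvtb.fit(Y = Ys, X = Xs)   # lower-level fitting interface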
cv.folds - Number of cross-validation folds. If both cv.folds and train.fraction are specified, the CV is carried out within the training set. The best number of trees is available from object$best.trees.
s - Vector of integers indexing the rows to include in the training sample (see the sketch after this list). If s is given, train.fraction is ignored.
seednum - Integer passed to set.seed to fix the random seed.
compress - TRUE/FALSE. Compress output results list using bzip2 (approx 10% of original size). Default is FALSE.
save.cv - TRUE/FALSE. Save all k-fold cross-validation models. Default is FALSE.
iter.details - TRUE/FALSE. Return training, test, and cross-validation error at each iteration. Default is TRUE.
verbose - TRUE/FALSE. If TRUE, will print out progress and performance indicators for each model. Default is FALSE.
... - Additional arguments passed to gbm. These include distribution, weights, var.monotone, n.minobsinnode, keep.data, verbose, and class.stratify.cv. Note that other distribution arguments have not been tested.
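A short sketch of s and seednum from the list above (Ys and Xs as constructed earlier; the 80% split is arbitrary):

set.seed(1)
train.rows <- sample(nrow(Xs), floor(0.8 * nrow(Xs)))  # hand-picked training rows
## train.fraction is ignored when s is given; seednum makes the fit reproducible
res.s <- mvtb(Y = Ys, X = Xs, n.trees = 500, s = train.rows, seednum = 101)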
- list of gbm models for each outcome. Functions from the gbm package (e.g. to compute relative influence, print trees, obtain predictions, etc) can be directly applied to each of these models
best.trees
- A list containing the number of trees that minimize the multivariate MSE in a test set or by CV, and n.trees
.
Many of the functions in the package default to using the minimum value of the three.
params
- arguments to mvtb
trainerr
- multivariate training error at each tree (If iter.details = TRUE
)
testerr
- multivariate test error at each tree (if train.fraction < 1
and iter.details = TRUE
)
cverr
- multivariate cv error at each tree (if cv.folds > 1
and iter.details = TRUE
)
ocv
- the CV models if save.cv=TRUE
s
- indices of training sample
n
- number of observations
xnames
ynames
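A sketch of accessing these elements (res as fitted earlier; taking the minimum follows the note under best.trees):

res$best.trees                         # candidate stopping points
ntrees <- min(unlist(res$best.trees))  # the minimum, which most functions default to
res$models[[1]]                        # univariate gbm model for the first outcome
gbm::pretty.gbm.tree(res$models[[1]], i.tree = 1)  # inspect its first tree
res$n                                  # number of observations
res$xnames                             # predictor names
res$ynames                             # outcome names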
The fitted model from mvtb or mvtb.fit contains a list of gbm models, one per outcome ($models). (Relative) influences can be retrieved using summary or mvtb.ri, which are the usual reductions in SSE due to splitting on each predictor. The covariance explained in pairs of outcomes by each predictor can be computed using mvtb.covex. Partial dependence plots can be obtained from plot.mvtb.
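Putting those interpretation helpers together (res, Ys, Xs as above; predictor.no = 8 mirrors the Examples):

mvtb.ri(res)                     # reductions in SSE per predictor and outcome
mvtb.covex(res, Y = Ys, X = Xs)  # covariance explained in pairs of outcomes
plot(res, predictor.no = 8)      # partial dependence for the 8th predictor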
The model is tuned jointly by selecting the number of trees that minimize multivariate mean squared error in a test set (by setting train.fraction) or averaged over k folds in k-fold cross-validation (by setting cv.folds > 1). The best number of trees is available via $best.trees. If both cv.folds and train.fraction are specified, cross-validation is carried out within the training set. If s is specified, train.fraction is ignored but cross-validation will be carried out for the observations in s.
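A sketch of the two tuning routes just described (tree counts and fractions are illustrative):

## Tune on a held-out test set ...
res.test <- mvtb(Y = Ys, X = Xs, n.trees = 1000, train.fraction = 0.75)
## ... or by 5-fold cross-validation
res.cv <- mvtb(Y = Ys, X = Xs, n.trees = 1000, cv.folds = 5)
res.test$best.trees   # best iteration under each criterion
res.cv$best.trees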
Cross-validation models are usually discarded but can be saved by setting save.cv = TRUE; they can then be accessed from $ocv of the output object. Observations can be explicitly included in the training set by passing a vector of integers indexing the rows to include to s. Multivariate mean squared training, test, and CV error are available from $trainerr, $testerr, and $cverr of the output object when iter.details = TRUE.
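For example, the error curves and saved CV models can be pulled out like this (5 folds is arbitrary):

res.cv <- mvtb(Y = Ys, X = Xs, n.trees = 500, cv.folds = 5,
               save.cv = TRUE, iter.details = TRUE)
plot(res.cv$trainerr, type = "l",
     xlab = "trees", ylab = "multivariate MSE")  # training error per tree
lines(res.cv$cverr, lty = 2)                     # CV error per tree
str(res.cv$ocv, max.level = 1)                   # the saved k-fold models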
Since the output objects can be large, automatic compression is available by setting compress = TRUE. All methods that use the mvtb object automatically uncompress it if necessary. The function mvtb.uncomp is available to decompress the object manually.
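A compression sketch (the object.size comparison is just illustrative; savings will vary):

res.comp <- mvtb(Y = Ys, X = Xs, compress = TRUE)  # bzip2-compressed output
print(object.size(res.comp))                       # compare against an uncompressed fit
res.unc <- mvtb.uncomp(res.comp)                   # manual decompression
summary(res.comp)                                  # methods also decompress automatically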
Note that trees are grown until a minimum number of observations in each node is reached. If the number of training samples times bag.fraction is less than this minimum (which can occur with small data sets), fitting will fail with an error. Adjust n.minobsinnode, train.fraction, or bag.fraction accordingly.
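To make the constraint concrete, a sketch with a deliberately small training set (the subset size and n.minobsinnode value are illustrative; n.minobsinnode is one of the gbm arguments accepted via ...):

## With 30 rows and bag.fraction = 0.5, each tree sees about 15 observations,
## too few for gbm's default n.minobsinnode of 10; lowering it avoids the error.
res.small <- mvtb(Y = Ys[1:30, ], X = Xs[1:30, ], n.trees = 100,
                  bag.fraction = 0.5, n.minobsinnode = 5)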
Cross-validation can be parallelized by setting mc.cores > 1. Parallel cross-validation is carried out using parallel::mclapply, which makes mc.cores copies of the original environment. For models with many trees (> 100K), memory limits can be reached rapidly. Note that mc.cores will not work on Windows.
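A parallel CV sketch (POSIX systems only, per the note above; 2 cores is arbitrary):

res.par <- mvtb(Y = Ys, X = Xs, n.trees = 500, cv.folds = 5, mc.cores = 2)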
Ridgeway, G. (2013). gbm: Generalized Boosted Regression Models. R package.
Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77(4), 802-813.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189-1232.
summary.mvtb, predict.mvtb
mvtb.covex to estimate the covariance explained in pairs of outcomes by predictors
mvtb.nonlin to help detect nonlinear effects or interactions
plot.mvtb, mvtb.perspec for partial dependence plots
mvtb.uncomp to uncompress a compressed output object
data(wellbeing)
Y <- wellbeing[, 21:26]
X <- wellbeing[, 1:20]
Ys <- scale(Y)                                # standardize the outcomes
cont.id <- unlist(lapply(X, is.numeric))      # keep only continuous predictors
Xs <- scale(X[, cont.id])                     # standardize the predictors

## Fit the model
res <- mvtb(Y = Ys, X = Xs)

## Interpret the model
summary(res)                                  # relative influences per outcome
covex <- mvtb.covex(res, Y = Ys, X = Xs)      # covariance explained by each predictor
plot(res, predictor.no = 8)                   # partial dependence for predictor 8
predict(res, newdata = Xs)                    # fitted predictions
mvtb.cluster(covex)                           # cluster the covariance explained matrix
mvtb.heat(t(mvtb.ri(res)), cexRow = .8, cexCol = 1, dec = 0)  # heat map of relative influences