bag: A General Framework For Bagging

Description

bag provides a framework for bagging classification or regression models. The user can provide their own functions for model building, prediction and aggregation of predictions (see Details below).

Usage

bag(x, ...)
bagControl(
  fit = NULL,
  predict = NULL,
  aggregate = NULL,
  downSample = FALSE,
  oob = TRUE,
  allowParallel = TRUE
)
# S3 method for default
bag(x, y, B = 10, vars = ncol(x), bagControl = NULL, ...)
# S3 method for bag
predict(object, newdata = NULL, ...)
# S3 method for bag
print(x, ...)
# S3 method for bag
summary(object, ...)
# S3 method for summary.bag
print(x, digits = max(3, getOption("digits") - 3), ...)
ldaBag
plsBag
nbBag
ctreeBag
svmBag
nnetBag

Value

bag produces an object of class bag with elements

fits: a list whose entries each have two sub-objects: fit, the actual model fit for that bagged sample, and vars, which is either NULL or a vector of integers indicating which predictors were sampled for that model
control: a mirror of the arguments passed into bagControl
call: the call
B: the number of bagging iterations
dims: the dimensions of the training set

Format

ldaBag, plsBag, nbBag, ctreeBag, svmBag and nnetBag are each an object of class list of length 3 (elements fit, pred and aggregate).

Arguments

x: a matrix or data frame of predictors
...: arguments to pass to the model function
fit: a function that has arguments x, y and ... and produces a model object that can later be used for prediction. Example functions are found in ldaBag, plsBag, nbBag, svmBag and nnetBag.
predict: a function that generates predictions for each sub-model. The function should have arguments object and x. The output of the function can be any type of object (see the example below where posterior probabilities are generated). Example functions are found in ldaBag, plsBag, nbBag, svmBag and nnetBag.
aggregate: a function with arguments x and type. The function takes the output of the predict function and reduces the bagged predictions to a single prediction per sample. The type argument can be used to switch between predicting classes or class probabilities for classification models. Example functions are found in ldaBag, plsBag, nbBag, svmBag and nnetBag.
downSample: logical: for classification, should the data set be randomly sampled so that each class has the same number of samples as the smallest class?
oob: logical: should out-of-bag statistics be computed and the predictions retained?
allowParallel: logical: if a parallel backend is loaded and available, should the function use it?
y: a vector of outcomes
B: the number of bootstrap samples to train over.
vars: an integer. If this argument is not NULL, a random sample of size vars is taken of the predictors in each bagging iteration. If NULL, all predictors are used.
bagControl: a list of options.
object: an object of class bag.
newdata: a matrix or data frame of samples for prediction. Note that this argument must have a non-null value.
digits: minimal number of significant digits.
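
To make the downSample description concrete, the following base R sketch reduces every class to the size of the smallest one. It only illustrates the idea and is not caret's internal code; the objects y, n_min, keep and balanced are made up for the example.

```r
set.seed(10)

# Hypothetical imbalanced outcome: 8 samples of class "a", 3 of class "b"
y <- factor(c(rep("a", 8), rep("b", 3)))

# Size of the smallest class
n_min <- min(table(y))

# For each class, keep a random subset of n_min row indices
keep <- unlist(lapply(split(seq_along(y), y), function(idx) {
  idx[sample.int(length(idx), n_min)]
}))

# Class counts after down-sampling: every class has n_min samples
balanced <- table(y[keep])
```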

Author

Max Kuhn

Details

The function is basically a framework where users can plug in any model to assess the effect of bagging. Example functions can be found in ldaBag, plsBag, nbBag, svmBag and nnetBag. Each has elements fit, pred and aggregate.
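
As a rough sketch of what such a plug-in list might look like, the code below defines a hypothetical lmBag for ordinary least squares regression, using only base R. The name lmBag, the simulated data and the manual bootstrap loop are assumptions made for illustration; the loop merely mimics how the three functions are expected to cooperate and is not the code that bag runs.

```r
# Hypothetical plug-in list for bagging linear regression models
lmBag <- list(
  # fit: takes predictors x, outcome y and ..., returns a model object
  fit = function(x, y, ...) {
    dat <- as.data.frame(x)
    dat$.outcome <- y
    lm(.outcome ~ ., data = dat, ...)
  },
  # pred: takes a fitted model and new predictors, returns numeric predictions
  pred = function(object, x) {
    predict(object, newdata = as.data.frame(x))
  },
  # aggregate: takes a list of per-model predictions, returns one value per sample
  aggregate = function(x, type = "class") {
    preds <- matrix(unlist(x), ncol = length(x))
    apply(preds, 1, median)
  }
)

# Simulated data to exercise the three functions by hand
set.seed(1)
x <- data.frame(a = rnorm(50), b = rnorm(50))
y <- 2 * x$a - x$b + rnorm(50, sd = 0.1)

# Fit one model per bootstrap sample
fits <- lapply(1:5, function(i) {
  idx <- sample(nrow(x), replace = TRUE)
  lmBag$fit(x[idx, , drop = FALSE], y[idx])
})

# Predict with each model, then reduce to a single prediction per sample
per_model <- lapply(fits, function(m) lmBag$pred(m, x))
bagged <- lmBag$aggregate(per_model)
```

With caret loaded, a list of this shape would be supplied through bagControl(fit = lmBag$fit, predict = lmBag$pred, aggregate = lmBag$aggregate), in the same way the ctreeBag and ldaBag elements are passed in the Examples section.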

One note: when vars is not NULL, the sub-setting occurs before the fit and predict functions are called. In this way, the user probably does not need to account for the change in predictors in their functions.
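
The effect of vars can be pictured with a small base R sketch: in each iteration a bootstrap sample of rows is drawn and a random subset of vars predictor columns is selected before any model function would see the data. This is an illustration under assumed names (x, B, subsets), not the package's implementation.

```r
set.seed(42)

# Hypothetical predictor matrix: 10 samples, 4 predictors
x <- matrix(rnorm(40), nrow = 10,
            dimnames = list(NULL, paste0("p", 1:4)))

vars <- 2  # number of predictors sampled per iteration
B <- 3     # number of bagging iterations

subsets <- lapply(seq_len(B), function(i) {
  rows <- sample(nrow(x), replace = TRUE)  # bootstrap sample of rows
  cols <- sort(sample(ncol(x), vars))      # random subset of predictors
  # The fit function would only ever receive this reduced matrix
  list(vars = cols, x = x[rows, cols, drop = FALSE])
})

# Dimensions of the data each model would be trained on
dims <- t(sapply(subsets, function(s) dim(s$x)))
```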

When using bag with train, classification models should use type = "prob" inside of the predict function so that predict.train(object, newdata, type = "prob") will work.
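
A minimal sketch of a classification plug-in whose pred function returns class probabilities is shown below, using a base R logistic regression for a two-class outcome. The list name glmBag, the lev element stored on the model, the simulated data and the choice to average probabilities across models are all assumptions made for illustration; the bundled lists such as ldaBag may pool predictions differently.

```r
# Hypothetical plug-in list for bagging two-class logistic regression models
glmBag <- list(
  fit = function(x, y, ...) {
    dat <- as.data.frame(x)
    dat$.outcome <- y
    mod <- glm(.outcome ~ ., data = dat, family = binomial(), ...)
    mod$lev <- levels(y)  # remember the class labels for pred
    mod
  },
  # pred returns a matrix of class probabilities, one column per class
  pred = function(object, x) {
    p <- predict(object, newdata = as.data.frame(x), type = "response")
    out <- cbind(1 - p, p)  # glm models the probability of the second level
    colnames(out) <- object$lev
    out
  },
  # aggregate averages the probabilities, then returns classes or probabilities
  aggregate = function(x, type = "class") {
    pooled <- Reduce(`+`, x) / length(x)
    if (type == "class") {
      factor(colnames(pooled)[max.col(pooled, ties.method = "first")],
             levels = colnames(pooled))
    } else {
      pooled
    }
  }
)

# Simulated two-class data to exercise the functions by hand
set.seed(7)
x <- data.frame(a = rnorm(100), b = rnorm(100))
y <- factor(ifelse(x$a + rnorm(100) > 0, "pos", "neg"))

fits <- lapply(1:5, function(i) {
  idx <- sample(nrow(x), replace = TRUE)
  glmBag$fit(x[idx, , drop = FALSE], y[idx])
})
per_model <- lapply(fits, function(m) glmBag$pred(m, x))

probs <- glmBag$aggregate(per_model, type = "prob")
classes <- glmBag$aggregate(per_model, type = "class")
```

Because pred always produces probabilities, the same aggregate function can serve both class and probability requests by switching on type.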

If a parallel backend is registered, the foreach package is used to train the models in parallel.

Examples


library(caret)

## A simple example of bagging conditional inference regression trees:
data(BloodBrain)

## treebag <- bag(bbbDescr, logBBB, B = 10,
##                bagControl = bagControl(fit = ctreeBag$fit,
##                                        predict = ctreeBag$pred,
##                                        aggregate = ctreeBag$aggregate))




## An example of pooling posterior probabilities to generate class predictions
data(mdrr)

## remove some zero variance predictors and linear dependencies
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .95)]

## basicLDA <- train(mdrrDescr, mdrrClass, "lda")

## bagLDA2 <- train(mdrrDescr, mdrrClass,
##                  "bag",
##                  B = 10,
##                  bagControl = bagControl(fit = ldaBag$fit,
##                                          predict = ldaBag$pred,
##                                          aggregate = ldaBag$aggregate),
##                  tuneGrid = data.frame(vars = c((1:10)*10 , ncol(mdrrDescr))))
