Description:

Fits a gradient boosted regression model (a gbm object) in which model complexity is determined using a training set, with predictions made to a withheld set. An initial set of trees is fitted, and trees are then added in increments, with performance tested along the way using gbm.perf, until the optimal number of trees is identified.

Because any structured ordering of the data should be avoided, a copy of the data set is by default randomly reordered each time the function is run.
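The stepwise procedure described above can be sketched as follows. This is an illustrative outline only, not the package's actual implementation; the function `fit.holdout` and its inputs (`dat`, `gbm.x`, `gbm.y`) are hypothetical, and only a subset of the real arguments is mimicked.

```r
# Illustrative sketch of the holdout procedure (not the dismo source code)
library(gbm)

fit.holdout <- function(dat, gbm.x, gbm.y, n.trees = 200, add.trees = 200,
                        max.trees = 20000, learning.rate = 0.001,
                        tree.complexity = 1, train.fraction = 0.8) {
  # random reordering, as done when permute = TRUE
  dat <- dat[sample(nrow(dat)), ]
  model <- gbm(dat[, gbm.y] ~ ., data = dat[, gbm.x, drop = FALSE],
               distribution = "bernoulli", n.trees = n.trees,
               shrinkage = learning.rate, interaction.depth = tree.complexity,
               train.fraction = train.fraction)
  best <- gbm.perf(model, method = "test", plot.it = FALSE)
  # keep adding trees while the optimum may lie beyond the current total
  while (model$n.trees < max.trees && best >= model$n.trees - add.trees) {
    model <- gbm.more(model, n.new.trees = add.trees)
    best <- gbm.perf(model, method = "test", plot.it = FALSE)
  }
  model
}
```

The loop stops once the test-set optimum identified by gbm.perf falls clearly inside the trees already fitted, or when max.trees is reached.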
Usage:

gbm.holdout(data, gbm.x, gbm.y, learning.rate = 0.001, tree.complexity = 1,
  family = "bernoulli", n.trees = 200, add.trees = n.trees, max.trees = 20000,
  verbose = TRUE, train.fraction = 0.8, permute = TRUE, prev.stratify = TRUE,
  var.monotone = rep(0, length(gbm.x)), site.weights = rep(1, nrow(data)),
  refit = TRUE, keep.data = TRUE)
Value:

A gbm object.
Arguments:

data: input data.frame

gbm.x: indices of the predictor variables in the input data.frame

gbm.y: index of the response variable in the input data.frame

learning.rate: the learning rate (shrinkage); typically varied between 0.1 and 0.001

tree.complexity: the complexity of individual trees; sometimes called interaction depth

family: the error distribution: "bernoulli", "poisson", etc., as for gbm

n.trees: initial number of trees to fit

add.trees: number of trees to add at each increment

max.trees: maximum number of trees to fit

verbose: controls the degree of screen reporting

train.fraction: proportion of the data to use for training

permute: randomly reorder the data before fitting

prev.stratify: stratify the training/test selection for presence/absence data

var.monotone: allows constraining of the response to be monotone in the predictors; 0 indicates no constraint

site.weights: weights for each observation; set equal to 1 by default

refit: refit the model with the full data set, but using the identified optimal number of trees

keep.data: keep a copy of the data in the model object
Author(s):

John R. Leathwick and Jane Elith
References:

Elith, J., J.R. Leathwick and T. Hastie, 2008. A working guide to boosted regression trees. Journal of Animal Ecology 77: 802-813.
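A minimal usage sketch, assuming the dismo and gbm packages are installed and using dismo's bundled Anguilla_train data set; the column indices and tuning values below are illustrative, not prescriptive.

```r
# Hypothetical example: species presence/absence modelling
# (column indices chosen for illustration only)
library(dismo)

data(Anguilla_train)

mod <- gbm.holdout(Anguilla_train,
                   gbm.x = 3:13,        # predictor columns
                   gbm.y = 2,           # response column (presence/absence)
                   learning.rate = 0.005,
                   tree.complexity = 3,
                   family = "bernoulli")

# the result is a gbm object, so the usual methods apply
summary(mod)
```

With refit = TRUE (the default), the returned model has been refitted to the full data set using the number of trees identified on the withheld set.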