SuperLearner (version 2.0-24)

SL.ranger: SL wrapper for ranger

Description

ranger is a fast implementation of random forests (Breiman 2001), or recursive partitioning, that is particularly suited for high-dimensional data. SL.ranger wraps it for use as a SuperLearner library algorithm.

Extends code by Eric Polley from the SuperLearnerExtra package.

Usage

SL.ranger(Y, X, newX, family, obsWeights, num.trees = 500,
  mtry = floor(sqrt(ncol(X))), write.forest = TRUE,
  probability = family$family == "binomial",
  min.node.size = ifelse(family$family == "gaussian", 5, 1), replace = TRUE,
  sample.fraction = ifelse(replace, 1, 0.632), num.threads = 1,
  verbose = T, ...)
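Several defaults in the signature above depend on family and replace. The sketch below evaluates those default expressions for both supported families so the resolved values are visible. resolve_defaults is a hypothetical helper written for illustration only. It is not part of SuperLearner, and it does not call SL.ranger or ranger.

```r
# Hypothetical helper (not part of SuperLearner): evaluates the
# family-dependent default expressions copied from the SL.ranger signature.
resolve_defaults <- function(family, replace = TRUE) {
  list(
    probability     = family$family == "binomial",
    min.node.size   = ifelse(family$family == "gaussian", 5, 1),
    sample.fraction = ifelse(replace, 1, 0.632)
  )
}

# Regression: no probability forest, min.node.size 5, full-size bootstrap.
str(resolve_defaults(gaussian()))

# Classification without replacement: probability forest, min.node.size 1,
# and 63.2% subsampling.
str(resolve_defaults(binomial(), replace = FALSE))
```

With a binomial family the wrapper therefore grows a probability forest with min.node.size = 1, rather than the value of 10 that ranger itself uses by default for probability forests.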

Arguments

Y

Outcome variable

X

Training data frame

newX

Test data frame on which predictions are returned

family

Family object, either gaussian() or binomial(); several defaults depend on family$family.

obsWeights

Observation-level weights

num.trees

Number of trees.

mtry

Number of variables to possibly split at in each node. Default is the (rounded down) square root of the number of variables.
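As a quick check of the mtry default, the following sketch applies the same expression to a few column counts. The simulated data frame is an assumption for illustration; it has 13 columns, matching the number of Boston covariates used in the example.

```r
# Default mtry is floor(sqrt(ncol(X))).
set.seed(1)
X <- as.data.frame(matrix(rnorm(20 * 13), nrow = 20))  # 20 rows, 13 covariates
default_mtry <- floor(sqrt(ncol(X)))
default_mtry  # sqrt(13) is about 3.61, so this is 3

# The same rule for other numbers of covariates.
sapply(c(4, 13, 100), function(p) floor(sqrt(p)))  # 2 3 10
```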

write.forest

Save the ranger.forest object, which is required for prediction. Set to FALSE to reduce memory usage if no prediction is intended.

probability

Grow a probability forest as in Malley et al. (2012). Defaults to TRUE when family is binomial.

min.node.size

Minimal node size. In SL.ranger the default is 5 when family is gaussian and 1 otherwise (ranger's own defaults are 1 for classification, 5 for regression, 3 for survival, and 10 for probability).

replace

If TRUE, sample observations with replacement; if FALSE, sample without replacement.

sample.fraction

Fraction of observations to sample. Default is 1 for sampling with replacement and 0.632 for sampling without replacement.
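The 0.632 default for sampling without replacement is commonly explained as matching the expected fraction of distinct observations in a bootstrap sample; that rationale is general background and is not stated on this page. The sketch computes the fraction exactly for n = 506 (the number of rows in MASS::Boston) and compares it with its large-n limit, 1 - exp(-1). expected_unique_fraction is a hypothetical helper for illustration.

```r
# Hypothetical helper: expected fraction of distinct rows when drawing n rows
# with replacement from n. Each row is missed with probability (1 - 1/n)^n.
expected_unique_fraction <- function(n) 1 - (1 - 1 / n)^n

round(expected_unique_fraction(506), 3)  # 0.632
round(1 - exp(-1), 3)                    # 0.632, the large-n limit

# Monte Carlo check: average share of unique indices in bootstrap samples.
set.seed(1)
n <- 506
mc <- mean(replicate(1000, length(unique(sample.int(n, n, replace = TRUE))) / n))
mc
```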

num.threads

Number of threads to use.

verbose

If TRUE, display additional output during execution.

...

Any additional arguments, not currently used.

References

Breiman, L. (2001). Random forests. Machine Learning 45:5-32.

Wright, M. N. & Ziegler, A. (2016). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, in press. http://arxiv.org/abs/1508.04409.

See Also

SL.ranger, ranger, predict.ranger

Examples

library(SuperLearner)
data(Boston, package = "MASS")
Y = Boston$medv
# Remove the outcome (column 14, medv) from the covariate data frame.
X = Boston[, -14]

set.seed(1)

# Use only 2 CV folds to speed up example.
sl = SuperLearner(Y, X, family = gaussian(), cvControl = list(V = 2),
                 SL.library = c("SL.mean", "SL.ranger"))
sl

pred = predict(sl, X)
summary(pred$pred)
