Supports the extraTrees package (Extremely Randomized Trees, a variant of random forest) for use with SuperLearner.
Usage

SL.extraTrees(Y, X, newX, family, obsWeights, id, ntree = 500,
  mtry = if (family$family == "gaussian") max(floor(ncol(X)/3), 1)
         else floor(sqrt(ncol(X))),
  nodesize = if (family$family == "gaussian") 5 else 1,
  numRandomCuts = 1, evenCuts = FALSE, numThreads = 1, quantile = FALSE,
  subsetSizes = NULL, subsetGroups = NULL, tasks = NULL,
  probOfTaskCuts = mtry/ncol(X), numRandomTaskCuts = 1, verbose = FALSE,
  ...)
Arguments

Y: Outcome variable.

X: Covariate dataframe.

newX: Optional dataframe on which to predict the outcome.

family: "gaussian" for regression, "binomial" for binary classification.

obsWeights: Optional observation-level weights (supported but not tested).

id: Optional id to group observations from the same unit (not used currently).

ntree: Number of trees (default 500).

mtry: Number of features tested at each node. Default is ncol(X)/3 for regression and sqrt(ncol(X)) for classification.

nodesize: Size of the tree leaves. Default is 5 for regression and 1 for classification.

numRandomCuts: Number of random cuts for each (randomly chosen) feature (default 1, which corresponds to the official ExtraTrees method). The higher the number of cuts, the higher the chance of a good cut.

evenCuts: If FALSE (default), cutting thresholds are uniformly sampled. If TRUE, the range is split into even intervals (the number of intervals is numRandomCuts) and a cut is uniformly sampled from each interval.

numThreads: Number of CPU threads to use (default 1).

quantile: If TRUE, quantile regression is performed (default FALSE); only for regression data. Then use predict(et, newdata, quantile = k) to make predictions for the k quantile.

subsetSizes: Subset size (one integer) or subset sizes (vector of integers; requires subsetGroups). If supplied, every tree is built from a random subset of the given size. NULL (default) means no subsetting, i.e. all samples are used.

subsetGroups: List specifying the subset group for each sample: from the samples in group g, each tree will randomly select subsetSizes[g] samples.

tasks: Vector of tasks, integers from 1 and up. NULL if no multi-task learning. (untested)

probOfTaskCuts: Probability of performing a task cut at a node (default mtry/ncol(X)). Used only if tasks is specified. (untested)

numRandomTaskCuts: Number of times a task cut is performed at a node (default 1). Used only if tasks is specified. (untested)

verbose: Verbosity of model fitting.

...: Any remaining arguments (not supported).
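As a sketch of the quantile option described above, the extraTrees package can also be called directly; this assumes the extraTrees package and its Java dependency are installed, and reuses the Boston housing data:

```r
# Hedged sketch: quantile regression with the extraTrees package directly,
# following the predict(et, newdata, quantile = k) interface noted above.
library(extraTrees)

data(Boston, package = "MASS")
X <- Boston[, -14]   # covariates (medv is column 14)
Y <- Boston$medv     # continuous outcome -> regression

# Fit with quantile = TRUE (regression data only).
et <- extraTrees(X, Y, ntree = 500, quantile = TRUE)

# Predict the median and the 90th percentile of the outcome.
pred_median <- predict(et, X, quantile = 0.5)
pred_p90    <- predict(et, X, quantile = 0.9)
```

Quantile predictions such as these can be used to form rough prediction intervals rather than a single conditional mean.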
If Java runs out of memory (java.lang.OutOfMemoryError: Java heap space) and you have free memory available, you can increase the heap size by running options(java.parameters = "-Xmx2g") before calling library(extraTrees).
References

Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3-42.

Simm, J., de Abril, I. M., & Sugiyama, M. (2014). Tree-based ensemble multi-task learning method for classification and regression. IEICE Transactions on Information and Systems, 97(6), 1677-1681.
See Also

extraTrees, predict.SL.extraTrees, predict.extraTrees
Examples
library(SuperLearner)
data(Boston, package = "MASS")
Y = Boston$medv
# Remove outcome from covariate dataframe.
X = Boston[, -14]
set.seed(1)
# Sample rows to speed up example.
row_subset = sample(nrow(X), 30)
sl = SuperLearner(Y[row_subset], X[row_subset, ], family = gaussian(),
cvControl = list(V = 2), SL.library = c("SL.mean", "SL.extraTrees"))
print(sl)