rtemis (version 0.79)

s.ADDTREE: Additive Tree: Tree-Structured Boosting [C]

Description

Train an Additive Tree model

Usage

s.ADDTREE(x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL,
  y.name = NULL, weights = NULL, update = c("exponential",
  "polynomial"), min.update = ifelse(update == "polynomial", 0.035,
  1000), min.hessian = 0.001, min.membership = 1,
  steps.past.min.membership = 0, gamma = 0.8, max.depth = 30,
  learning.rate = 0.1, ipw = TRUE, ipw.type = 2, upsample = FALSE,
  upsample.seed = NULL, imetrics = TRUE,
  grid.resample.rtset = rtset.resample("kfold", 5),
  metric = "Balanced Accuracy", maximize = TRUE, prune = TRUE,
  prune.empty.leaves = TRUE, remove.bad.parents = FALSE,
  match.rules = TRUE, print.plot = TRUE, plot.fitted = NULL,
  plot.predicted = NULL, plot.theme = getOption("rt.fit.theme",
  "lightgrid"), question = NULL, rtclass = NULL, verbose = TRUE,
  prune.verbose = FALSE, trace = 1, grid.verbose = TRUE,
  diagnostics = FALSE, outdir = NULL, save.rpart = FALSE,
  save.mod = ifelse(!is.null(outdir), TRUE, FALSE), n.cores = rtCores,
  ...)

Arguments

x

N x D matrix of N examples with D features

y

N x 1 vector of labels with values in {-1, 1}

x.test

Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x

y.test

Numeric vector of testing set outcome

x.name

Character: Name for feature set

y.name

Character: Name for outcome

weights

Numeric vector: Case weights. For classification, weights takes precedence over ipw: if weights are provided, ipw is not used. Leave NULL if setting ipw = TRUE. Default = NULL

min.hessian

[gS] Minimum second derivative to continue splitting

gamma

[gS] Acceleration factor = lambda / (1 + lambda)

max.depth

[gS] Maximum depth of the tree

learning.rate

[gS] Learning rate for the Newton-Raphson step that updates the function values of the node

ipw

Logical: If TRUE, apply inverse probability weighting (for Classification only). Note: If weights are provided, ipw is not used. Default = TRUE

ipw.type

Integer {0, 1, 2}.
1: class.weights as in 0, divided by max(class.weights).
2: class.weights as in 0, divided by min(class.weights).
Default = 2

upsample

Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Caution: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness

upsample.seed

Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)

match.rules

Logical: If TRUE, match cases to rules to get statistics per node, i.e. what percent of cases match each rule. If available, these are used by mplot3.addtree when plotting

print.plot

Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted

plot.fitted

Logical: if TRUE, plot True (y) vs Fitted

plot.predicted

Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

plot.theme

String: "zero", "dark", "box", "darkbox"

question

String: the question you are attempting to answer with this model, in plain language.

rtclass

String: Class type to use. "S3", "S4", "RC", "R6"

verbose

Logical: If TRUE, print summary to screen.

trace

Integer: If higher than 0, will print more information to the console. Default = 1

outdir

Path to output directory. If defined, will save Predicted vs. True plot, if available, as well as full model output, if save.mod is TRUE

save.mod

Logical: If TRUE, save all output as RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)

...

Additional arguments

catPredictors

Logical vector with the same length as the feature vector, where TRUE means that the corresponding column of x is a categorical variable

Value

Object of class rtMod

Details

For binary classification, the outcome must be a factor with two levels; the first level is the 'positive' class

Factor levels should not contain the "/" character (it is used to separate conditions in the addtree object)
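A minimal sketch of preparing such an outcome in base R (the vector `status` and its labels are hypothetical; note the positive class is placed first and levels avoid "/"):

```r
# Hypothetical binary outcome coded as character
status <- c("case", "control", "control", "case")

# Make "case" the first level so it is treated as the 'positive' class
y <- factor(status, levels = c("case", "control"))
levels(y)  # "case" "control"
```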

[gS] indicates that more than one value can be supplied, which will result in grid search using internal resampling.

lambda <- gamma / (1 - gamma)
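The relation between gamma and the acceleration parameter lambda, worked through in plain R for the default gamma:

```r
gamma <- 0.8
lambda <- gamma / (1 - gamma)      # 0.8 / 0.2 = 4
# and, conversely, gamma is recovered as lambda / (1 + lambda)
lambda / (1 + lambda)              # 4 / 5 = 0.8
```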

References

Valdes G, Luna JM, Eaton E, Simone CB, Ungar LH, Solberg TD. MediBoost: a Patient Stratification Tool for Interpretable Decision Making in the Era of Precision Medicine. Sci Rep. 2016;6:37854. doi:10.1038/srep37854.

See Also

Other Supervised Learning: s.ADABOOST, s.BART, s.BAYESGLM, s.BRUTO, s.C50, s.CART, s.CTREE, s.DA, s.ET, s.EVTREE, s.GAM.default, s.GAM.formula, s.GAMSEL, s.GAM, s.GBM3, s.GBM, s.GLMNET, s.GLM, s.GLS, s.H2ODL, s.H2OGBM, s.H2ORF, s.IRF, s.KNN, s.LDA, s.LM, s.MARS, s.MLRF, s.MXN, s.NBAYES, s.NLA, s.NLS, s.NW, s.POLYMARS, s.PPR, s.PPTREE, s.QDA, s.QRNN, s.RANGER, s.RFSRC, s.RF, s.SGD, s.SPLS, s.SVM, s.TFN, s.XGBLIN, s.XGB

Other Tree-based methods: s.ADABOOST, s.BART, s.C50, s.CART, s.CTREE, s.ET, s.EVTREE, s.GBM3, s.GBM, s.H2OGBM, s.H2ORF, s.IRF, s.MLRF, s.PPTREE, s.RANGER, s.RFSRC, s.RF, s.XGB

Other Interpretable models: s.C50, s.CART, s.GLMNET, s.GLM
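Examples

An illustrative call (a sketch only; assumes rtemis is installed and uses simulated data, with the positive class as the first factor level per the Details section):

```r
library(rtemis)

# Simulated binary classification data: 100 cases, 4 features
set.seed(2019)
x <- data.frame(matrix(rnorm(400), nrow = 100))
y <- factor(ifelse(x[, 1] + rnorm(100) > 0, "pos", "neg"),
            levels = c("pos", "neg"))  # first level = 'positive' class

# Train an Additive Tree with the default gamma and a shallower tree
mod <- s.ADDTREE(x, y, gamma = 0.8, max.depth = 5)
```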