Train a CART for regression or classification using rpart
s.CART(x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL,
y.name = NULL, weights = NULL, ipw = TRUE, ipw.type = 2,
upsample = FALSE, upsample.seed = NULL, method = "auto",
parms = NULL, minsplit = 2, minbucket = round(minsplit/3),
cp = 0.01, maxdepth = 20, maxcompete = 0, maxsurrogate = 0,
usesurrogate = 2, surrogatestyle = 0, xval = 0, cost = NULL,
model = TRUE, prune.cp = NULL, use.prune.rpart.rt = TRUE,
return.unpruned = FALSE,
grid.resample.rtset = rtset.resample("kfold", 5),
grid.search.type = c("exhaustive", "randomized"),
grid.randomized.p = 0.1, metric = NULL, maximize = NULL,
na.action = na.exclude, n.cores = rtCores, print.plot = TRUE,
plot.fitted = NULL, plot.predicted = NULL,
plot.theme = getOption("rt.fit.theme", "lightgrid"), question = NULL,
verbose = TRUE, grid.verbose = TRUE, outdir = NULL,
save.mod = ifelse(!is.null(outdir), TRUE, FALSE), rtModLog = NULL)
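s.CART builds its tree with rpart. The sketch below assumes that the tree-growing arguments in the signature (minsplit through xval) are forwarded to rpart::rpart.control(), and fits the equivalent tree directly with rpart on the built-in mtcars data using those default values. It illustrates the underlying rpart call only. It is not the s.CART code path, which also handles resampling, weighting, plotting and output.

```r
library(rpart)

# Assumption: s.CART passes these arguments on to rpart.control();
# the values copy the s.CART defaults from the signature above.
minsplit <- 2
ctrl <- rpart.control(minsplit = minsplit,
                      minbucket = round(minsplit / 3),  # round(2/3) = 1
                      cp = 0.01,
                      maxdepth = 20,
                      maxcompete = 0,
                      maxsurrogate = 0,
                      usesurrogate = 2,
                      surrogatestyle = 0,
                      xval = 0)

# Regression tree; "anova" is rpart's method for a numeric outcome
fit <- rpart(mpg ~ ., data = mtcars, method = "anova", control = ctrl)
print(fit)
```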
x: Numeric vector or matrix / data frame of features, i.e. independent variables
y: Numeric vector of outcome, i.e. dependent variable
x.test: Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x
y.test: Numeric vector of testing set outcome
x.name: Character: Name for feature set
y.name: Character: Name for outcome
weights: Numeric vector: Weights for cases. For classification, weights takes precedence over ipw: if weights are provided, ipw is not used, so leave weights = NULL when setting ipw = TRUE. Default = NULL
ipw: Logical: If TRUE, apply inverse probability weighting (for Classification only). Note: If weights are provided, ipw is not used. Default = TRUE
ipw.type: Integer 0, 1, 2. 1: class.weights as in 0, divided by max(class.weights); 2: class.weights as in 0, divided by min(class.weights). Default = 2
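A minimal sketch of inverse probability weighting with the two normalization options of ipw.type. The helper ipw_weights() is hypothetical, not the rtemis implementation, and because the description of type 0 is missing above, it assumes the base weights are inverse class frequencies (n / n_c). rpart accepts the resulting per-case weights through its weights argument.

```r
library(rpart)

# Hypothetical helper. ASSUMPTION: type 0 = inverse class frequency (n / n_c);
# the exact rtemis formula for type 0 is not given on this page.
ipw_weights <- function(y, type = 2) {
  freq <- table(y)
  w0 <- as.numeric(sum(freq) / freq)   # assumed type 0 weights, one per class
  names(w0) <- names(freq)
  w <- switch(as.character(type),
              "0" = w0,
              "1" = w0 / max(w0),       # type 1: divide by the largest class weight
              "2" = w0 / min(w0),       # type 2: divide by the smallest class weight
              stop("type must be 0, 1 or 2"))
  unname(w[as.character(y)])            # expand to one weight per case
}

y <- factor(c("a", "a", "a", "b"))
ipw_weights(y, type = 2)  # majority class gets 1, minority class gets 3

# Imbalanced two-class subset of iris: 50 setosa vs 10 versicolor
d <- iris[1:60, ]
d$Species <- droplevels(d$Species)
fit <- rpart(Species ~ ., data = d, method = "class",
             weights = ipw_weights(d$Species))
```

With type = 2 the most frequent class always receives weight 1, so the weights stay on the scale of case counts; type = 1 instead caps the rarest class at weight 1.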
upsample: Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Caution: upsample will randomly sample with replacement if the majority class is more than twice the size of the class being upsampled, thereby introducing randomness
upsample.seed: Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)
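A simplified, hypothetical sketch of class upsampling, not the rtemis code: every class is grown to the size of the largest class by resampling its own rows with replacement, and an optional seed plays the role of upsample.seed. Unlike the behavior described in the caution above, this sketch always resamples randomly, even when the majority class is less than twice the size of a minority class.

```r
# Hypothetical helper: upsample every class to the size of the largest class
upsample_sketch <- function(x, y, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # reproducible resampling, like upsample.seed
  y <- droplevels(factor(y))
  target <- max(table(y))
  idx <- unlist(lapply(levels(y), function(cl) {
    i <- which(y == cl)                # original rows of this class are kept
    n_extra <- target - length(i)
    if (n_extra > 0) c(i, i[sample.int(length(i), n_extra, replace = TRUE)]) else i
  }))
  list(x = x[idx, , drop = FALSE], y = y[idx])
}

x <- data.frame(v = 1:8)
y <- factor(c(rep("a", 6), rep("b", 2)))
up <- upsample_sketch(x, y, seed = 1)
table(up$y)  # a: 6, b: 6
```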
method: String: "auto", "anova", "poisson", "class" or "exp". Default = "auto"
parms: List of additional parameters for the splitting function. See the parms argument of rpart::rpart
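The sketch below shows method and parms at the rpart level. The helper auto_method() is hypothetical and only assumes one plausible resolution of method = "auto" (factor outcome: classification tree, numeric outcome: regression tree). The parms list is handed to rpart's splitting function; for classification, split = "information" replaces the default Gini index.

```r
library(rpart)

# Hypothetical helper (assumption, not the rtemis logic):
# factor outcome -> "class", anything else -> "anova"
auto_method <- function(y) if (is.factor(y)) "class" else "anova"

# Classification tree using the information (entropy) splitting index
fit_class <- rpart(Species ~ ., data = iris,
                   method = auto_method(iris$Species),
                   parms = list(split = "information"))

# Regression tree on a numeric outcome
fit_reg <- rpart(mpg ~ ., data = mtcars, method = auto_method(mtcars$mpg))
```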
minsplit: [gS] Integer: Minimum number of cases that must belong to a node before a split is considered. Default = 2
minbucket: [gS] Integer: Minimum number of cases allowed in a child node. Default = round(minsplit/3)
cp: [gS] Float: Complexity threshold for allowing a split. Default = .01
maxdepth: [gS] Integer: Maximum depth of tree. Default = 20
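To see how these arguments limit tree growth, the sketch below calls rpart directly, assuming they map one-to-one onto rpart.control(). It counts terminal nodes for increasing cp and for a depth-1 tree. A split is only kept if it improves the overall fit by at least a fraction cp, so larger cp values give smaller trees. The helpers grow() and n_leaves() are illustrative only.

```r
library(rpart)

# Illustrative helper: grow a regression tree with the given hyperparameters
grow <- function(cp, maxdepth = 20, minsplit = 2) {
  rpart(mpg ~ ., data = mtcars, method = "anova",
        control = rpart.control(minsplit = minsplit,
                                minbucket = round(minsplit / 3),
                                cp = cp, maxdepth = maxdepth, xval = 0))
}

# Number of terminal nodes in an rpart tree
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

leaves <- sapply(c(0.001, 0.01, 0.1), function(cp) n_leaves(grow(cp)))
leaves                               # non-increasing as cp grows
n_leaves(grow(0.001, maxdepth = 1))  # a depth-1 tree has at most 2 leaves
```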
cost: Vector, Float (> 0): One for each variable in the model. See the cost argument of rpart::rpart
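In rpart, cost holds one scaling value per predictor: when choosing a split, the improvement from each variable is divided by its cost, so a costly variable is picked less often. A small sketch with two predictors, penalizing wt in the second fit; the chosen root variables are printed rather than asserted because they depend on the data.

```r
library(rpart)

# One cost per predictor, in formula order (wt, hp)
fit_equal  <- rpart(mpg ~ wt + hp, data = mtcars, cost = c(1, 1))
fit_costly <- rpart(mpg ~ wt + hp, data = mtcars, cost = c(10, 1))  # penalize wt

# Variable used for the root split of each tree
as.character(fit_equal$frame$var[1])
as.character(fit_costly$frame$var[1])
```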
model: Logical: If TRUE, keep a copy of the model. Default = TRUE
prune.cp: [gS] Float: Complexity for cost-complexity pruning after the tree is built
use.prune.rpart.rt: [Testing only, do not change]
return.unpruned: Logical: If TRUE and prune.cp is set, return the unpruned tree under extra in the rtMod object
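prune.cp refers to cost-complexity pruning of an already grown tree. Assuming it corresponds to rpart::prune(), the sketch grows a deliberately large regression tree and prunes it at cp = 0.05. The large tree is the kind of object return.unpruned would keep alongside the pruned one.

```r
library(rpart)

# Grow a large tree with a very low complexity threshold
big <- rpart(mpg ~ ., data = mtcars,
             control = rpart.control(minsplit = 2, minbucket = 1,
                                     cp = 0.001, xval = 0))

# Cost-complexity pruning: remove splits whose complexity is below 0.05
pruned <- prune(big, cp = 0.05)

c(unpruned = nrow(big$frame), pruned = nrow(pruned$frame))  # node counts
```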
grid.resample.rtset: List: Output of rtset.resample defining gridSearchLearn parameters. Default = rtset.resample("kfold", 5)
grid.search.type: String: Type of grid search to perform: "exhaustive" or "randomized". Default = "exhaustive"
grid.randomized.p: Float (0, 1): If grid.search.type = "randomized", randomly run this proportion of combinations. Default = .1
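A hypothetical illustration of the two grid.search.type options, not the gridSearchLearn code. Passing several values to [gS] arguments defines a grid of all their combinations; an exhaustive search runs every row, while a randomized search runs a proportion p of them. The ceiling() rounding used here is an assumption.

```r
# All combinations of the supplied hyperparameter values
grid <- expand.grid(minsplit = c(2, 10),
                    cp = c(0.01, 0.001),
                    maxdepth = c(5, 20))
nrow(grid)  # exhaustive search: 8 combinations

# Randomized search: keep a random proportion p of the rows
set.seed(2019)
p <- 0.5
grid_random <- grid[sample(nrow(grid), ceiling(p * nrow(grid))), ]
nrow(grid_random)  # 4
```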
metric: String: Metric to minimize, or maximize if maximize = TRUE, during grid search. Default = NULL, which results in "Balanced Accuracy" for Classification, "MSE" for Regression, and "Coherence" for Survival Analysis.
maximize: Logical: If TRUE, metric will be maximized if grid search is run. Default = FALSE
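For reference, plain definitions of the two default grid-search metrics named above. Balanced accuracy is the mean of the per-class recall (higher is better), so it is insensitive to class imbalance; MSE is the mean squared error (lower is better). The function names are illustrative.

```r
# Mean of per-class recall
balanced_accuracy <- function(true, predicted) {
  true <- factor(true)
  predicted <- factor(predicted, levels = levels(true))
  recall <- vapply(levels(true),
                   function(cl) mean(predicted[true == cl] == cl),
                   numeric(1))
  mean(recall)
}

# Mean squared error
mse <- function(true, estimated) mean((true - estimated)^2)

balanced_accuracy(c("a", "a", "b", "b"), c("a", "b", "b", "b"))  # 0.75
mse(c(1, 2, 3), c(1, 2, 5))                                        # 1.333333
```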
na.action: How to handle missing values. See ?na.fail
n.cores: Integer: Number of cores to use. Defaults to available cores reported by future::availableCores(), unless option rt.cores is set at the time the library is loaded
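For example, to fix the number of cores, the rt.cores option can be set before the package is loaded, as the note above requires. The value 4 is arbitrary, and this assumes the rtemis version documented here is installed.

```r
# Set before loading rtemis so that the default n.cores picks it up
options(rt.cores = 4)
library(rtemis)
```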
print.plot: Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted
plot.fitted: Logical: if TRUE, plot True (y) vs Fitted
plot.predicted: Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test
plot.theme: String: "zero", "dark", "box", "darkbox"
question: String: the question you are attempting to answer with this model, in plain language.
verbose: Logical: If TRUE, print summary to screen.
grid.verbose: Logical: Passed to gridSearchLearn
outdir: Path to output directory. If defined, will save the Predicted vs. True plot, if available, as well as the full model output, if save.mod is TRUE
save.mod: Logical. If TRUE, save all output as an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)
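The interplay between outdir and save.mod can be sketched with a hypothetical helper that mirrors the defaults described above. It assumes mod.name is "CART" for s.CART. Like the signature, it relies on R's lazy evaluation: the save.mod default is computed from outdir when it is first used.

```r
# Hypothetical helper reproducing the documented defaults
resolve_outdir <- function(outdir = NULL, save.mod = !is.null(outdir),
                           mod.name = "CART") {
  # save.mod = TRUE without an outdir falls back to "./s.<mod.name>"
  if (save.mod && is.null(outdir)) outdir <- paste0("./s.", mod.name)
  list(outdir = outdir, save.mod = save.mod)
}

resolve_outdir()                    # outdir NULL, save.mod FALSE
resolve_outdir(outdir = "results")  # save.mod becomes TRUE
resolve_outdir(save.mod = TRUE)     # outdir becomes "./s.CART"
```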
Value: Object of class rtMod
Details: [gS] indicates that grid search will be performed automatically if more than one value is passed
See also: elevate for external cross-validation
Other Supervised Learning: s.ADABOOST, s.ADDTREE, s.BART, s.BAYESGLM, s.BRUTO, s.C50, s.CTREE, s.DA, s.ET, s.EVTREE, s.GAM.default, s.GAM.formula, s.GAMSEL, s.GAM, s.GBM3, s.GBM, s.GLMNET, s.GLM, s.GLS, s.H2ODL, s.H2OGBM, s.H2ORF, s.IRF, s.KNN, s.LDA, s.LM, s.MARS, s.MLRF, s.MXN, s.NBAYES, s.NLA, s.NLS, s.NW, s.POLYMARS, s.PPR, s.PPTREE, s.QDA, s.QRNN, s.RANGER, s.RFSRC, s.RF, s.SGD, s.SPLS, s.SVM, s.TFN, s.XGBLIN, s.XGB
Other Tree-based methods: s.ADABOOST, s.ADDTREE, s.BART, s.C50, s.CTREE, s.ET, s.EVTREE, s.GBM3, s.GBM, s.H2OGBM, s.H2ORF, s.IRF, s.MLRF, s.PPTREE, s.RANGER, s.RFSRC, s.RF, s.XGB
Other Interpretable models: s.ADDTREE, s.C50, s.GLMNET, s.GLM