The recursive partitioning function, for R
honest.rparttree(
formula,
data,
weights,
subset,
est_data,
est_weights,
na.action = na.rpart,
method,
model = FALSE,
x = FALSE,
y = TRUE,
parms,
control,
cost,
...
)
An object of class rpart, produced by fitting an honest recursive partitioning tree.
a formula, with a response and features but no interaction terms. If this is a data frame, it is taken as the model frame (see model.frame).
an optional data frame that includes the variables named in the formula.
optional case weights.
optional expression saying that only a subset of the rows of the data should be used in the fit.
data frame to be used for leaf estimates; the estimation sample. Must contain the variables used in training the tree.
optional case weights for the estimation sample.
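The role of the estimation sample can be illustrated with plain rpart. The sketch below is not the causalTree implementation: it grows a tree on one half of some simulated data and then recomputes the leaf means on the other half. The simulated data and the helper leaf_id are assumptions made for illustration; leaf_id relies on the fact that predict() for an anova tree returns the yval stored in the frame for the leaf each row falls into.

```r
library(rpart)

set.seed(1)
n <- 400
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- ifelse(d$x1 > 0.5, 2, 0) + rnorm(n, sd = 0.5)

# Split into a training sample (grows the tree) and an
# estimation sample (used only for the leaf estimates).
idx   <- sample(n, n / 2)
train <- d[idx, ]
est   <- d[-idx, ]

fit <- rpart(y ~ x1 + x2, data = train, method = "anova")

# Hypothetical helper: return the node number of the leaf each row
# of newdata falls into, by temporarily storing node numbers in yval.
leaf_id <- function(tree, newdata) {
  tmp <- tree
  tmp$frame$yval <- as.numeric(rownames(tmp$frame))
  as.integer(predict(tmp, newdata = newdata))
}

est_leaf     <- leaf_id(fit, est)
honest_means <- tapply(est$y, est_leaf, mean)  # leaf means from est only
honest_means
```

Because the leaf means come from rows that played no part in choosing the splits, they are not biased by the split search. A leaf that receives no estimation rows simply does not appear in honest_means.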
the default action deletes all observations for which y is missing, but keeps those in which one or more predictors are missing.
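The default behaviour can be checked with rpart itself, whose na.rpart action is the default here. This is a small sketch on a modified copy of mtcars; it assumes honest.rparttree handles missing values the same way rpart does, which this page states but which is not verified here.

```r
library(rpart)

d <- mtcars
d$mpg[1:2] <- NA   # missing response: these rows are dropped
d$wt[3:5]  <- NA   # missing predictor: these rows are kept

fit <- rpart(mpg ~ wt + hp, data = d, na.action = na.rpart)

n_used <- length(fit$where)   # number of rows that entered the fit
n_used
```

Only the two rows with a missing response are removed, so 30 of the 32 rows are used.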
one of "anova", "poisson", "class" or "exp". If method is missing then the routine tries to make an intelligent guess. If y is a survival object, then method = "exp" is assumed; if y has 2 columns then method = "poisson" is assumed; if y is a factor then method = "class" is assumed; otherwise method = "anova" is assumed. It is wisest to specify the method directly, especially as more criteria may be added to the function in future.
Alternatively, method can be a list of functions named init, split and eval. Examples are given in the file tests/usersplits.R in the sources, and in the vignette ‘User Written Split Functions’.
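The guessing rule can be seen directly in rpart, from which this description is inherited. The sketch below leaves method unset for a factor response and for a numeric response and inspects the method recorded in the fitted object. Whether honest.rparttree guesses identically is an assumption.

```r
library(rpart)

fit_factor  <- rpart(Species ~ ., data = iris)        # y is a factor
fit_numeric <- rpart(mpg ~ wt + hp, data = mtcars)    # y is numeric

fit_factor$method    # "class"
fit_numeric$method   # "anova"

# Stating the method explicitly avoids relying on the guess.
fit_explicit <- rpart(mpg ~ wt + hp, data = mtcars, method = "anova")
```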
model frame of causalTree, same as in rpart.
keep a copy of the x matrix in the result.
keep a copy of the dependent variable in the result. If missing and model is supplied this defaults to FALSE.
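As a rough illustration of what model, x and y control, the sketch below fits the same rpart tree with the defaults and with all three components requested, then checks which components are stored. It uses rpart directly and assumes honest.rparttree stores the same components.

```r
library(rpart)

# Defaults: model = FALSE, x = FALSE, y = TRUE
fit_default <- rpart(mpg ~ wt + hp, data = mtcars)
is.null(fit_default$x)    # TRUE: the x matrix is not kept
is.null(fit_default$y)    # FALSE: the response is kept

# Keep everything in the fitted object
fit_full <- rpart(mpg ~ wt + hp, data = mtcars,
                  model = TRUE, x = TRUE, y = TRUE)
dim(fit_full$x)           # one row per observation, one column per predictor
head(fit_full$model)      # the model frame used for the fit
```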
optional parameters for the splitting function.
Anova splitting has no parameters.
Poisson splitting has a single parameter, the coefficient of variation of the prior distribution on the rates. The default value is 1.
Exponential splitting has the same parameter as Poisson.
For classification splitting, the list can contain any of: the vector of prior probabilities (component prior), the loss matrix (component loss) or the splitting index (component split). The priors must be positive and sum to 1. The loss matrix must have zeros on the diagonal and positive off-diagonal elements. The splitting index can be gini or information. The default priors are proportional to the data counts, the losses default to 1, and the split defaults to gini.
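For the classification case, a parms list satisfying these constraints might look like the following sketch. It uses rpart and its bundled kyphosis data; the particular prior and loss values are arbitrary choices for illustration.

```r
library(rpart)

class_parms <- list(
  prior = c(0.65, 0.35),                    # positive, sums to 1
  loss  = matrix(c(0, 1, 2, 0), nrow = 2),  # zero diagonal, positive elsewhere
  split = "information"                     # instead of the default "gini"
)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", parms = class_parms)
fit
```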
a list of options that control details of the rpart algorithm. See rpart.control.
a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose.
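A brief sketch of the cost argument with rpart: the vector has one entry per predictor, in the order the predictors appear in the model. Giving a variable a large cost makes it less attractive as a split variable, since its improvement is divided by that cost. The cost of 100 below is an arbitrary illustrative value.

```r
library(rpart)

# All costs equal to one: identical to omitting cost.
fit_default <- rpart(mpg ~ wt + hp + disp, data = mtcars)
fit_ones    <- rpart(mpg ~ wt + hp + disp, data = mtcars,
                     cost = c(1, 1, 1))

# Penalise splits on wt (the first predictor) by a factor of 100.
fit_penalised <- rpart(mpg ~ wt + hp + disp, data = mtcars,
                       cost = c(100, 1, 1))

# Compare the variables chosen for splitting in each tree.
as.character(fit_default$frame$var)
as.character(fit_penalised$frame$var)
```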
arguments to rpart.control may also be specified in the call to causalTree. They are checked against the list of valid arguments. An example of a commonly set parameter would be xval, which sets the number of cross-validation samples.
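The two ways of setting a control parameter can be compared in rpart, as in the sketch below. It assumes honest.rparttree passes such arguments on in the same manner.

```r
library(rpart)

# xval given directly in the call ...
fit_dots <- rpart(mpg ~ wt + hp, data = mtcars, xval = 5)

# ... or through an explicit control list.
fit_ctrl <- rpart(mpg ~ wt + hp, data = mtcars,
                  control = rpart.control(xval = 5))

fit_dots$control$xval
fit_ctrl$control$xval
```

Both fits record the same xval value in their control component.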
The parameter minsize is implemented differently in causalTree than in rpart; we require a minimum of minsize treated observations and a minimum of minsize control observations in each leaf.
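The minsize rule amounts to a per-leaf check on both treatment groups. The helper below is hypothetical and not part of causalTree; it assumes a 0/1 treatment indicator w for the observations in a single leaf.

```r
# Hypothetical helper: TRUE if a leaf holds at least `minsize` treated
# (w == 1) and at least `minsize` control (w == 0) observations.
leaf_meets_minsize <- function(w, minsize) {
  sum(w == 1) >= minsize && sum(w == 0) >= minsize
}

ok  <- leaf_meets_minsize(c(1, 1, 0, 0, 0), minsize = 2)  # 2 treated, 3 control
bad <- leaf_meets_minsize(c(1, 0, 0, 0, 0), minsize = 2)  # only 1 treated
ok    # TRUE
bad   # FALSE
```

A leaf with five observations can therefore fail the check when minsize is 2, which would not happen under a rule that only counts total leaf size.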