itree.object:
Recursive Partitioning and Regression Trees Object
Description
These are objects representing fitted itree trees.
Value
- frame
-
data frame with one row for each node in the tree. The row.names of frame contain the (unique) node numbers that follow a binary ordering indexed by node depth.
Elements of frame include var, a factor giving the variable used in the split at each node (leaf nodes are denoted by the string <leaf>), n, the size of each node, wt, the sum of case weights for the node, dev, the deviance of each node, yval, the fitted value of the response at each node, and splits, a two column matrix of left and right split labels for each node. All of these are the same as for an rpart object.
For classification problems, information about total counts (or weights, if weights are unequal) appears in the wt.classX columns, where the integer X ranges from 1 to the number of classes. Similarly, wt.frac.classX is the weight of class X in the node divided by the total weight in the node. nodewt is the total weight of all observations in this node as a fraction of the entire dataset. This naming convention is different from rpart's.
Also included in the frame are complexity, the complexity parameter at which this split will collapse, ncompete, the number of competitor splits retained, and nsurrogate, the number of surrogate splits retained. Note that complexity values depend on any penalty method and penalization constant used.
- where
-
integer vector, the same length as the number of observations in the root node, containing the row number of frame corresponding to the leaf node that each observation falls into.
- splits
-
a numeric matrix describing the splits. The row label is the name of the split variable, and the columns are count, the number of observations sent left or right by the split (for competitor splits this is the number that would have been sent left or right had this split been used; for surrogate splits it is the number missing the primary split variable which were decided using this surrogate), ncat, the number of categories or levels for the variable (+/-1 for a continuous variable), improve, which is the improvement in deviance given by this split or, for surrogates, the concordance of the surrogate with the primary, and split, the numeric split point. The last column, adj, gives the adjusted concordance for surrogate splits. For a factor, the split column contains the row number of the csplit matrix. For a continuous variable, the sign of ncat determines whether the subset x < cutpoint or x > cutpoint is sent to the left.
- csplit
-
this will be present only if one of the split variables is a factor. There is one row for each such split, and column i is 1 if level i of the factor goes to the left, 3 if it goes to the right, and 2 if that level is not present at this node of the tree. For an ordered categorical variable all levels are marked as R/L, including levels that are not present.
- method
-
the method used to grow the tree.
- penalty
-
the penalty function for splitting on a specific variable at a specific node given
the variables used in the branch leading to this node.
- cptable
-
the table of optimal prunings based on a complexity parameter. NULL for the extremes and purity methods.
- terms
-
an object of mode expression and class term summarizing the formula. Used by various methods, but typically not of direct relevance to users.
- call
-
an image of the call that produced the object, but with the arguments all named and with the actual formula included as the formula argument. To re-evaluate the call, say update(tree).
Optional components include the matrix of predictors (x) and the response variable (y) used to construct the itree object.
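The relationship between frame, where and csplit described under Value can be sketched in base R. The object below is a hand-built stand-in, not the output of itree. The node numbering (root is 1, children of node k are 2k and 2k + 1) and the leaf label "<leaf>" follow the rpart convention and are assumptions here.

```r
# Hand-built stand-in for a fitted tree; NOT produced by itree().
# Assumption: node numbers follow the rpart convention (root = 1,
# children of node k are 2k and 2k + 1) and leaves are labelled "<leaf>".
mock <- list(
  frame = data.frame(
    var = c("x1", "<leaf>", "x2", "<leaf>", "<leaf>"),
    n   = c(10, 4, 6, 3, 3),
    row.names = c("1", "2", "3", "6", "7")
  ),
  # Row number of frame for the leaf each of the 10 observations falls into.
  where = c(2, 2, 4, 5, 4, 2, 5, 2, 4, 5),
  # Two factor splits on a three-level factor: 1 = left, 2 = absent, 3 = right.
  csplit = matrix(c(1, 3, 2,
                    3, 3, 1), nrow = 2, byrow = TRUE)
)

# Node numbers are stored as the row names of frame.
node    <- as.integer(row.names(mock$frame))
depth   <- floor(log2(node))   # the root has depth 0
parent  <- node %/% 2          # 0 for the root
is_leaf <- mock$frame$var == "<leaf>"

# where holds row numbers of frame, so indexing node by it gives the
# leaf node number for each observation.
leaf_node <- node[mock$where]

# Translate the csplit codes into readable directions.
direction <- c("left", "absent", "right")
decoded <- matrix(direction[mock$csplit], nrow = nrow(mock$csplit))
```

Tabulating leaf_node with table() should reproduce the n column of frame for the leaf rows, which is a quick consistency check on a real object.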
Structure
The components listed under Value must be included in a legitimate itree object. Of these, only the where component has the same length as the data used to fit the itree object. The requirements here are the same as those in rpart except that itree objects have a penalty component.
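As a rough illustration of these requirements, the following base R sketch checks a list for the expected component names and for a where vector of the right length. The helper check_itree_like is hypothetical and not part of itree. Its default list of required names is an assumption that leaves out csplit and cptable, since those are described as conditional or possibly NULL.

```r
# Hypothetical helper, not part of the itree package.
# Returns a character vector of problems; length 0 means none were found.
check_itree_like <- function(obj, n_obs,
                             required = c("frame", "where", "splits", "method",
                                          "penalty", "terms", "call")) {
  problems <- character(0)

  # Every required component name must be present.
  absent <- setdiff(required, names(obj))
  if (length(absent) > 0) {
    problems <- c(problems, paste("missing component:", absent))
  }

  # [[ ]] avoids the partial name matching that $ performs on lists.
  where <- obj[["where"]]
  frame <- obj[["frame"]]

  # where is the only component tied to the number of observations.
  if (!is.null(where) && length(where) != n_obs) {
    problems <- c(problems, "where does not match the number of observations")
  }

  # Each entry of where must be a valid row number of frame.
  if (!is.null(where) && !is.null(frame) &&
      !all(where %in% seq_len(nrow(frame)))) {
    problems <- c(problems, "where refers to rows that are not in frame")
  }

  problems
}

# Usage on a placeholder object; only frame and where carry real values.
required <- c("frame", "where", "splits", "method", "penalty", "terms", "call")
mock <- setNames(vector("list", length(required)), required)
mock$frame <- data.frame(n = c(5, 2, 3))
mock$where <- c(2L, 3L, 3L, 2L, 3L)

ok_result  <- check_itree_like(mock, n_obs = 5)
bad_length <- check_itree_like(mock, n_obs = 6)

no_penalty <- mock
no_penalty$penalty <- NULL   # assigning NULL drops the component
bad_names <- check_itree_like(no_penalty, n_obs = 5)
```

The check is deliberately shallow: it looks at names and lengths only and does not verify that the contents of each component are well formed.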