Compare error rates, between different functions and between different selection rules, for an approximately equal random division of the data into a training set and a test set.
compareTreecalcs(x = yesno ~ ., data = DAAG::spam7, cp = 0.00025,
                 fun = c("rpart", "randomForest"))
If "rpart" is specified in fun, the following are returned:
the estimated cross-validation error rate (rpSEcvI) when rpart() is run on the training data (subset I) and the one-standard-error rule is used
the estimated cross-validation error rate (rpcvI) when rpart() is run on subset I and the model that gives the minimum cross-validated error rate is used
the error rate when the model that leads to rpSEcvI is used to make predictions for subset II
the error rate when the model that leads to rpcvI is used to make predictions for subset II
the number of splits required by the one-standard-error rule
the number of splits that gives the minimum cross-validated error rate
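The two selection rules for rpart() can be illustrated by reading the cross-validation table that rpart() stores in fit$cptable. The following is a minimal sketch, not the DAAG source. The helper chooseSplits() and its output names are hypothetical. It assumes a classification tree (method = "class") and converts the relative xerror column to an absolute error rate by multiplying by the root node error, as reported by printcp(). The one-standard-error rule picks the smallest tree whose xerror is within one xstd of the minimum.

```r
library(rpart)

## chooseSplits() is a hypothetical helper, not part of DAAG or rpart.
## It applies the minimum-error rule and the one-standard-error rule
## to the cross-validation results held in fit$cptable.
chooseSplits <- function(fit) {
  cpt <- fit$cptable                            # columns: CP, nsplit, rel error, xerror, xstd
  imin <- which.min(cpt[, "xerror"])            # row with minimum cross-validated error
  thresh <- cpt[imin, "xerror"] + cpt[imin, "xstd"]
  ise <- min(which(cpt[, "xerror"] <= thresh))  # smallest tree within one SE of the minimum
  rootErr <- fit$frame$dev[1] / fit$frame$n[1]  # root node error; xerror is relative to it
  c(nSE   = unname(cpt[ise, "nsplit"]),
    nmin  = unname(cpt[imin, "nsplit"]),
    cvSE  = unname(rootErr * cpt[ise, "xerror"]),
    cvmin = unname(rootErr * cpt[imin, "xerror"]))
}

## Grow a large tree with a small cp so that pruning has something to choose from
set.seed(29)
fit <- rpart(Species ~ ., data = iris, method = "class", cp = 0.00025)
chooseSplits(fit)
```

Because the rows of cptable are ordered by increasing tree size, the one-standard-error choice never uses more splits than the minimum-error choice.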
If "randomForest" is specified in fun, the following are returned:
the out-of-bag (OOB) error rate (rfcvI) when randomForest() is run on subset I
the error rate when the model that leads to rfcvI is used to make predictions for subset II
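The contrast between the OOB error on subset I and the error on subset II can be sketched with the randomForest package. This is an illustration under stated assumptions, not DAAG's implementation: rfErrors() is a hypothetical helper, the response is assumed to be a single factor variable (so that randomForest() fits a classification forest), and the OOB error is taken from the last row of fit$err.rate, that is, after all trees have been grown.

```r
library(randomForest)

## rfErrors() is a hypothetical helper, not DAAG's code. It splits `data`
## at random into subsets I and II, fits a random forest on I, and returns
## the OOB error rate on I and the error rate for predictions on II.
rfErrors <- function(formula, data) {
  inI <- sample(c(TRUE, FALSE), nrow(data), replace = TRUE)  # roughly 50/50 split
  dI  <- data[inI, ]
  dII <- data[!inI, ]
  fit <- randomForest(formula, data = dI)
  yII <- dII[[all.vars(formula)[1]]]    # assumes a single response variable
  c(oobI   = unname(fit$err.rate[fit$ntree, "OOB"]),   # OOB error after all trees
    testII = mean(predict(fit, newdata = dII) != yII)) # misclassification rate on II
}

set.seed(29)
rfErrors(Species ~ ., data = iris)
```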
x: model formula
data: a data frame in which to interpret the variables named in the formula
cp: setting for the cost-complexity parameter cp, used by rpart()
fun: one or both of "rpart" and "randomForest"
John Maindonald
Data are randomly divided into two subsets, I and II, of approximately equal size. The function(s) are used in the standard way for calculations on subset I, and the error rates returned are those that come from these calculations. Predictions are then made for subset II, allowing the calculation of a completely independent set of error rates.
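For the rpart() case, the procedure can be sketched end to end. This is a minimal sketch under assumptions, not the code that compareTreecalcs() runs: splitErrors() and its output names are hypothetical, a classification tree (method = "class") is assumed, and each selected tree is obtained by pruning at the CP value of the chosen row of fit$cptable. Cross-validated error rates on subset I are then set beside error rates for predictions on subset II.

```r
library(rpart)

## splitErrors() is a hypothetical sketch, not DAAG's code. It divides `data`
## at random into subsets I and II of roughly equal size, grows a tree on I,
## and compares cross-validated error rates on I with error rates on II.
splitErrors <- function(formula, data, cp = 0.00025) {
  inI <- sample(c(TRUE, FALSE), nrow(data), replace = TRUE)
  dI  <- data[inI, ]
  dII <- data[!inI, ]
  fit <- rpart(formula, data = dI, method = "class", cp = cp)
  cpt <- fit$cptable
  rootErr <- fit$frame$dev[1] / fit$frame$n[1]   # xerror is relative to this
  imin <- which.min(cpt[, "xerror"])              # minimum-error rule
  ise  <- min(which(cpt[, "xerror"] <=
                      cpt[imin, "xerror"] + cpt[imin, "xstd"]))  # one-SE rule
  yII <- dII[[all.vars(formula)[1]]]              # assumes a single response variable
  testErr <- function(i) {
    pruned <- prune(fit, cp = cpt[i, "CP"])       # subtree for row i of cptable
    mean(predict(pruned, newdata = dII, type = "class") != yII)
  }
  c(cvI_SE     = unname(rootErr * cpt[ise, "xerror"]),
    cvI_min    = unname(rootErr * cpt[imin, "xerror"]),
    testII_SE  = testErr(ise),
    testII_min = testErr(imin))
}

set.seed(29)
splitErrors(Species ~ ., data = iris)
```

Because subset II plays no part in growing or pruning the tree, testII_SE and testII_min give an independent check on the cross-validated estimates from subset I.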