compareTreecalcs: Error rate comparisons for tree-based classification

Description

Compare error rates across different functions and different selection rules, using an approximately equal random division of the data into a training set and a test set.

Usage

compareTreecalcs(x = yesno ~ ., data = DAAG::spam7, cp = 0.00025, fun = c("rpart",
"randomForest"))

Value

If rpart is specified in fun, the following:

rpSEcvI: the estimated cross-validation error rate when rpart() is run on the training data (subset I) and the one-standard-error rule is used
rpcvI: the estimated cross-validation error rate when rpart() is run on subset I and the model chosen is the one that gives the minimum cross-validated error rate
rpSEtest: the error rate when the model that leads to rpSEcvI is used to make predictions for subset II
rptest: the error rate when the model that leads to rpcvI is used to make predictions for subset II
nSErule: number of splits required by the one-standard-error rule
nREmin: number of splits that gives the minimum cross-validated error
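
The two selection rules behind rpcvI and rpSEcvI can be illustrated from the cp table that rpart() returns. The following is a minimal sketch, not necessarily how compareTreecalcs() computes these values internally. It assumes the rpart package and, purely for illustration, the kyphosis data that ships with it; the names imin, ise, threshold and root_err are introduced here and do not come from the function.

```r
library(rpart)

set.seed(29)  # the cross-validation folds are chosen at random
# Grow a deliberately large tree so that the cp table has several rows
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", cp = 0.00025, minsplit = 5)
cptab <- fit$cptable  # columns: CP, nsplit, rel error, xerror, xstd

# Minimum rule: the row with the smallest cross-validated relative error
imin <- which.min(cptab[, "xerror"])

# One-standard-error rule: the smallest tree whose xerror lies within
# one standard error of the minimum
threshold <- cptab[imin, "xerror"] + cptab[imin, "xstd"]
ise <- min(which(cptab[, "xerror"] <= threshold))

# xerror is relative to the root-node error, so rescale to an error rate
root_err <- fit$frame$dev[1] / fit$frame$n[1]
c(nREmin  = cptab[imin, "nsplit"],
  nSErule = cptab[ise, "nsplit"],
  rpcvI   = cptab[imin, "xerror"] * root_err,
  rpSEcvI = cptab[ise, "xerror"] * root_err)
```

Because the minimum-error row always satisfies the threshold, the one-standard-error rule can never select a larger tree than the minimum rule does.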

If randomForest is specified in fun, the following:

rfcvI: the out-of-bag (OOB) error rate when randomForest() is run on subset I
rftest: the error rate when the model that leads to rfcvI is used to make predictions for subset II

Arguments

x: model formula
data: a data frame in which to interpret the variables named in the formula
cp: setting for the cost complexity parameter cp, used by rpart()
fun: one or both of "rpart" and "randomForest"

Author

John Maindonald

Details

Data are randomly divided into two subsets, I and II. The function(s) named in fun are used in the standard way for calculations on subset I, and the error rates that those calculations produce are returned. Predictions are then made for subset II, which allows the calculation of a completely independent set of error rates.
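
The procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the function's actual source: it uses the kyphosis data from rpart instead of DAAG::spam7, the names inI, subsetI, subsetII, rp_min and pred are hypothetical, and the randomForest step is run only if that package happens to be installed.

```r
library(rpart)

set.seed(31)
dat <- kyphosis                          # illustrative data shipped with rpart
n <- nrow(dat)
inI <- sample(n, size = round(n / 2))    # subset I: about half of the rows
subsetI <- dat[inI, ]
subsetII <- dat[-inI, ]                  # subset II: the remaining rows

# rpart on subset I; the cross-validation happens inside rpart()
rp <- rpart(Kyphosis ~ ., data = subsetI, cp = 0.00025, minsplit = 5)
cptab <- rp$cptable
imin <- which.min(cptab[, "xerror"])

# Prune back to the minimum-xerror tree, then predict for subset II
rp_min <- prune(rp, cp = cptab[imin, "CP"])
pred <- predict(rp_min, newdata = subsetII, type = "class")
rptest <- mean(pred != subsetII$Kyphosis)

# Optional random forest comparison, skipped if the package is missing
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(Kyphosis ~ ., data = subsetI)
  rfcvI <- rf$err.rate[rf$ntree, "OOB"]  # out-of-bag error on subset I
  rftest <- mean(predict(rf, newdata = subsetII) != subsetII$Kyphosis)
}
```

Since subset II plays no part in fitting or pruning, rptest and rftest serve as independent checks on the internally estimated rates.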