compareTreecalcs: Error rate comparisons for tree-based classification

Description

Compares error rates, for different functions and different model-selection rules, when the data are randomly divided into a training set and a test set of approximately equal size.

Usage

compareTreecalcs(x = yesno ~ ., data = DAAG::spam7, cp = 0.00025,
                 fun = c("rpart", "randomForest"))

Arguments

x

model formula

data

a data frame in which to interpret the variables named in the formula

cp

setting for the cost-complexity parameter cp, used by rpart()

fun

one or both of "rpart" and "randomForest"

Value

If rpart is specified in fun, the following:

rpSEcvI

the estimated cross-validation error rate when rpart() is run on the training data (subset I) and the one-standard-error rule is used to choose the tree size

rpcvI

the estimated cross-validation error rate when rpart() is run on subset I and the tree that gives the minimum cross-validated error rate is used

rpSEtest

the error rate when the model that leads to rpSEcvI is used to make predictions for subset II

rptest

the error rate when the model that leads to rpcvI is used to make predictions for subset II

nSErule

number of splits chosen by the one-standard-error rule

nREmin

number of splits that gives the minimum cross-validated error
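The rpart quantities above can be read off the cptable of a fitted tree. The sketch below applies the minimum-error rule and the one-standard-error rule as they are usually stated (smallest tree whose xerror is within one xstd of the minimum); DAAG's own implementation may differ in detail. The helper selectSplits() and the cptable values are hypothetical, made up for illustration; only the column names match rpart's fit$cptable. Because rpart scales xerror so that the root node has error 1, the relative error is multiplied by the root-node error rate (rootErr, also an assumed input) to give an error rate.

```r
## Hypothetical helper (not part of DAAG): apply the minimum-error rule and
## the one-standard-error rule to an rpart-style cptable. Rows are ordered
## by increasing number of splits, as in rpart's fit$cptable.
selectSplits <- function(cptab, rootErr) {
  imin <- which.min(cptab[, "xerror"])               # minimum cross-validated error
  cutoff <- cptab[imin, "xerror"] + cptab[imin, "xstd"]
  ise <- min(which(cptab[, "xerror"] <= cutoff))     # smallest tree within one SE
  list(nREmin  = unname(cptab[imin, "nsplit"]),
       nSErule = unname(cptab[ise, "nsplit"]),
       ## convert relative (root = 1) errors into error rates
       rpcvI   = rootErr * unname(cptab[imin, "xerror"]),
       rpSEcvI = rootErr * unname(cptab[ise, "xerror"]))
}

## Made-up cptable with the same column names as rpart's fit$cptable
cptab <- cbind(CP = c(0.5, 0.1, 0.025, 0.01),
               nsplit = c(0, 1, 3, 5),
               "rel error" = c(1, 0.5, 0.3, 0.25),
               xerror = c(1, 0.43, 0.40, 0.42),
               xstd = c(0.05, 0.04, 0.035, 0.036))
res <- selectSplits(cptab, rootErr = 0.4)
res
```

Here the minimum xerror (0.40) occurs at 3 splits, but the 1-split tree (xerror 0.43) is within 0.40 + 0.035, so the one-standard-error rule picks the smaller tree. For a real fit, pass fit$cptable and the root-node error reported by printcp(fit).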

If randomForest is specified in fun, the following:

rfcvI

the out-of-bag (OOB) error rate when randomForest() is run on subset I

rftest

the error rate when the model that leads to rfcvI is used to make predictions for subset II

Details

Data are randomly divided into two subsets, I and II. The function(s) are applied in the standard way to subset I, and the error rates that these calculations produce (cross-validated for rpart(), out-of-bag for randomForest()) are returned. Predictions are then made for subset II, giving a completely independent set of error rates.
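The rpart half of this procedure can be sketched with base R and the rpart package as follows. This is an illustration under stated assumptions, not the DAAG source: simulated two-class data stand in for DAAG::spam7, the split uses sample() with a fair coin for each row, and the one-standard-error rule is taken in its usual form. The variable names mirror the returned components listed under Value.

```r
library(rpart)

set.seed(29)
## Simulated two-class data standing in for DAAG::spam7 (an assumption)
n <- 600
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- factor(ifelse(dat$x1 + 0.3 * rnorm(n) > 0.5, "yes", "no"))

## Approximately equal random division into subsets I and II
inI <- sample(c(TRUE, FALSE), n, replace = TRUE)
datI <- dat[inI, ]
datII <- dat[!inI, ]

## Grow a large tree on subset I; rpart cross-validates internally (xval = 10)
fit <- rpart(y ~ ., data = datI, cp = 0.00025)
cptab <- fit$cptable
## Root-node misclassification rate (default priors and loss)
rootErr <- 1 - max(table(datI$y)) / nrow(datI)

imin <- which.min(cptab[, "xerror"])               # minimum-error tree
ise <- min(which(cptab[, "xerror"] <=
                   cptab[imin, "xerror"] + cptab[imin, "xstd"]))  # one-SE tree

## Error rates computed from subset I alone (xerror is relative to the root)
rpcvI <- rootErr * cptab[imin, "xerror"]
rpSEcvI <- rootErr * cptab[ise, "xerror"]

## Independent error rates: prune to each chosen size, predict subset II
testErr <- function(tree) {
  mean(predict(tree, newdata = datII, type = "class") != datII$y)
}
rptest <- testErr(prune(fit, cp = cptab[imin, "CP"]))
rpSEtest <- testErr(prune(fit, cp = cptab[ise, "CP"]))

round(c(rpSEcvI = rpSEcvI, rpcvI = rpcvI,
        rpSEtest = rpSEtest, rptest = rptest), 3)
```

Pruning at the CP value in a given row of the cptable returns the tree with that row's number of splits, which is why the same row indices serve both for the cross-validated and for the test-set error rates.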