comp_pred: Fit and predict competing classification algorithms

Description

comp_pred provides a wrapper for applying alternative classification algorithms to data, i.e., for fitting them to data.train or predicting data.test, respectively.

Usage

comp_pred(
  formula,
  data.train,
  data.test = NULL,
  algorithm = NULL,
  model = NULL,
  sens.w = NULL,
  new.factors = "exclude",
  quiet_mis = FALSE
)

Arguments

formula

A formula (usually x$formula, for an FFTrees object x).

data.train

A training dataset (as a data frame).

data.test

A testing dataset (as a data frame).

algorithm

A character string specifying an algorithm in the set:

"lr": Logistic regression (using glm from stats with family = "binomial");
"rlr": Regularized logistic regression (currently not supported);
"cart": Decision trees (using rpart from rpart);
"svm": Support vector machines (using svm from e1071);
"rf": Random forests (using randomForest from randomForest).

model

An optional existing model (i.e., a previously fitted model object), to be applied to the test data.

sens.w

Sensitivity weight parameter (numeric, from 0 to 1), required to compute wacc.
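
As a minimal sketch of how sens.w enters weighted accuracy, assuming the definition wacc = sens.w * sens + (1 - sens.w) * spec. The helper name wacc_sketch and the counts below are made up for illustration and are not part of the package:

```r
# Hypothetical helper: weighted accuracy from confusion-matrix counts.
# hi = hits, mi = misses, fa = false alarms, cr = correct rejections.
wacc_sketch <- function(hi, mi, fa, cr, sens.w = 0.5) {
  stopifnot(sens.w >= 0, sens.w <= 1)
  sens <- hi / (hi + mi)   # sensitivity (hit rate)
  spec <- cr / (cr + fa)   # specificity (correct rejection rate)
  sens.w * sens + (1 - sens.w) * spec
}

# Illustrative counts (not real results):
wacc_even <- wacc_sketch(hi = 40, mi = 10, fa = 20, cr = 30, sens.w = 0.50)
wacc_sens <- wacc_sketch(hi = 40, mi = 10, fa = 20, cr = 30, sens.w = 0.75)
```

With sens.w = 0.5, sensitivity and specificity count equally; values above 0.5 give more weight to sensitivity.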

new.factors

A character string specifying what to do if new factor values are discovered in the test set. Available options:

"exclude": exclude case (i.e., remove these cases, used by default);
"base": predict the base rate of the criterion.

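The two new.factors options can be illustrated with a small base R sketch. It does not show how comp_pred handles them internally; the toy data and all object names are made up, and taking the base rate from the training data is an assumption:

```r
# Toy data (made up): the test set contains a level ("c") absent from training.
train <- data.frame(x = factor(c("a", "b", "a", "b")),
                    y = c(TRUE, FALSE, TRUE, TRUE))
test  <- data.frame(x = factor(c("a", "c", "b")),
                    y = c(TRUE, FALSE, FALSE))

# Flag test rows whose factor value was never seen in training.
is_new <- !(as.character(test$x) %in% levels(train$x))

# "exclude": drop the affected cases before predicting.
test_excluded <- test[!is_new, , drop = FALSE]

# "base": keep the cases and fall back on the base rate of the criterion
# (assumed here to be computed from the training data).
base_rate <- mean(train$y)
```
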
quiet_mis

A logical value passed to hide/show NA user feedback (usually x$params$quiet$mis of the calling function). Default: quiet_mis = FALSE (i.e., show user feedback).

Details

The range of competing algorithms currently available includes logistic regression (stats::glm), CART (rpart::rpart), support vector machines (e1071::svm), and random forests (randomForest::randomForest).
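
To illustrate the fit and predict steps that the wrapper bundles, here is a minimal sketch for the "lr" case that calls stats::glm directly. The dataset (mtcars), the split, the formula, and the 0.5 cutoff are assumptions chosen for illustration, not the package's own settings:

```r
# Split a built-in dataset into training and test parts (arbitrary split).
data.train <- mtcars[1:22, ]
data.test  <- mtcars[23:32, ]

# Fit: logistic regression of a binary criterion (am) on two predictors.
fit <- stats::glm(am ~ hp + wt, data = data.train, family = "binomial")

# Predict: probabilities for the test cases, then a binary decision.
# The 0.5 cutoff is an assumption for this sketch.
prob <- stats::predict(fit, newdata = data.test, type = "response")
pred <- prob >= 0.5

# Cross-tabulate decisions and true criterion values.
table(predicted = pred, actual = data.test$am == 1)
```

The other algorithms follow the same fit-then-predict pattern with their respective fitting functions.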

The current support for handling missing data (or NA values) is only rudimentary. When enabled (via the global options allow_NA_pred or allow_NA_crit), any incomplete rows in data.train or data.test are removed prior to fitting or predicting a model (by using na.omit from stats). See the specifications of each model for more sophisticated ways of handling missing data.
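
The effect of na.omit on a data frame can be seen in a small sketch (the toy data and object names are made up for illustration):

```r
# Toy data frame (made up) with missing values in two rows.
df <- data.frame(x = c(1, NA, 3, 4),
                 y = c("a", "b", NA, "d"))

# Keep only complete cases; rows 2 and 3 are dropped.
df_complete <- stats::na.omit(df)

# The indices of the removed rows are stored in the "na.action" attribute.
removed <- as.integer(attr(df_complete, "na.action"))
```

Because entire rows are dropped, a single NA in any variable removes the case from fitting or prediction.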