comp_pred provides a wrapper for running (i.e., fitting or predicting with) alternative classification algorithms on data (i.e., data.train or data.test, respectively).
comp_pred(
  formula,
  data.train,
  data.test = NULL,
  algorithm = NULL,
  model = NULL,
  sens.w = NULL,
  new.factors = "exclude",
  quiet_mis = FALSE
)
formula: A formula (usually x$formula, for an FFTrees object x).
data.train: A training dataset (as a data frame).
data.test: A testing dataset (as a data frame).
algorithm: A character string specifying an algorithm in the set:
"lr": Logistic regression (using glm from stats with family = "binomial");
"rlr": Regularized logistic regression (currently not supported);
"cart": Decision trees (using rpart from rpart);
"svm": Support vector machines (using svm from e1071);
"rf": Random forests (using randomForest from randomForest).
model: An optional existing model (as a model), to be applied to the test data.
sens.w: Sensitivity weight parameter (numeric, from 0 to 1), required to compute wacc.
new.factors: A character string specifying what to do if new factor values are discovered in the test set. Available options:
"exclude": exclude case (i.e., remove these cases, used by default);
"base": predict the base rate of the criterion.
quiet_mis: A logical value passed to hide/show NA user feedback (usually x$params$quiet$mis of the calling function). Default: quiet_mis = FALSE (i.e., show user feedback).
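The two new.factors options can be illustrated with a small base R sketch. This is not the code used by comp_pred: the data frames train and test, the column names crit and cue, and the choice of the training data as the source of the base rate are all assumptions made for illustration.

```r
# Sketch only: toy data and helper variables are hypothetical, not part of FFTrees.
train <- data.frame(crit = c(TRUE, FALSE, TRUE, TRUE),
                    cue  = factor(c("a", "b", "a", "b")))
test  <- data.frame(crit = c(TRUE, FALSE, FALSE),
                    cue  = factor(c("a", "c", "b")))  # "c" never occurs in train

# Flag test cases whose factor value does not occur in the training data:
is_new <- !(as.character(test$cue) %in% levels(train$cue))

# "exclude": drop the affected cases before predicting.
test_excluded <- test[!is_new, ]

# "base": keep all cases and fall back on the criterion base rate
# (assumed here to be the base rate in the training data).
base_rate <- mean(train$crit)
```

In this toy example, the second test case is the only one flagged, so "exclude" would leave two cases to predict, whereas "base" would keep all three.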
The range of competing algorithms currently available includes logistic regression (stats::glm), CART (rpart::rpart), support vector machines (e1071::svm), and random forests (randomForest::randomForest).
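As a rough illustration of what fitting and predicting means for the "lr" case, the following sketch uses only stats::glm on simulated data. It is not the wrapper's internal code: the simulated variables, the 0.5 probability cutoff, the variable sens_w (standing in for sens.w), and the weighted accuracy formula (sensitivity weighted by sens.w, specificity by 1 - sens.w) are assumptions stated for this example.

```r
# Sketch only: mimics the "lr" case with base R, not FFTrees' internal code.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$crit <- dat$x1 + 0.5 * dat$x2 + rnorm(n) > 0  # logical criterion

train <- dat[1:150, ]
test  <- dat[151:200, ]

# Fit on the training data (logistic regression via glm):
fit <- glm(crit ~ x1 + x2, data = train, family = "binomial")

# Predict on the test data; the 0.5 cutoff is an assumption:
prob <- predict(fit, newdata = test, type = "response")
pred <- prob >= 0.5

# Sensitivity, specificity, and weighted accuracy for a given weight:
sens_w <- 0.5  # hypothetical value standing in for sens.w
sens <- sum(pred & test$crit) / sum(test$crit)
spec <- sum(!pred & !test$crit) / sum(!test$crit)
wacc <- sens * sens_w + spec * (1 - sens_w)
```

The other algorithms follow the same fit-then-predict pattern with their own fitting functions and predict methods.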
The current support for handling missing data (or NA values) is only rudimentary. When enabled (via the global options allow_NA_pred or allow_NA_crit), any incomplete rows in data.train or data.test are removed prior to fitting or predicting a model (by using na.omit from stats). See the specifications of each model for more sophisticated ways of handling missing data.
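The effect of na.omit on a data frame can be seen in a minimal example. The data frame train and its columns are made up for illustration.

```r
# Toy data with missing values in both the criterion and a predictor:
train <- data.frame(crit = c(TRUE, FALSE, NA, TRUE),
                    cue  = c(1.2, NA, 0.7, 3.1))

# Remove every row that contains at least one NA:
train_complete <- stats::na.omit(train)
nrow(train_complete)  # 2 rows remain

# The indices of the dropped rows are kept in the "na.action" attribute:
attr(train_complete, "na.action")
```

Because entire rows are dropped, a single missing predictor value removes the case from fitting or prediction, which is why this kind of support is described as rudimentary.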