- train_data
`data.frame`.
Can contain a grouping factor for identifying partitions, as made with `groupdata2::partition()`.
See `partitions_col`.
- formulas
Model formulas as strings. (Character)
E.g. `c("y~x", "y~z")`.
Can contain random effects.
E.g. `c("y~x+(1|r)", "y~z+(1|r)")`.
- family
Name of the family. (Character)
Currently supports `"gaussian"` for linear regression with `lm()` / `lme4::lmer()`
and `"binomial"` for binary classification with `glm()` / `lme4::glmer()`.
See `cross_validate_fn()` for use with other model functions.
- test_data
`data.frame`. If specifying `partitions_col`, this can be `NULL`.
- partitions_col
Name of grouping factor for identifying partitions. (Character)
Rows with the value `1` in `partitions_col` are used as the training set and
rows with the value `2` are used as the test set.
N.B. Only used if `test_data` is `NULL`.
- control
Construct control structures for mixed model fitting (with `lme4::lmer()` or `lme4::glmer()`).
See `lme4::lmerControl` and `lme4::glmerControl`.
N.B. Ignored if fitting `lm()` or `glm()` models.
- REML
Restricted Maximum Likelihood. (Logical)
- cutoff
Threshold for predicted classes. (Numeric)
N.B. Binomial models only.
- positive
Level from dependent variable to predict.
Either as character (preferable) or level index (`1` or `2`, alphabetically).
E.g. if we have the levels `"cat"` and `"dog"` and we want `"dog"` to be the positive class,
we can either provide `"dog"` or `2`, as alphabetically, `"dog"` comes after `"cat"`.
Note: For reproducibility, it's preferable to specify the name directly, as
different locales may sort the levels differently.
Used when calculating confusion matrix metrics and creating ROC curves.
The `Process` column in the output can be used to verify this setting.
N.B. Only affects evaluation metrics, not the model training or returned predictions.
N.B. Binomial models only.
- metrics
`list` for enabling/disabling metrics.
E.g. `list("RMSE" = FALSE)` would remove `RMSE` from the results,
and `list("Accuracy" = TRUE)` would add the regular `Accuracy` metric
to the classification results.
Default values (`TRUE`/`FALSE`) will be used for the remaining available metrics.
You can enable/disable all metrics at once by including `"all" = TRUE/FALSE` in the `list`.
This is done prior to enabling/disabling individual metrics,
which is why `list("all" = FALSE, "RMSE" = TRUE)` would return only the `RMSE` metric.
The `list` can be created with `gaussian_metrics()` or `binomial_metrics()`.
Also accepts the string `"all"`.
- preprocessing
Name of preprocessing to apply.
Available preprocessings are:

| Name | Description |
| --- | --- |
| "standardize" | Centers and scales the numeric predictors. |
| "range" | Normalizes the numeric predictors to the 0-1 range. Values outside the min/max range in the test fold are truncated to 0/1. |
| "scale" | Scales the numeric predictors to have a standard deviation of one. |
| "center" | Centers the numeric predictors to have a mean of zero. |

The preprocessing parameters (mean, SD, etc.) are extracted from the training folds and
applied to both the training folds and the test fold.
They are returned in the `Preprocess` column for inspection.
N.B. The preprocessings should not affect the results
to a noticeable degree, although `"range"` might due to the truncation.
- err_nc
Whether to raise an error if a model does not converge. (Logical)
- rm_nc
Remove non-converged models from output. (Logical)
- parallel
Whether to validate the list of models in parallel. (Logical)
Remember to register a parallel backend first.
E.g. with `doParallel::registerDoParallel`.
- verbose
Whether to message process information, like the number of model instances to fit
and which model function was applied. (Logical)
- link, models, model_verbose
Deprecated.
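
To make the `preprocessing` argument more concrete, here is a minimal base R sketch of what the `"range"` option is described as doing: the parameters are learned from the training data only, then applied to both sets, with out-of-range test values truncated. The helpers `fit_range()` and `apply_range()` are hypothetical names made up for this illustration. They are not part of the package and are not necessarily how it implements the step.

```r
# Hypothetical helper (not a package function): learn the "range"
# parameters (min and max) from the training data only.
fit_range <- function(x) {
  list(min = min(x), max = max(x))
}

# Hypothetical helper: rescale with the learned parameters, then
# truncate anything that falls outside the 0-1 range.
apply_range <- function(x, params) {
  scaled <- (x - params$min) / (params$max - params$min)
  pmin(pmax(scaled, 0), 1)
}

train_x <- c(2, 4, 6, 10)
test_x <- c(1, 6, 12)  # 1 and 12 lie outside the training min/max

params <- fit_range(train_x)
train_scaled <- apply_range(train_x, params)
test_scaled <- apply_range(test_x, params)

print(train_scaled)
print(test_scaled)
```

Because the test values `1` and `12` fall outside the training range, they are clipped to `0` and `1`. This truncation is the reason `"range"` might change results more than the other preprocessings.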