Testing, cross-validation, and visualization of the accuracy of different sex prediction models using the confusion matrix (`confusionMatrix`) and ROC curves.
accu_model(
f,
x,
y = NULL,
method = "lda",
res_method = "repeatedcv",
p = 0.75,
nf = 10,
nr = 3,
plot = FALSE,
Sex = 1,
Pop = NULL,
byPop = FALSE,
ref. = "F",
post. = "M",
...
)
Visual and numerical accuracy parameters for the tested model
Formula in the form `groups ~ x1 + x2 + ...`. The grouping factor is placed on the left-hand side and the numerical measurements on the right-hand side.
Data frame to be fitted to the model
New data frame to be tested. If `NULL`, `x` is split into test and training datasets, Default: NULL
A string specifying which classification or regression model to use. For a list of supported methods see models.
The resampling method: "boot", "boot632", "optimism_boot", "boot_all", "cv", "repeatedcv", "LOOCV", "LGOCV" (for repeated training/test splits), "none" (only fits one model to the entire training set), "timeslice", "adaptive_cv", "adaptive_boot" or "adaptive_LGOCV", Default: 'repeatedcv'
Proportion of `x` (between 0 and 1) for testing the model in case `y` is NULL, Default: 0.75
Number of folds or of resampling iterations, Default: 10
Number of repeats for repeated k-fold cross-validation, Default: 3
Logical; if TRUE returns an ROC curve for model accuracy, Default: FALSE
Number of the column containing sex, coded 'M' for male and 'F' for female, Default: 1
Number of the column containing populations' names, Default: NULL
Logical; if TRUE returns the accuracy in different populations of the new data frame, Default: FALSE.
Reference category in the grouping factor, Default: 'F'
Positive category in the grouping factor, Default: 'M'
Additional arguments that can be passed to the modeling function, the confusionMatrix function, and the ROC curve generated by plot_roc.
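To make the roles of `nf` and `nr` under the default `res_method = "repeatedcv"` concrete, the following base R sketch builds the fold assignments for repeated k-fold cross-validation. It only illustrates the resampling scheme and is not the package's internal code; the row count `n` and the object `folds` are hypothetical names introduced here.

```r
set.seed(1)  # reproducible fold assignment
n  <- 50     # hypothetical number of rows in the training data
nf <- 10     # number of folds per repeat (the `nf` argument)
nr <- 3      # number of repeats (the `nr` argument)

# One column of fold labels per repeat. Within a repeat, every row is
# assigned to exactly one fold, so it is held out exactly once.
folds <- replicate(nr, sample(rep_len(seq_len(nf), n)))

dim(folds)          # n rows, nr columns
table(folds[, 1])   # fold sizes in the first repeat

# Total number of train/hold-out fits implied by this scheme
n_fits <- nf * nr
```

With the defaults shown in the usage (`nf = 10`, `nr = 3`), this scheme implies 30 resampled model fits before the final accuracy is reported.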
Input data frames must be arranged in the same way as the [Howells] dataset. The "cut point" is chosen to maximize the sum of "sensitivity" [TP/(TP+FN)] and "specificity" [TN/(TN+FP)], where TP is the number of males identified as males, TN is the number of females identified as females, FN is the number of males identified as females, and FP is the number of females identified as males. For methods that use prior probabilities, the priors are calculated from the sampling frequencies.
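The cut point criterion described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the routine the package itself uses: the vectors `truth` and `prob_m` are made-up data, and `sens_plus_spec` is a hypothetical helper written for this example.

```r
# Hypothetical data: true sex and the model's posterior probability of "M".
truth  <- factor(c("M", "M", "M", "M", "M", "F", "F", "F", "F", "F"),
                 levels = c("F", "M"))
prob_m <- c(0.95, 0.85, 0.70, 0.60, 0.30, 0.55, 0.40, 0.35, 0.20, 0.10)

# Sensitivity + specificity at one candidate cut point,
# with "M" as the positive class and "F" as the reference class.
sens_plus_spec <- function(cut, truth, prob_m) {
  pred_m <- prob_m >= cut
  tp <- sum(pred_m & truth == "M")    # males identified as males
  fn <- sum(!pred_m & truth == "M")   # males identified as females
  tn <- sum(!pred_m & truth == "F")   # females identified as females
  fp <- sum(pred_m & truth == "F")    # females identified as males
  tp / (tp + fn) + tn / (tn + fp)
}

# Treat every observed probability as a candidate cut point.
cuts   <- sort(unique(prob_m))
scores <- vapply(cuts, sens_plus_spec, numeric(1),
                 truth = truth, prob_m = prob_m)

best_cut   <- cuts[which.max(scores)]  # cut point with the largest sum
best_score <- max(scores)
```

For these made-up values the largest sum occurs at a cut point of 0.60, where sensitivity is 4/5 and specificity is 5/5. When several cut points tie, `which.max` returns the first, so a real implementation may need an explicit tie-breaking rule.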
if (FALSE) {
library(TestDimorph)
# Using separate training (x) and test (y) datasets
accu_model(
Sex ~ GOL + NOL + BNL,
x = Howells, y = Howells, plot = FALSE
)
# Using a single dataset
accu_model(
Sex ~ GOL + NOL + BNL,
x = Howells,
method = "lda",
plot = FALSE
)
}