accu_model: Evaluation of Sex Prediction Accuracy

Description

Testing, cross-validation and visualization of the accuracy of different sex prediction models using confusionMatrix and ROC curves.

Usage

accu_model(
  f,
  x,
  y = NULL,
  method = "lda",
  res_method = "repeatedcv",
  p = 0.75,
  nf = 10,
  nr = 3,
  plot = FALSE,
  Sex = 1,
  Pop = NULL,
  byPop = FALSE,
  ref. = "F",
  post. = "M",
  ...
)

Value

Visual and numerical accuracy parameters for the tested model

Arguments

f: Formula in the form `groups ~ x1 + x2 + ...`. The grouping factor is placed to the left hand side while the numerical measurements are placed to the right hand side
x: Data frame to be fitted to the model
y: New data frame to be tested. If `NULL`, `x` is split into test and training datasets, Default: NULL
method: A string specifying which classification or regression model to use. For a list of supported methods, see models.
res_method: The resampling method: "boot", "boot632", "optimism_boot", "boot_all", "cv", "repeatedcv", "LOOCV", "LGOCV" (for repeated training/test splits), "none" (only fits one model to the entire training set), "timeslice", "adaptive_cv", "adaptive_boot" or "adaptive_LGOCV", Default: 'repeatedcv'
p: Proportion of `x` for testing the model in case `y` is NULL, Default: 0.75
nf: Number of folds or of resampling iterations, Default: 10
nr: Number of repeats for repeated k-fold cross-validation, Default: 3
plot: Logical; if TRUE returns a ROC curve for model accuracy, Default: FALSE
Sex: Number of the column containing sex, coded 'M' for male and 'F' for female, Default: 1
Pop: Number of the column containing populations' names, Default: NULL
byPop: Logical; if TRUE returns the accuracy in different populations of the new data frame, Default: FALSE.
ref.: Reference category in the grouping factor, Default: 'F'
post.: Positive category in the grouping factor, Default: 'M'
...: Additional arguments that can be passed to modeling, the confusionMatrix function and the ROC curve generated by plot_roc.

Details

Input data frames need to be arranged in a similar manner to the [Howells] dataset. The "cut point" is chosen to maximize the sum of "sensitivity" [TP/(TP+FN)] and "specificity" [TN/(TN+FP)], where TP is the number of males identified as males, TN is the number of females identified as females, FN is the number of males identified as females, and FP is the number of females identified as males. For methods that employ prior probabilities, the priors are calculated from sampling frequencies.
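
The cut point rule can be illustrated with a short base R sketch. This is not the package's internal code: the vectors `truth` and `prob_m` are made-up values, and `sens_spec` is a hypothetical helper written only for this illustration. It treats 'M' as the positive category, as in the defaults above, and scans the observed probabilities as candidate cut points.

```r
# Hypothetical data: true sex and a model's posterior probability of "M"
truth  <- factor(c("M", "M", "M", "M", "F", "F", "F", "F"), levels = c("F", "M"))
prob_m <- c(0.90, 0.80, 0.55, 0.40, 0.60, 0.30, 0.20, 0.10)

# Sensitivity and specificity at one cutoff, with "M" as the positive class
sens_spec <- function(cutoff, truth, prob_m) {
  pred <- ifelse(prob_m >= cutoff, "M", "F")
  tp <- sum(pred == "M" & truth == "M")  # males identified as males
  fn <- sum(pred == "F" & truth == "M")  # males identified as females
  tn <- sum(pred == "F" & truth == "F")  # females identified as females
  fp <- sum(pred == "M" & truth == "F")  # females identified as males
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

# Evaluate every observed probability as a candidate cut point
cuts  <- sort(unique(prob_m))
stats <- t(sapply(cuts, sens_spec, truth = truth, prob_m = prob_m))

# Keep the cut point with the largest sensitivity + specificity
total    <- stats[, "sensitivity"] + stats[, "specificity"]
best_cut <- cuts[which.max(total)]
print(cbind(cutoff = cuts, stats, total = total))
print(best_cut)
```

With these made-up values the sum peaks at a cutoff of 0.40, where all four males and three of the four females are classified correctly.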

Examples


if (FALSE) {
library(TestDimorph)
# Supplying a test data frame through y (here the same dataset, for illustration)
accu_model(
  Sex ~ GOL + NOL + BNL,
  x = Howells, y = Howells, plot = FALSE
)
# Using a single dataset; x is split into training and test sets because y is NULL
accu_model(
  Sex ~ GOL + NOL + BNL,
  x = Howells,
  method = "lda",
  plot = FALSE
)
}
