
TestDimorph (version 0.4.0)

accu_model: Evaluation of Sex Prediction Accuracy

Description

Testing, cross-validation and visualization of the accuracy of different sex prediction models using a confusion matrix (confusionMatrix) and ROC curves.

Usage

accu_model(
  f,
  x,
  y = NULL,
  method = "lda",
  res_method = "repeatedcv",
  p = 0.75,
  nf = 10,
  nr = 3,
  plot = FALSE,
  Sex = 1,
  Pop = NULL,
  byPop = FALSE,
  ref. = "F",
  post. = "M",
  ...
)

Arguments

f

Formula in the form `groups ~ x1 + x2 + ...`. The grouping factor is placed on the left-hand side and the numerical measurements on the right-hand side.

x

Data frame to be fitted to the model

y

New data frame to be tested. If `NULL`, `x` is split into test and training datasets, Default: NULL

method

A string specifying which classification or regression model to use. For a list of supported methods, see models.

res_method

The resampling method: "boot", "boot632", "optimism_boot", "boot_all", "cv", "repeatedcv", "LOOCV", "LGOCV" (for repeated training/test splits), "none" (only fits one model to the entire training set), "timeslice", "adaptive_cv", "adaptive_boot" or "adaptive_LGOCV", Default: 'repeatedcv'

p

Proportion of `x` for testing the model when `y` is `NULL`, Default: 0.75

nf

Number of folds or resampling iterations, Default: 10

nr

Number of repeats for repeated k-fold cross-validation, Default: 3

plot

Logical; if TRUE returns an ROC curve for model accuracy, Default: FALSE

Sex

Number of the column containing sex, coded 'M' for male and 'F' for female, Default: 1

Pop

Number of the column containing populations' names, Default: NULL

byPop

Logical; if TRUE returns the accuracy in different populations of the new data frame, Default: FALSE.

ref.

Reference category in the grouping factor, Default: 'F'

post.

Positive category in the grouping factor, Default: 'M'

...

Additional arguments that can be passed to the modeling function, the confusionMatrix function and the ROC curve generated by plot_roc.
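
To make the roles of `nf` and `nr` more concrete, the following is a minimal base R sketch of repeated k-fold cross-validation. It is not the package's implementation: the function `repeated_cv_accuracy`, the simulated data and the use of logistic regression (standing in for whichever model `method` selects) are assumptions made purely for illustration.

```r
# Illustrative sketch only: repeated k-fold cross-validation in base R.
# `repeated_cv_accuracy` and the simulated data are hypothetical,
# not TestDimorph code.
set.seed(42)

# Simulated measurements: males slightly larger on average than females
n <- 100
dat <- data.frame(
  Sex = factor(rep(c("F", "M"), each = n / 2), levels = c("F", "M")),
  GOL = c(rnorm(n / 2, mean = 175, sd = 6), rnorm(n / 2, mean = 185, sd = 6))
)

repeated_cv_accuracy <- function(data, nf = 10, nr = 3) {
  acc <- matrix(NA_real_, nrow = nr, ncol = nf)
  for (r in seq_len(nr)) {
    # Randomly assign each row to one of nf folds (new assignment per repeat)
    fold <- sample(rep(seq_len(nf), length.out = nrow(data)))
    for (k in seq_len(nf)) {
      train <- data[fold != k, ]
      test <- data[fold == k, ]
      # Logistic regression stands in for the classifier chosen via `method`;
      # with levels c("F", "M") it models the probability of "M"
      fit <- glm(Sex ~ GOL, data = train, family = binomial)
      prob_m <- predict(fit, newdata = test, type = "response")
      pred <- ifelse(prob_m > 0.5, "M", "F")
      acc[r, k] <- mean(pred == test$Sex)
    }
  }
  acc
}

acc <- repeated_cv_accuracy(dat, nf = 10, nr = 3)
dim(acc)   # nr rows (repeats) by nf columns (folds)
mean(acc)  # accuracy averaged over all resamples
```

Each repeat reshuffles the fold assignment, so `nf = 10` and `nr = 3` yield 30 held-out accuracy estimates in this sketch.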

Value

Visual and numerical accuracy parameters for the tested model
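
As a rough idea of what numerical accuracy overall and by population (as with `byPop = TRUE`) can look like, here is a small base R sketch. The data frame `res` and its values are made up for illustration, and the actual output of `accu_model` may be structured differently.

```r
# Illustrative sketch only: overall and per-population accuracy in base R.
# The data frame and its values are hypothetical.
res <- data.frame(
  Pop   = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"),
  truth = factor(c("M", "F", "M", "F", "F", "M", "M", "F", "M", "F"),
                 levels = c("F", "M")),
  pred  = factor(c("M", "F", "F", "F", "M", "M", "M", "F", "M", "F"),
                 levels = c("F", "M"))
)

# Confusion matrix: rows are predictions, columns are the true sex
conf_mat <- table(Predicted = res$pred, Actual = res$truth)
overall <- sum(diag(conf_mat)) / sum(conf_mat)

# Accuracy within each population
by_pop <- tapply(res$pred == res$truth, res$Pop, mean)

conf_mat
overall
by_pop
```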

Details

Input data frames need to be arranged in a similar manner to the Howells dataset. The "cut point" is chosen to maximize the sum of sensitivity [TP/(TP+FN)] and specificity [TN/(TN+FP)], where TP is the number of males identified as males, TN is the number of females identified as females, FN is the number of males identified as females, and FP is the number of females identified as males. For methods that employ prior probabilities, the priors are calculated from the sampling frequencies.
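
The cut point rule described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper `sens_spec`, the simulated probabilities `prob_m` and the choice of candidate cut points are hypothetical and are not taken from the package source.

```r
# Illustrative sketch only: choose the cut point that maximises
# sensitivity + specificity. All names and data here are hypothetical.
set.seed(1)

truth <- factor(rep(c("F", "M"), each = 50), levels = c("F", "M"))
# Hypothetical posterior probabilities of being male from some fitted model
prob_m <- c(rbeta(50, 2, 5), rbeta(50, 5, 2))

sens_spec <- function(cut, prob, truth, positive = "M") {
  pred_pos <- prob >= cut
  is_pos <- truth == positive
  tp <- sum(pred_pos & is_pos)    # males identified as males
  fn <- sum(!pred_pos & is_pos)   # males identified as females
  tn <- sum(!pred_pos & !is_pos)  # females identified as females
  fp <- sum(pred_pos & !is_pos)   # females identified as males
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

# Candidate cut points: every observed probability
cuts <- sort(unique(prob_m))
perf <- t(sapply(cuts, sens_spec, prob = prob_m, truth = truth))
best <- which.max(perf[, "sensitivity"] + perf[, "specificity"])

cuts[best]    # selected cut point
perf[best, ]  # sensitivity and specificity at that cut point

# Priors based on sampling frequencies: the share of each sex in the data
priors <- prop.table(table(truth))
priors
```

With "M" as the positive category, lowering the cut point raises sensitivity at the cost of specificity; the selected value is where their sum peaks on these simulated data.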

Examples

# NOT RUN {
library(TestDimorph)
# Using a separate data frame for testing (Howells is reused here for illustration)
accu_model(
  Sex ~ GOL + NOL + BNL,
  x = Howells, y = Howells, plot = FALSE
)
# Using a single dataset
accu_model(
  Sex ~ GOL + NOL + BNL,
  x = Howells,
  method = "lda",
  plot = FALSE
)
# }
