
rModeling (version 0.0.3)

tunePcaLda: Build a classifier with parameter tuning.

Description

Optimize the number of principal components to be used in linear discriminant analysis (LDA) based on a cross-validation procedure.

Usage

tunePcaLda(data, label, batch = NULL, nPC = 1:50, 
            optMerit = c("Accuracy", "Sensitivity")[2], 
            maximize = TRUE, 
            cv = c("CV", "BV")[2], 
            nPart = 10, ...)

Arguments

data

a data matrix, with samples saved in rows and features in columns.

label

a vector of response variables (i.e., group/concentration info), must be the same length as the number of samples.

batch

a vector of batch variables (i.e., batch/patient ID), must be given in case of cv='BV'. Ideally, this should be the identification of the samples at the highest hierarchy (e.g., the patient ID rather than the spectral ID). Ignored for cv='CV'.

nPC

a vector of integers, the candidate numbers of principal components to be used for LDA, out of which an optimal value will be selected.

optMerit

a character value, the name of the merit to be optimized. The mean sensitivity will be optimized if optMerit = "Sensitivity".

maximize

a logical value, whether or not to maximize the merit.

cv

a character value specifying the type of cross-validation: 'CV' for a normal k-fold cross-validation or 'BV' for a batch-wise cross-validation.

nPart

an integer, the number of folds to be split for cross-validation. Equivalent to nFold of crossValidation for cv='CV' and to nBatch for cv='BV'. (NOTE: use nPart=0 for leave-one-batch-out cross-validation).

...

further parameters passed to crossValidation.

Value

A list of elements:

PCA

PCA model

LDA

LDA model built with the optimal number of principal components

nPC

the optimal number of principal components
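
A hedged sketch of how these returned elements could be applied to new data, assuming the PCA element behaves like a prcomp object and the LDA element like a MASS::lda fit, as the See Also section suggests (the object names RES and newSpec are hypothetical):

  ## project new spectra with the stored PCA model, keep the tuned number
  ## of principal components, then classify with the stored LDA model
  scores <- predict(RES$PCA, newSpec)[, 1:RES$nPC, drop = FALSE]
  pred   <- predict(RES$LDA, scores)$class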

Details

A classifier is built using each value in nPC, and its performance is evaluated with a normal k-fold or batch-wise cross-validation. The optimal number of principal components is selected as the one giving the maximal (maximize=TRUE) or minimal (maximize=FALSE) merit.
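
The idea behind this tuning loop can be illustrated with a rough sketch using prcomp and MASS::lda; this only illustrates the principle and is not the package's internal implementation:

  ## sketch: for each candidate k, project onto the first k PCs, fit an LDA,
  ## and score it with a simple k-fold cross-validation on mean accuracy
  ## (assumes every candidate in nPC is <= the number of available PCs)
  library(MASS)
  sketchTune <- function(data, label, nPC = 1:10, nFold = 5) {
    folds <- sample(rep(seq_len(nFold), length.out = nrow(data)))
    merit <- sapply(nPC, function(k) {
      mean(sapply(seq_len(nFold), function(f) {
        train <- folds != f
        pca   <- prcomp(data[train, ], center = TRUE, scale. = FALSE)
        scTr  <- pca$x[, 1:k, drop = FALSE]
        scTe  <- predict(pca, data[!train, ])[, 1:k, drop = FALSE]
        fit   <- lda(scTr, grouping = label[train])
        mean(predict(fit, scTe)$class == label[!train])   # fold accuracy
      }))
    })
    nPC[which.max(merit)]   # candidate giving the highest mean accuracy
  }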

A two-layer cross-validation can be performed by using tunePcaLda as the method in crossValidation.
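
A minimal sketch of such a two-layer cross-validation, assuming crossValidation() accepts the data matrix, labels, batch IDs, and a method argument as the argument descriptions above suggest (the argument names passed here are assumptions; check ?crossValidation for the exact interface):

  ## outer loop handled by crossValidation(); tunePcaLda then tunes nPC
  ## inside each outer training set
  data(DATA)
  RES <- crossValidation(data = DATA$spec
                        ,label = DATA$labels
                        ,batch = DATA$batch
                        ,method = tunePcaLda
                        ,nPC = 2:4)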

References

S. Guo, T. Bocklitz, et al., Common mistakes in cross-validating classification models. Analytical Methods, 2017, 9(30): 4410-4417.

See Also

crossValidation, lda, prcomp

Examples

# NOT RUN {
  data(DATA)
  ### perform parameter tuning with a 3-fold cross-validation
  RES2 <- tunePcaLda(data=DATA$spec
                   ,label=DATA$labels
                   ,batch=DATA$batch
                   ,nPC=2:4
                   ,cv=c('CV', 'BV')[1]
                   ,nPart=3
                   ,optMerit=c('Accuracy', 'Sensitivity')[2]
                   ,center=TRUE
                   ,scale=FALSE)
# }
