PredPsych (version 0.4)

LinearDA: Cross-validated Linear Discriminant Analysis

Description

A simple function to perform cross-validated Linear Discriminant Analysis

Usage

LinearDA(Data, classCol, selectedCols, cvType, nTrainFolds,
  ntrainTestFolds, modelTrainFolds, foldSep, CV = FALSE, cvFraction,
  extendedResults = FALSE, SetSeed = TRUE, silent = FALSE,
  NewData = NULL, ...)

Arguments

Data

(dataframe) data frame containing the class column and the features

classCol

(numeric or string) number or name of the column that contains the variable to be predicted

selectedCols

(optional) (numeric or string) all the columns of Data to be used in the analysis, either as the predicted variable or as features

cvType

(optional) (string) which type of cross-validation scheme to follow; One of the following values:

  • folds = (default) k-fold cross-validation

  • LOSO = Leave-one-subject-out cross-validation

  • holdout = holdout cross-validation. Only a portion of the data (cvFraction) is used for training.

  • LOTO = Leave-one-trial-out cross-validation

nTrainFolds

(optional) (parameter for k-fold cross-validation only) number of folds into which the training dataset is further divided

ntrainTestFolds

(optional) (parameter for k-fold cross-validation only) number of folds for the training and testing datasets

modelTrainFolds

(optional) (parameter for k-fold cross-validation only) specific folds from the first train/test split (ntrainTestFolds) to use for training

foldSep

(numeric) (parameter for Leave-one-subject-out only) column number that separates the folds; mandatory for Leave-one-subject-out cross-validation

CV

(optional) (logical) perform cross-validation of the training dataset? If TRUE, posterior probabilities are stored with the model

cvFraction

(optional) (numeric) Fraction of data to keep for training data

extendedResults

(optional) (logical) Return extended results with model and other metrics

SetSeed

(optional) (logical) whether to seed the random number generator. Keep SetSeed = TRUE to get consistent results; set it to FALSE only for permutation tests

silent

(optional) (logical) whether to print messages or not

NewData

(optional) (dataframe) new data frame of features for which class membership is requested

...

(optional) additional arguments for the function
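
The four cvType schemes differ only in how rows are assigned to training and test sets. The following is a minimal base-R sketch of that idea, not PredPsych's internal implementation; the toy data frame, its column names, and the seed value are all made up for illustration.

```r
# Toy data: 12 trials from 4 hypothetical subjects (not KinData)
toy <- data.frame(
  subject = rep(1:4, each = 3),
  class   = rep(c("a", "b"), times = 6),
  feature = seq_len(12) / 10
)
n <- nrow(toy)

# Fixed (arbitrary) seed so the random partitions below are reproducible,
# which is the purpose SetSeed = TRUE is described as serving
set.seed(111)

# "folds": give every row a fold label 1..k in random order
k <- 3
foldId <- sample(rep(seq_len(k), length.out = n))

# "holdout": keep a fraction of the rows for training, test on the rest
cvFraction <- 0.8
trainRows <- sample(n, size = floor(cvFraction * n))
testRows  <- setdiff(seq_len(n), trainRows)

# "LOSO": one test fold per unique value of the subject column
losoFolds <- split(seq_len(n), toy$subject)

# "LOTO": every single row (trial) is its own test fold
lotoFolds <- as.list(seq_len(n))
```

In this sketch the subject column plays the role that foldSep is described as playing for Leave-one-subject-out cross-validation.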

Value

Depends on extendedResults:

  • extendedResults = FALSE returns accTest, the test accuracy of discrimination.

  • extendedResults = TRUE returns accTest, the test accuracy of discrimination; ConfusionMatrixResults, the overall cross-validated confusion matrix results; ConfMatrix, the confusion matrices; and fitLDA, the fitted cross-validated LDA model.

If CV = TRUE, posterior probabilities are generated and stored in the model.

Details

The function implements Linear Discriminant Analysis (LDA), a simple algorithm for classification-based analyses. LDA builds a model composed of a number of discriminant functions, each a linear combination of data features, that provide the best discrimination between two or more conditions/classes. The aim of the statistical analysis in LDA is thus to combine the data feature scores so that a single new composite variable, the discriminant function, is produced (for details see Fisher, 1936; Rao, 1948).
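
To make the idea of a discriminant function concrete, here is a small sketch that uses lda() from the MASS package on a two-class subset of the built-in iris data. It is an illustration under assumptions: this page does not state whether LinearDA calls MASS::lda internally, and the iris subset stands in for real experimental data.

```r
library(MASS)  # provides lda(); a recommended package shipped with standard R

# Two-class subset of the built-in iris data (illustrative only)
twoClass <- droplevels(subset(iris, Species != "setosa"))

# Fit LDA: with two classes there is a single discriminant function
fit <- lda(Species ~ ., data = twoClass)
fit$scaling   # coefficients of the linear combination of features (LD1)

# Project the data onto the discriminant function and classify
pred <- predict(fit, twoClass)
head(pred$x)                           # composite discriminant scores
mean(pred$class == twoClass$Species)   # training accuracy, not cross-validated

# With CV = TRUE, lda() returns leave-one-out classes and posterior
# probabilities instead of a model object that can be passed to predict()
fitCV <- lda(Species ~ ., data = twoClass, CV = TRUE)
head(fitCV$posterior)
```

The training accuracy computed here is optimistic because the same rows are used for fitting and scoring, which is why a cross-validated accuracy is the quantity of interest.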

References

Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7(2), 179-188.

Rao, C. R. (1948). The Utilization of Multiple Measurements in Problems of Biological Classification. Journal of the Royal Statistical Society. Series B (Methodological), 10, 159-203.

Examples

# NOT RUN {
# simple model with holdout data partition of 80% and no extended results 
LDAModel <- LinearDA(Data = KinData, classCol = 1,
                     selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
                     cvType = "holdout")
# Output:
#
# Performing Linear Discriminant Analysis
#
#
# Performing holdout Cross-validation
# 
# cvFraction was not specified,
#  Using default value of 0.8 (80%) fraction for training (cvFraction = 0.8)
# 
# Proportion of Test/Train Data was :  0.2470588 
#       Predicted
# Actual  1  2
#      1 51 32
#      2 40 45
# [1] "Test holdout Accuracy is 0.57"
# holdout LDA Analysis: 
# cvFraction : 0.8 
# Test Accuracy 0.57
# *Legend:
# cvFraction = Fraction of data to keep for training data 
# Test Accuracy = mean accuracy from the Testing dataset

# Alternative uses:
# holdout cross-validation with 80% training data and extended results
LDAModel <- LinearDA(Data = KinData, classCol = 1,
                     selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
                     CV = FALSE, cvFraction = 0.8, extendedResults = TRUE,
                     cvType = "holdout")

# 10-fold cross-validation without printing messages
LDAModel <- LinearDA(Data = KinData, classCol = 1,
                     selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
                     extendedResults = FALSE, cvType = "folds",
                     nTrainFolds = 10, silent = TRUE)

# }
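
The test accuracy reported in the holdout output above can be checked by hand from the printed confusion matrix. The sketch below copies those four counts into a matrix; the variable names are illustrative and are not part of the package.

```r
# Confusion matrix copied from the holdout output above
# (rows = actual class, columns = predicted class)
confMat <- matrix(c(51, 32,
                    40, 45),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Actual = c("1", "2"),
                                  Predicted = c("1", "2")))

# Accuracy = correctly classified test trials / all test trials
accuracy <- sum(diag(confMat)) / sum(confMat)
round(accuracy, 2)  # 0.57, matching the reported Test Accuracy
```

The diagonal holds the 96 correctly classified trials out of 168 test trials, which gives the 0.57 shown in the output.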