DACrossVal: Cross Validation for Discriminant Analysis Classification Rules

Description

DACrossVal evaluates the performance of a Discriminant Analysis training algorithm by replicated k-fold cross-validation.

Usage

DACrossVal(data, grouping, TrainAlg, EvalAlg=EvalClrule, 
Strfolds=TRUE, kfold=10, CVrep=20, prior="proportions", ...)

Arguments

data

Matrix, data frame or Interval Data object of observations.

grouping

Factor specifying the class for each observation.

TrainAlg

A function with the training algorithm. It should return an object that can be passed as input to EvalAlg.

EvalAlg

A function with the evaluation algorithm. By default set to EvalClrule, which returns a list with components err (estimates of error rates by class) and Nk (number of out-of-sample observations by class).

Strfolds

Boolean flag indicating if the folds should be stratified according to the original class proportions (default), or randomly generated from the whole training sample, ignoring class membership.
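The difference between the two fold-generation modes can be illustrated with a minimal base R sketch. The helper make_folds below is hypothetical and is not part of the package; how DACrossVal builds its folds internally is not stated here, so this only shows the general idea under the assumption that fold labels are dealt out after a random shuffle.

```r
# Hypothetical helper (not the package's code): assign each observation to one of kfold folds.
make_folds <- function(grouping, kfold = 10, stratified = TRUE) {
  n <- length(grouping)
  folds <- integer(n)
  if (stratified) {
    # Shuffle the observations within each class, then deal fold labels in turn,
    # so that every fold keeps roughly the original class proportions.
    for (lev in levels(grouping)) {
      idx <- which(grouping == lev)
      idx <- idx[sample.int(length(idx))]
      folds[idx] <- rep_len(seq_len(kfold), length(idx))
    }
  } else {
    # Shuffle the whole sample and deal fold labels, ignoring class membership.
    folds[sample.int(n)] <- rep_len(seq_len(kfold), n)
  }
  folds
}

set.seed(1)
grp <- factor(rep(c("A", "B", "C"), times = c(50, 30, 20)))
strat <- make_folds(grp, kfold = 5, stratified = TRUE)
rand <- make_folds(grp, kfold = 5, stratified = FALSE)

table(grp, strat)  # class counts are identical across folds
table(grp, rand)   # class counts vary from fold to fold
```

With stratification each fold mirrors the class mix of the full sample, which matters most when some classes are small.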

kfold

Number of training sample folds to be created in each replication.

CVrep

Number of replications to be performed.

prior

The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities should be specified in the order of the factor levels.
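As an illustration of the ordering requirement, the following base R sketch computes class proportions and lays out a user-specified prior in factor-level order. The grouping factor and the probability values are made up for the example; only the ordering convention comes from the description above.

```r
# Made-up grouping factor; levels are sorted alphabetically by default.
grp <- factor(c("north", "south", "south", "east", "south", "north"))
levels(grp)

# Class proportions of the sample, in factor-level order.
prior_prop <- as.vector(table(grp)) / length(grp)
names(prior_prop) <- levels(grp)
prior_prop

# A user-specified prior must follow the same order as levels(grp).
prior_user <- c(east = 0.2, north = 0.3, south = 0.5)
stopifnot(identical(names(prior_user), levels(grp)),
          isTRUE(all.equal(sum(prior_user), 1)))
```

Checking the names against levels(grouping) before the call is a cheap way to avoid silently assigning a probability to the wrong class.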

...

Further arguments to be passed to TrainAlg and EvalAlg.

Value

A three-dimensional array with the number of tested observations and the estimated classification errors for each combination of fold and replication tried. The array dimensions are defined as follows: the first dimension runs through the different fold-replication combinations; the second dimension represents the classes; the third dimension has two named levels representing, respectively, the number of observations tested (Nk) and the estimated classification errors (Clerr).
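To make the layout concrete, the sketch below builds a mock array with the structure described above and computes a global error rate from it. All numbers are invented for illustration, the class names are placeholders, and the assumption that the first dimension has kfold * CVrep rows is an inference from the description rather than a documented fact.

```r
kfold <- 5
CVrep <- 2
classes <- c("A", "B", "C")  # placeholder class names

# Mock result: rows = fold-replication combinations, columns = classes,
# third dimension = "Nk" (observations tested) and "Clerr" (error rates).
res <- array(NA_real_,
             dim = c(kfold * CVrep, length(classes), 2),
             dimnames = list(NULL, classes, c("Nk", "Clerr")))
res[, , "Nk"] <- matrix(c(10, 6, 4), nrow = kfold * CVrep,
                        ncol = length(classes), byrow = TRUE)
res[, , "Clerr"] <- matrix(c(0.1, 0.5, 0.25), nrow = kfold * CVrep,
                           ncol = length(classes), byrow = TRUE)

# Global error per fold-replication: class errors weighted by the number tested.
glberrors <- rowSums(res[, , "Nk"] * res[, , "Clerr"]) / rowSums(res[, , "Nk"])
mean(glberrors)
```

Weighting by Nk matters because a plain average of the per-class error rates would give small classes the same influence as large ones.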

Examples

# Compare performance of linear and quadratic discriminant analysis with 
#  Configurations C1 and C4 on the ChinaT data set by 5-fold cross-validation 
#  replicated twice

library(MAINT.Data)  # DACrossVal, IData and the ChinaTemp data come from this package

# Create an Interval-Data object containing the intervals for 899 observations 
# on the temperatures by quarter in 60 Chinese meteorological stations.

ChinaT <- IData(ChinaTemp[1:8])

# Classical (configuration 1) Linear Discriminant Analysis 

CVldaC1 <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=lda,Config=1,kfold=5,CVrep=2)
summary(CVldaC1[,,"Clerr"])
glberrors <- 
	apply(CVldaC1[,,"Nk"]*CVldaC1[,,"Clerr"],1,sum)/apply(CVldaC1[,,"Nk"],1,sum)
cat("Average global classification error =",mean(glberrors),"\n")

# Linear Discriminant Analysis with configuration 4

CVldaC4 <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=lda,Config=4,kfold=5,CVrep=2)
summary(CVldaC4[,,"Clerr"])
glberrors <- 
	apply(CVldaC4[,,"Nk"]*CVldaC4[,,"Clerr"],1,sum)/apply(CVldaC4[,,"Nk"],1,sum)
cat("Average global classification error =",mean(glberrors),"\n")

# Classical (configuration 1) Quadratic Discriminant Analysis 

CVqdaC1 <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=qda,Config=1,kfold=5,CVrep=2)
summary(CVqdaC1[,,"Clerr"])
glberrors <- 
	apply(CVqdaC1[,,"Nk"]*CVqdaC1[,,"Clerr"],1,sum)/apply(CVqdaC1[,,"Nk"],1,sum)
cat("Average global classification error =",mean(glberrors),"\n")

# Quadratic Discriminant Analysis with configuration 4

CVqdaC4 <- DACrossVal(ChinaT,ChinaTemp$GeoReg,TrainAlg=qda,Config=4,kfold=5,CVrep=2)
summary(CVqdaC4[,,"Clerr"])
glberrors <- 
	apply(CVqdaC4[,,"Nk"]*CVqdaC4[,,"Clerr"],1,sum)/apply(CVqdaC4[,,"Nk"],1,sum)
cat("Average global classification error =",mean(glberrors),"\n")
