vcrossv.all: V-fold iterative cross-validation for discriminant analysis

Description

This function cross-validates a discriminant analysis through the leave-v-out procedure, with v varying from 1 to the value given in to. At each value of v the cross-validation is repeated ntrials times to estimate confidence limits for the accuracy of the discriminant function. The computations are very intensive; if only specific values of v need to be evaluated, use vcrossv.da instead.

Usage

vcrossv.all(x, f, to, nsimulat, funct, ntrials, plot = TRUE)

Arguments

x

A matrix with samples in columns and taxa in rows. The rows must be named after the taxa (see rownames).

f

An object of class factor containing the discriminant factor (see Venables & Ripley (2002) for details on discriminant analysis).

to

The upper value of v. The v-fold cross-validation is performed for each value of v from 1 to to.

nsimulat

Number of samples simulated to desaturate the model (see Correa-Metrio et al. (2010) for details). If no samples were simulated, set nsimulat = 1.

funct

The discriminant function to use: lda for linear discriminant analysis or qda for quadratic discriminant analysis.

ntrials

Number of desired repetitions for the cross-validation at each value of v.

plot

Whether to plot the accuracy estimated for the discriminant function at each value of v.

Value

vcrossv.all returns a matrix with four columns. fold contains the values of v. mean accuracy contains the average accuracy of the discriminant function obtained from repeating the cross-validation ntrials times at the given value of v. lower (0.025) and upper (0.975) contain the 0.025 and 0.975 quantiles of the accuracy obtained from the same repetitions. Note that for v=1 the results are the same for all repetitions, because leaving a single element out has no random component.
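As a rough illustration of how the mean and the two quantiles summarize repeated trials, the sketch below builds one summary row from a vector of accuracies. The accuracies are simulated for illustration only (they are not output from vcrossv.all), and the names accuracy and summary_row are hypothetical.

```r
# Hypothetical accuracies from ntrials = 20 repetitions at one value of v.
# Simulated for illustration; these are NOT results from vcrossv.all.
set.seed(1)
ntrials <- 20
accuracy <- rbinom(ntrials, size = 20, prob = 0.8) / 20

# One row laid out like the matrix described above (short names used here)
summary_row <- c(fold  = 3,
                 mean  = mean(accuracy),
                 lower = unname(quantile(accuracy, 0.025)),
                 upper = unname(quantile(accuracy, 0.975)))
summary_row
```

The interval between the two quantiles covers the central 95% of the repeated accuracies, so it narrows as the repetitions agree more closely with one another.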

Details

The function was designed for discrimination of pollen taxa into dichotomous ecological groups (the factor admits only two levels). The prior information corresponds to the affinity of certain taxa to known environmental conditions. Therefore, while the taxa correspond to the objects to classify, their percentages through the fossil dataset correspond to the attributes. Each time the discriminant function is adjusted, v elements are left out without replacement. Therefore, it is recommended that v be smaller than half of the total number of taxa, unless there is a considerable number of species. Note also that each time a taxon is left out for the cross-validation, all the samples that were simulated for that taxon are left out too.
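The leave-v-out idea described above can be sketched in a few lines. This is a minimal illustration on simulated data, not the implementation used by vcrossv.all: the helper leave_v_out is hypothetical, it assumes the held-out sets are disjoint blocks of v taxa, and it ignores simulated samples (the nsimulat = 1 case).

```r
library(MASS)  # provides lda() and qda()

set.seed(42)
# Simulated stand-in data: 20 "taxa" (objects) described by 3 attributes
n_taxa <- 20
group <- factor(rep(c("group1", "group2"), each = n_taxa / 2))
x <- matrix(rnorm(n_taxa * 3), nrow = n_taxa,
            dimnames = list(paste0("taxon", 1:n_taxa), paste0("attr", 1:3)))
x[group == "group2", ] <- x[group == "group2", ] + 2  # separate the two groups

# Hypothetical helper: one leave-v-out pass over all taxa
leave_v_out <- function(x, group, v) {
  shuffled <- sample(nrow(x))  # random order, drawn without replacement
  blocks <- split(shuffled, ceiling(seq_along(shuffled) / v))
  hits <- 0
  for (held_out in blocks) {
    # Fit on the remaining taxa, then classify the v held-out taxa
    fit <- lda(x[-held_out, , drop = FALSE], grouping = group[-held_out])
    pred <- predict(fit, x[held_out, , drop = FALSE])$class
    hits <- hits + sum(as.character(pred) == as.character(group[held_out]))
  }
  hits / nrow(x)  # proportion of taxa classified correctly
}

acc <- leave_v_out(x, group, v = 4)
acc
```

With v = 1 every block holds a single taxon, so repeated calls to this sketch return the same accuracy regardless of the shuffle. With larger v the training set shrinks at each fit, which is one reason to keep v well below half of the taxa.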

References

Correa-Metrio, A., K.R. Cabrera, and M.B. Bush. 2010. Quantifying ecological change through discriminant analysis: a paleoecological example from the Peruvian Amazon. Journal of Vegetation Science 21: 695-704.

Venables, W.N., and B.D. Ripley. 2002. Modern Applied Statistics with S. Springer, New York.

Examples

data(quexilper)
# Take only a fraction of the database so the model is not saturated
a <- quexilper[1:10, 1:20]
a <- t(a)
# Build a dummy factor assuming that the first ten species belong to group1
# and the second ten belong to group2
b <- as.factor(rep(c("group1", "group2"), each = 10))
# Apply the function
vcrossv.all(a, b, to = 5, nsimulat = 1, funct = lda, ntrials = 20, plot = TRUE)
