nlcv: Nested Loop Cross-Validation

Description

This function first performs feature selection and then applies the chosen classification algorithms (five by default).

Usage

nlcv(eset, classVar = "type", nRuns = 2, propTraining = 2/3,
  classdist = c("balanced", "unbalanced"), nFeatures = c(2, 3, 5, 7, 10, 15,
  20, 25, 30, 35), fsMethod = c("randomForest", "t.test", "limma", "none"),
  classifMethods = c("dlda", "randomForest", "bagg", "pam", "svm"),
  fsPar = NULL, initialGenes = seq(length.out = nrow(eset)),
  geneID = "ID", storeTestScores = FALSE, verbose = FALSE, seed = 123)

Arguments

eset

ExpressionSet object containing the genes to classify

classVar

String giving the name of the variable containing the observed class labels; this variable should be contained in the phenoData of eset.

nRuns

Number of runs for the outer loop of the cross-validation

propTraining

Proportion of the observations to be assigned to the training set. By default propTraining = 2/3.

classdist

Distribution of classes; indicates whether the class distribution is 'balanced' or 'unbalanced'. The sampling strategy for each run is adapted accordingly.
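To illustrate how propTraining and classdist could interact, the following base R sketch draws one training/test split. It is not the package's own code: the helper splitTraining is hypothetical, and the two strategies shown (simple random sampling for the balanced case, sampling within each class for the unbalanced case) are assumptions about what an adapted sampling strategy might look like.

```r
# Hypothetical helper, not part of nlcv: returns a logical vector that is
# TRUE for observations assigned to the training set.
splitTraining <- function(y, propTraining = 2/3,
                          classdist = c("balanced", "unbalanced")) {
  classdist <- match.arg(classdist)
  n <- length(y)
  inTraining <- rep(FALSE, n)
  if (classdist == "balanced") {
    # Assumption: with balanced classes, simple random sampling suffices
    nTrain <- round(propTraining * n)
    inTraining[sample.int(n, nTrain)] <- TRUE
  } else {
    # Assumption: with unbalanced classes, sample within each class so that
    # the minority class is represented in both training and test sets
    for (cl in unique(as.character(y))) {
      idx <- which(y == cl)
      nTrainCl <- max(1, round(propTraining * length(idx)))
      inTraining[idx[sample.int(length(idx), nTrainCl)]] <- TRUE
    }
  }
  inTraining
}

set.seed(123)
y <- factor(rep(c("A", "B"), times = c(24, 6)))  # made-up, unbalanced labels
tr <- splitTraining(y, classdist = "unbalanced")
table(class = y, training = tr)
```

With the within-class strategy, each class contributes roughly propTraining of its observations to the training set, regardless of how rare it is.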

nFeatures

Numeric vector with the number of features to be selected from the features kept by the feature selection method. For each number n specified in this vector the classification algorithms will be run using only the top n features.
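The idea of running the classifiers on the top n features for each n in nFeatures can be sketched in base R as follows. The scores are made-up importance values and the component names ("nfeat2", ...) are chosen for illustration only; this is a sketch of the concept, not the package's implementation.

```r
set.seed(1)
# Made-up importance scores for 50 hypothetical genes (higher = more important)
scores <- setNames(runif(50), paste0("gene", 1:50))
nFeatures <- c(2, 3, 5, 7, 10)

# Rank the features once, then take the top n for every requested n
ranked <- names(sort(scores, decreasing = TRUE))
topSets <- lapply(nFeatures, function(n) head(ranked, n))
names(topSets) <- paste0("nfeat", nFeatures)  # illustrative names

lengths(topSets)
```

Because all subsets are taken from the same ranking, they are nested: the top 2 features are always contained in the top 3, and so on.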

fsMethod

Feature selection method; one of "randomForest" (default), "t.test", "limma" or "none".

classifMethods

Character vector with the classification methods to be used in the analysis; elements can be chosen among "dlda", "randomForest", "bagg", "pam", "svm", "glm", "lda", "nlda" and "ksvm". The first five methods are selected by default.

fsPar

List of further parameters to pass to the feature selection method; currently the default for "randomForest" is an empty list(), whereas for "t.test" one can specify the particular test to be used (the default being list(test = "f")).

initialGenes

Initial subset of genes in the ExpressionSet on which to apply the nested loop cross validation procedure. By default all genes are selected.

geneID

String giving the name of the gene ID variable in the fData of the expression set to use; this argument was added for people who use, for example, both Entrez IDs and Ensembl gene IDs.

storeTestScores

Should the test scores be stored in the nlcv object? Defaults to FALSE.

verbose

Should the output be verbose (TRUE) or not (FALSE).

seed

Integer seed, set at the start of the cross-validation.

Value

The result is an object of class 'nlcv'. It is a list with two components, output and features.

The output component is a list with one component for each classification algorithm used (five by default). Each of these components in turn has as many components as there are elements in the nFeatures vector. These contain both the error rates for each run (component errorRate) and the predicted labels for each run (character matrix labelsMat).

The features component is a list with as many components as there are runs. For each run, a named vector is given with the variable importance measure for each gene. For t-test-based feature selection, p-values are used; for random-forest-based feature selection, the variable importance measure is given.
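The nesting described above can be explored with a small mock object. The sketch below does not call nlcv(); it builds a plain list that mimics the documented layout (classifier, then number of features, then errorRate) with random numbers. The component names "nfeat2" and "nfeat5" are assumptions made for illustration, and labelsMat is omitted for brevity.

```r
set.seed(42)
nRuns <- 3
nFeatures <- c(2, 5)
methods <- c("dlda", "svm")

# Mock object with the documented layout; all values are random placeholders
mock <- list(
  output = setNames(lapply(methods, function(m) {
    setNames(lapply(nFeatures, function(n) {
      list(errorRate = runif(nRuns, min = 0, max = 0.5))
    }), paste0("nfeat", nFeatures))  # assumed component names
  }), methods),
  features = replicate(nRuns,
                       setNames(runif(10), paste0("gene", 1:10)),
                       simplify = FALSE)
)

# Mean error rate over runs: rows = number of features, columns = classifier
meanErr <- sapply(mock$output, function(perMethod) {
  sapply(perMethod, function(x) mean(x$errorRate))
})
meanErr

# Per-run importance vector for the first run
head(mock$features[[1]])
```

A summary of this kind (average error per classifier and feature count) is a natural first step when comparing classifiers across the values in nFeatures.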