rfCMA: Classification based on Random Forests

Description

Random Forests were proposed by Breiman (2001) and are implemented in the package randomForest.

In this package, they can also be used to rank variables according to their importance; see GeneSelection.

For S4 method information, see rfCMA-methods.

Usage

rfCMA(X, y, f, learnind, varimp = TRUE, seed = 111, models = FALSE, type = 1, scale = FALSE, importance = TRUE, ...)

Arguments

X

Gene expression data. Can be one of the following:

A matrix. Rows correspond to observations, columns to variables.
A data.frame, when f is not missing (see below).
An object of class ExpressionSet.

y

Class labels. Can be one of the following:

A numeric vector.
A factor.
A character string specifying the phenotype variable, if X is an ExpressionSet.
missing, if X is a data.frame and a proper formula f is provided.

WARNING: The class labels will be re-coded to range from 0 to K-1, where K is the total number of different classes in the learning set.
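
The recoding described in the warning can be pictured with base R. This is only a sketch of the idea, not the package's internal code: the vector `labels` is made up for illustration, and the assumption that codes follow the sorted factor levels is ours.

```r
# Hypothetical class labels (illustration only, not taken from the package)
labels <- c("EWS", "BL", "NB", "EWS", "RMS")

# Map the K distinct classes to the integers 0, ..., K-1
# (assumption: codes follow the sorted factor levels)
f <- factor(labels)
recoded <- as.integer(f) - 1L

# Number of distinct classes
K <- nlevels(f)

# Lookup table from the new codes back to the original labels
lookup <- setNames(levels(f), seq_len(K) - 1L)
```

Keeping a lookup table such as `lookup` makes it possible to translate predicted codes back into the original class names.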

f

A two-sided formula, if X is a data.frame. The left-hand side corresponds to the class labels, the right-hand side to the variables.

learnind

An index vector specifying the observations that belong to the learning set. May be missing; in that case, the learning set consists of all observations and predictions are made on the learning set.
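
A minimal base R sketch of how such an index vector splits the data. The sample size `n` and the names `testind` and `learnind_all` are hypothetical, and the sketch assumes, as the description implies, that observations outside learnind serve as the test set.

```r
# Hypothetical setup: 9 observations, 6 of them drawn as the learning set
n <- 9
set.seed(111)
learnind <- sample(n, size = 6)

# Observations not in learnind form the test set (assumption, see above)
testind <- setdiff(seq_len(n), learnind)

# If learnind is missing, all observations form the learning set,
# and predictions are made on that same set
learnind_all <- seq_len(n)
```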

varimp

Should additional information for variable selection be provided? Defaults to TRUE.

seed

Sets the seed of the random number generator to seed. This is useful to guarantee reproducibility of the results.

models

A logical value indicating whether the model object should be returned.

type

Parameter passed to function importance. Either 1 or 2, specifying the type of importance measure (1=mean decrease in accuracy, 2=mean decrease in node impurity).

scale

Parameter passed to function importance. For permutation based measures, should the measures be divided by their standard errors?

importance

Parameter passed to function randomForest. Should importance of predictors be assessed by permutation?

...

Further arguments to be passed to randomForest from the package of the same name.
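
To see what type, scale and importance control, it may help to call the randomForest package directly. The sketch below uses the built-in iris data rather than gene expression data and does not show how rfCMA wires these arguments internally; the object names `rf`, `imp_acc` and `imp_gini` are illustrative.

```r
library(randomForest)

set.seed(111)
# importance = TRUE asks randomForest to compute permutation importance
rf <- randomForest(x = iris[, 1:4], y = iris$Species, importance = TRUE)

# type = 1: mean decrease in accuracy;
# scale = FALSE: raw values, not divided by their standard errors
imp_acc <- importance(rf, type = 1, scale = FALSE)

# type = 2: mean decrease in node impurity
imp_gini <- importance(rf, type = 2)
```

Both calls return a matrix with one row per predictor, which is the kind of per-variable score a ranking can be built on.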

Value

If varimp is TRUE, an object of class clvarseloutput is returned; otherwise, an object of class cloutput.

References

Breiman, L. (2001). Random Forests. Machine Learning, 45:5-32.

Examples


### load the CMA package and the Khan data
library(CMA)
data(khan)
### extract class labels
khanY <- khan[,1]
### extract gene expression
khanX <- as.matrix(khan[,-1])
### select learning set
set.seed(111)
learnind <- sample(length(khanY), size=floor(2/3*length(khanY)))
### run random forest
rfresult <- rfCMA(X=khanX, y=khanY, learnind=learnind, varimp = FALSE)
### show results
show(rfresult)
ftable(rfresult)
plot(rfresult)
