gbmCMA: Tree-based Gradient Boosting

Description

Roughly speaking, boosting combines 'weak learners' in a weighted manner into a stronger ensemble. This method calls the function gbm.fit from the package gbm. The 'weak learners' are simple trees that need only very few splits (default: 1).
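The idea of combining single-split trees ("stumps") into a weighted ensemble can be sketched in base R. This is a simplified illustration, not the algorithm implemented in gbm: it uses plain gradient steps on the Bernoulli deviance and omits refinements such as subsampling. The helper names fit_stump, predict_stump and boost_stumps are hypothetical.

```r
# Fit a regression stump (a tree with one split) to residuals r.
fit_stump <- function(X, r) {
  best <- list(sse = Inf)
  for (j in seq_len(ncol(X))) {
    xs <- sort(unique(X[, j]))
    if (length(xs) < 2) next
    thresholds <- (head(xs, -1) + tail(xs, -1)) / 2  # midpoints between distinct values
    for (thr in thresholds) {
      left <- X[, j] <= thr
      ml <- mean(r[left])
      mr <- mean(r[!left])
      sse <- sum((r[left] - ml)^2) + sum((r[!left] - mr)^2)
      if (sse < best$sse) {
        best <- list(sse = sse, var = j, thr = thr, left = ml, right = mr)
      }
    }
  }
  best
}

predict_stump <- function(s, X) ifelse(X[, s$var] <= s$thr, s$left, s$right)

# Boosting loop: each stump is fitted to the current residuals and added
# to the ensemble score, scaled by the learning rate (shrinkage).
boost_stumps <- function(X, y, n.trees = 100, shrinkage = 0.1) {
  score <- rep(log(mean(y) / (1 - mean(y))), length(y))  # initial log-odds
  stumps <- vector("list", n.trees)
  for (m in seq_len(n.trees)) {
    p <- 1 / (1 + exp(-score))
    r <- y - p  # negative gradient of the Bernoulli deviance
    stumps[[m]] <- fit_stump(X, r)
    score <- score + shrinkage * predict_stump(stumps[[m]], X)
  }
  list(score = score, stumps = stumps)
}

# Toy data (simulated, for illustration only)
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)
y <- as.numeric(X[, 1] + X[, 2] > 0)
fit <- boost_stumps(X, y, n.trees = 50, shrinkage = 0.1)
train_acc <- mean((fit$score > 0) == (y == 1))
```

Each stump on its own is a poor classifier; the ensemble score, a shrunken sum of all stumps, is what separates the classes.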

For S4 method information, see gbmCMA-methods.

Usage

gbmCMA(X, y, f, learnind, models=FALSE,...)

Arguments

X

Gene expression data. Can be one of the following:

A matrix. Rows correspond to observations, columns to variables.
A data.frame, when f is not missing (see below).
An object of class ExpressionSet.

y

Class labels. Can be one of the following:

A numeric vector.
A factor.
A character string specifying the phenotype variable, if X is an ExpressionSet.
missing, if X is a data.frame and a proper formula f is provided.

WARNING: The class labels will be re-coded to range from 0 to K-1, where K is the total number of different classes in the learning set.
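The effect of this re-coding can be illustrated in base R. The sketch below assumes that codes are assigned in sorted level order; the text above only states the resulting range, so treat the exact mapping as an assumption and check it against your own output.

```r
# Hypothetical class labels given as a factor
y <- factor(c("AML", "ALL", "ALL", "AML", "ALL"))

# Map K classes to 0, ..., K-1 (assumption: sorted level order)
y_coded <- as.numeric(factor(y)) - 1

# Lookup to translate codes back to the original labels
y_back <- levels(factor(y))[y_coded + 1]

# The same idea applied to numeric labels that are not 0, ..., K-1
y_num <- c(1, 3, 3, 7)
y_num_coded <- as.numeric(factor(y_num)) - 1
```

Keeping such a lookup is useful when predicted classes are reported on the 0 to K-1 scale and need to be mapped back to the original labels.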

f

A two-sided formula, if X is a data.frame. The left part corresponds to class labels, the right to variables.
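How a two-sided formula splits a data.frame into class labels and variables can be shown with base R's model.frame. This illustrates the formula semantics only and is not necessarily how gbmCMA processes the formula internally; the data and column names are made up.

```r
# Hypothetical data.frame: one class column, two expression columns
df <- data.frame(
  class = factor(c("ALL", "AML", "ALL", "AML")),
  gene1 = c(0.2, 1.5, -0.3, 1.1),
  gene2 = c(-1.0, 0.4, -0.8, 0.9)
)

# Left-hand side: class labels; right-hand side: all remaining columns
f <- class ~ .

mf <- model.frame(f, data = df)
y_from_f <- model.response(mf)                  # class labels
X_from_f <- as.matrix(mf[, -1, drop = FALSE])   # variables
```

With such a data.frame and formula, y would be left missing in the call, as described for the argument y above.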

learnind

An index vector specifying the observations that belong to the learning set. May be missing; in that case, the learning set consists of all observations and predictions are made on the learning set.
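A learning-set index of this kind can be drawn with sample. The sketch below uses a hypothetical sample size n and assumes that, when learnind is supplied, the remaining observations serve as the test set.

```r
n <- 38            # hypothetical number of observations
ratio <- 2 / 3     # fraction of observations used for learning

set.seed(111)
learnind <- sample(n, size = floor(ratio * n))

# Observations not in the learning set (assumed to form the test set)
testind <- setdiff(seq_len(n), learnind)
```

Fixing the seed makes the split reproducible across runs.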

models

A logical value indicating whether the model object should be returned.

...

Further arguments passed to the function gbm.fit from the package of the same name. Worth mentioning are:

n.trees: Number of trees to fit (size of the ensemble), defaults to 100. This parameter should be optimized using tune.
shrinkage: The learning rate (default is 0.001). Usually fixed to a very low value.
distribution: Loss function to be used. Default is "bernoulli", i.e. LogitBoost; a (less robust) alternative is "adaboost".
interaction.depth: Number of splits used by the 'weak learner' (single decision tree). Default is 1.
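As a minimal sketch, several of these arguments can be collected in a list and handed to gbmCMA together. The values below are illustrative rather than recommendations, the list name gbm_args is hypothetical, and the call is guarded so that it only runs when the CMA and gbm packages are installed.

```r
# Illustrative settings for the arguments described above
gbm_args <- list(
  n.trees = 500,
  shrinkage = 0.001,
  distribution = "bernoulli",
  interaction.depth = 1
)

if (requireNamespace("CMA", quietly = TRUE) &&
    requireNamespace("gbm", quietly = TRUE)) {
  library(CMA)
  data(golub)
  golubY <- golub[, 1]
  golubX <- as.matrix(golub[, -1])
  set.seed(111)
  learnind <- sample(length(golubY), size = floor(2 / 3 * length(golubY)))
  # The extra arguments are forwarded through '...'
  gbmresult <- do.call(
    gbmCMA,
    c(list(X = golubX, y = golubY, learnind = learnind), gbm_args)
  )
  show(gbmresult)
}
```

A smaller shrinkage generally calls for a larger n.trees, which is one reason to tune n.trees rather than fix it.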

Value

An object of class cloutput.

References

Ridgeway, G. (1999). The state of boosting. Computing Science and Statistics, 31:172-181.

Friedman, J. (2001). Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics, 29(5):1189-1232.

Examples

### load the CMA package and the Golub AML/ALL data
library(CMA)
data(golub)
### extract class labels
golubY <- golub[,1]
### extract gene expression
golubX <- as.matrix(golub[,-1])
### select learning set
ratio <- 2/3
set.seed(111)
learnind <- sample(length(golubY), size=floor(ratio*length(golubY)))
### run tree-based gradient boosting (no tuning)
gbmresult <- gbmCMA(X=golubX, y=golubY, learnind=learnind, n.trees = 500)
show(gbmresult)
ftable(gbmresult)
plot(gbmresult)