boosting.cv: Runs v-fold cross validation with AdaBoost.M1 or SAMME

Description

The data are divided into v non-overlapping subsets of roughly equal size. Then, boosting is applied on (v-1) of the subsets. Finally, predictions are made for the left out subsets, and the process is repeated for each of the v subsets.

Usage

boosting.cv(formula, data, v = 10, boos = TRUE, mfinal = 100, 
 coeflearn = "Breiman", control, par=FALSE)

Value

An object of class boosting.cv, which is a list with the following components:

class: the class predicted by the ensemble classifier.
confusion: the confusion matrix which compares the real class with the predicted one.
error: returns the average error.

Arguments

formula: a formula, as in the lm function.
data: a data frame in which to interpret the variables named in formula
boos: if TRUE (by default), a bootstrap sample of the training set is drawn using the weights for each observation on that iteration. If FALSE, every observation is used with its weights.
v: An integer, specifying the type of v-fold cross validation. Defaults to 10. If v is set as the number of observations, leave-one-out cross validation is carried out. Besides this, every value between two and the number of observations is valid and means that roughly every v-th observation is left out.
mfinal: an integer, the number of iterations for which boosting is run or the number of trees to use. Defaults to mfinal=100 iterations.
coeflearn: if 'Breiman'(by default), alpha=1/2ln((1-err)/err) is used. If 'Freund' alpha=ln((1-err)/err) is used. In both cases the AdaBoost.M1 algorithm is used and alpha is the weight updating coefficient. On the other hand, if coeflearn is 'Zhu' the SAMME algorithm is implemented with alpha=ln((1-err)/err)+ ln(nclasses-1).
control: options that control details of the rpart algorithm. See rpart.control for more details.
par: if TRUE, the cross validation process is runned in parallel. If FALSE (by default), the function runs without parallelization.

Author

Esteban Alfaro-Cortes Esteban.Alfaro@uclm.es, Matias Gamez-Martinez Matias.Gamez@uclm.es and Noelia Garcia-Rubio Noelia.Garcia@uclm.es

References

Alfaro, E., Gamez, M. and Garcia, N. (2013): ``adabag: An R Package for Classification with Boosting and Bagging''. Journal of Statistical Software, Vol 54, 2, pp. 1--35.

Alfaro, E., Garcia, N., Gamez, M. and Elizondo, D. (2008): ``Bankruptcy forecasting: An empirical comparison of AdaBoost and neural networks''. Decision Support Systems, 45, pp. 110--122.

Breiman, L. (1998): "Arcing classifiers". The Annals of Statistics, Vol 26, 3, pp. 801--849.

Freund, Y. and Schapire, R.E. (1996): "Experiments with a new boosting algorithm". In Proceedings of the Thirteenth International Conference on Machine Learning, pp. 148--156, Morgan Kaufmann.

Zhu, J., Zou, H., Rosset, S. and Hastie, T. (2009): ``Multi-class AdaBoost''. Statistics and Its Interface, 2, pp. 349--360.

Examples

Run this code


## rpart library should be loaded
data(iris)
iris.boostcv <- boosting.cv(Species ~ ., v=2, data=iris, mfinal=5, 
control=rpart.control(cp=0.01))
iris.boostcv[-1]

## rpart and mlbench libraries should be loaded
## Data Vehicle (four classes) 
#This example has been hidden to fulfill execution time <5s 
#data(Vehicle)
#Vehicle.boost.cv <- boosting.cv(Class ~.,data=Vehicle,v=5, mfinal=10, coeflearn="Zhu",
#control=rpart.control(maxdepth=5))
#Vehicle.boost.cv[-1]

Run the code above in your browser using DataLab