
Biocomb (version 0.4)

select.forward.wrapper: Select the subset of features

Description

This function selects a subset of features using the wrapper method with a decision tree classifier and a forward search strategy. It handles both numerical and nominal values. The wrapper method uses the classification algorithm itself to score each candidate feature subset: the classification accuracy of the subset is estimated with a built-in cross-validation procedure. In the first step, the one-feature subset with the best quality measure is selected. In the following steps, the subset is incrementally extended according to the forward search strategy until the stopping criterion is met. The result is a character vector with the names of the selected features. This function is used internally to perform classification with feature selection through the function “classifier.loop” with the argument “CorrSF” for feature selection.
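The forward wrapper idea described above can be sketched in a few lines of R. This is only an illustration, not Biocomb's implementation: the helper names `cv_accuracy` and `forward_wrapper_sketch` are hypothetical, the tree comes from the `rpart` package, the data set is the built-in `iris`, and the stopping criterion (stop when cross-validated accuracy no longer improves) is an assumption, since this page does not state which criterion the package uses.

```r
# Illustrative sketch only; not the code used inside Biocomb.
library(rpart)

# Cross-validated accuracy of an rpart decision tree restricted to `features`.
# `folds` assigns every row to one of the cross-validation folds.
cv_accuracy <- function(data, features, class_col, folds) {
  form <- reformulate(features, response = class_col)
  correct <- 0
  for (i in unique(folds)) {
    train <- data[folds != i, , drop = FALSE]
    test <- data[folds == i, , drop = FALSE]
    fit <- rpart(form, data = train, method = "class")
    pred <- predict(fit, newdata = test, type = "class")
    correct <- correct + sum(as.character(pred) == as.character(test[[class_col]]))
  }
  correct / nrow(data)
}

# Greedy forward search: add the feature that gives the best accuracy,
# and stop when no remaining feature improves it (assumed criterion).
forward_wrapper_sketch <- function(data, k = 5) {
  class_col <- names(data)[ncol(data)]          # class label is the last column
  candidates <- setdiff(names(data), class_col)
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  selected <- character(0)
  best_acc <- -Inf
  while (length(candidates) > 0) {
    acc <- sapply(candidates, function(f) {
      cv_accuracy(data, c(selected, f), class_col, folds)
    })
    if (max(acc) <= best_acc) break
    best <- candidates[which.max(acc)]
    best_acc <- max(acc)
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
  }
  selected
}

set.seed(1)
selected <- forward_wrapper_sketch(iris)
print(selected)
```

The same folds are reused for every candidate subset so that accuracy differences reflect the features rather than a different random split.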

Usage

select.forward.wrapper(dattable)

Arguments

dattable

a dataset: a matrix of feature values for several cases, with the class labels in the last column. Class labels can be numerical or character values. The maximum number of classes is ten.

Value

The data may contain a reasonable number of missing values, which must first be preprocessed with one of the imputation methods in the function input_miss.

The returned value is

subset

a character vector of the names of selected features

Details

This function's main job is to select a subset of informative features according to the forward selection strategy using the wrapper method. A decision tree is used as the classifier to estimate the quality of each feature subset. See the “Value” section of this page for more details.

Data can be provided in matrix form, where each row corresponds to a case with its feature values and class label. The columns contain the values of the individual features, and the last column must contain the class labels. The maximum number of classes is 10. The class label column and all nominal features must be defined as factors.
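As a minimal sketch of the layout described above, the following base R code builds a small table with the class label in the last column and converts the nominal columns to factors. The data and column names (`toy`, `expr_a`, `expr_b`, `tissue`, `label`) are invented for illustration. A data frame is used here because an R matrix cannot hold factor columns; whether the function accepts other containers is not stated on this page.

```r
# Hypothetical toy data: two numeric features, one nominal feature,
# and the class label in the last column.
toy <- data.frame(
  expr_a = c(1.2, 0.7, 3.1, 2.8, 0.9, 3.4),
  expr_b = c(5.0, 4.1, 1.2, 0.8, 4.7, 1.1),
  tissue = c("liver", "liver", "kidney", "kidney", "liver", "kidney"),
  label = c(1, 1, 2, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Nominal features and the class label must be factors.
toy$tissue <- as.factor(toy$tissue)
toy[, ncol(toy)] <- as.factor(toy[, ncol(toy)])

# Checks that mirror the requirements stated in the text.
stopifnot(is.factor(toy[, ncol(toy)]))      # class label is a factor
stopifnot(nlevels(toy[, ncol(toy)]) <= 10)  # at most ten classes
stopifnot(!anyNA(toy))                      # no missing values left
```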

References

Y. Wang, I.V. Tetko, M.A. Hall, E. Frank, A. Facius, K.F.X. Mayer, and H.W. Mewes, "Gene Selection from Microarray Data for Cancer Classification—A Machine Learning Approach," Computational Biology and Chemistry, vol. 29, no. 1, pp. 37-46, 2005.

See Also

input_miss, select.process

Examples

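# load the package that provides select.forward.wrapper and data_test
library(Biocomb)

# example for a dataset without missing values
data(data_test)

# the class label (last column) must be a factor
data_test[, ncol(data_test)] <- as.factor(data_test[, ncol(data_test)])

# out is a character vector with the names of the selected features
out <- select.forward.wrapper(dattable = data_test)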
