rfAttrEval: Attribute evaluation with random forest

Description

The method evaluates the quality of the features/attributes/dependent variables used in the given random forest model.

Usage

rfAttrEval(model) 
rfAttrEvalClustering(model, dataset, clustering=NULL)

Arguments

model

The model of type rf or rfNear as returned by CoreModel.

dataset

Training instances that produced random forest model.

clustering

A clustering vector of dataset training instances used in model.

Value

In case of rfAttrEval a vector of evaluations for the features in the order specified by the formula used to generate the provided model. In case of rfAttrEvalClustering a matrix is returned, where each row contains evaluations for one of the clusters.

Details

The attributes are evaluated via provided random forest's out-of-bag sets. Values for each attribute in turn are randomly shuffled and classified with random forest. The difference between average margin of non-shuffled and shuffled instances serves as a quality estimate of the attribute. The function rfAttrEvalClustering uses a clustering of the training instances to produce importance score of attributes for each cluster separately. If parameter clustering is set to NULL the actual class values of the instances are used as clusters thereby producing the evaluation of attributes specific for each of the class values.

References

Marko Robnik-Sikonja: Improving Random Forests. In J.-F. Boulicaut et al.(Eds): ECML 2004, LNAI 3210, Springer, Berlin, 2004, pp. 359-370 Available also from http://lkm.fri.uni-lj.si/rmarko/papers/

Leo Breiman: Random Forests. Machine Learning Journal, 2001, 45, 5-32

Examples

Run this code

# NOT RUN {
# build random forests model with certain parameters
modelRF <- CoreModel(Species ~ ., iris, model="rf", 
              selectionEstimator="MDL", minNodeWeightRF=5, 
              rfNoTrees=100, maxThreads=1)
rfAttrEval(modelRF) # feature evaluations

x <- rfAttrEval(modelRF) # feature evaluations for each class
print(x)

destroyModels(modelRF) # clean up


# }

Run the code above in your browser using DataLab