
CORElearn (version 1.57.3)

plot.CoreModel: Visualization of CoreModel models

Description

The method plot visualizes the models returned by the CoreModel() function, or summaries obtained by applying these models to data. Different plots are produced depending on the type of the model.

Usage

# S3 method for CoreModel
plot(x, trainSet, rfGraphType=c("attrEval", "outliers", "scaling",
    "prototypes", "attrEvalCluster"), clustering=NULL, ...)

Value

The method returns no value.

Arguments

x

The model structure as returned by CoreModel.

trainSet

The data frame containing the training data that produced the model x.

rfGraphType

The type of the graph to produce for random forest models. See details.

clustering

The clustering of the training instances used in some model types. See details.

...

Other options controlling the graphical output, passed on to additional graphical functions.

Author

John Adeyanju Alao (initial implementation) and Marko Robnik-Sikonja (integration, improvements)

Details

The output of the function CoreModel is visualized. Depending on the model type, different visualizations are produced. Currently classification trees, regression trees, and random forests are supported (model types "tree", "regTree", "rf", and "rfNear").

For classification and regression trees (models "tree" and "regTree") the visualization produces a graph representing the structure of the classification or regression tree, respectively. This exploits the graphical capabilities of the rpart.plot package. The internal structures of CoreModel are converted to an rpart.object and then visualized by calling rpart.plot with default parameters. Any additional parameters are passed on to this function. For further control, use the getRpartModel function and call rpart.plot or plot.rpart with different parameters. Note that rpart.plot can display only a single value in a leaf, which is not appropriate for model trees using, e.g., linear regression in the leaves. For these cases the function display is a better alternative.
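
A minimal sketch of this conversion (assuming CORElearn and rpart.plot are installed and CORElearn is loaded, as in the Examples below; type=4 is merely an illustrative non-default setting):

md <- CoreModel(Species ~ ., iris, model="tree")
rpm <- getRpartModel(md, iris)       # convert the CoreModel tree to an rpart.object
rpart.plot::rpart.plot(rpm, type=4)  # illustrative non-default plotting parameter
destroyModels(md)                    # clean up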

For random forest models (models "rf" and "rfNear") different types of visualizations can be produced, depending on the rfGraphType parameter:

  • "attrEval" the attributes are evaluated with random forest model and the importance scores are then visualized. For details see rfAttrEval.

  • "attrEvalClustering" similarly to the "attrEval" the attributes are evaluated with random forest model and the importance scores are then visualized, but the importance scores are generated for each cluster separately. The parameter clustering provides clustering information on the trainSet. If clustering parameter is set to NULL, the class values are used as clustering information and visualization of attribute importance for each class separately is generated. For details see rfAttrEvalClustering.

  • "outliers" the random forest proximity measure of training instances in trainSet is visualized and outliers for each class separately can be detected. For details see rfProximity and rfOutliers.

  • "prototypes" typical instances are found based on predicted class probabilities and their values are visualized (see classPrototypes).

  • "scaling" returns a scaling plot of training instances in a two dimensional space using random forest based proximity as the distance (see rfProximity and a scaling function cmdscale).

References

Leo Breiman: Random Forests. Machine Learning Journal, 45:5-32, 2001

See Also

CoreModel, rfProximity, pam, rfClustering, rfAttrEvalClustering, rfOutliers, classPrototypes, cmdscale

Examples

# decision tree
dataset <- iris
md <- CoreModel(Species ~ ., dataset, model="tree")
plot(md, dataset) # additional parameters are passed directly to rpart.plot

# Additional visualizations can be obtained by explicit conversion to rpart.object 
#rpm <- getRpartModel(md,dataset)
# and then setting graphical parameters in plot.rpart and text.rpart
#require(rpart)
# E.g., adjust the branch shape (branch=0.5) and minimal branch length (minbranch=5)
# to make the dendrogram more compact
#plot(rpm, branch=0.5, minbranch=5, compress=TRUE)
# pretty=0 gives full (unabbreviated) split labels, digits=3 prints numbers to 3 significant digits
#text(rpm, pretty=0, digits=3)

destroyModels(md) # clean up

# regression tree
dataset <- CO2
mdr <- CoreModel(uptake ~ ., dataset, model="regTree")
plot(mdr, dataset)
destroyModels(mdr) # clean up

# random forests
dataset <- iris
mdRF <- CoreModel(Species ~ ., dataset, model="rf", rfNoTrees=30, maxThreads=1)
plot(mdRF, dataset, rfGraphType="attrEval")
plot(mdRF, dataset, rfGraphType="outliers")
plot(mdRF, dataset, rfGraphType="scaling")
plot(mdRF, dataset, rfGraphType="prototypes")
plot(mdRF, dataset, rfGraphType="attrEvalCluster", clustering=NULL)
destroyModels(mdRF) # clean up
