plot.CoreModel: Visualization of CoreModel models

Description

The plot method visualizes the models returned by the CoreModel() function, or summaries obtained by applying these models to data. Different plots are produced depending on the type of the model.

Usage

# S3 method for CoreModel
plot(x, trainSet, rfGraphType=c("attrEval", "outliers", "scaling",
    "prototypes", "attrEvalCluster"), clustering=NULL, ...)

Arguments

x

The model structure as returned by CoreModel.

trainSet

The data frame containing training data which produced the model x.

rfGraphType

The type of the graph to produce for random forest models. See details.

clustering

The clustering of the training instances used in some model types. See details.

…

Other options controlling graphical output passed to additional graphical functions.

Value

The method returns no value.

Details

The output of the function CoreModel is visualized. Depending on the model type, different visualizations are produced. Currently, classification trees, regression trees, and random forests are supported (models "tree", "regTree", "rf", and "rfNear").

For classification and regression trees (models "tree" and "regTree") the visualization produces a graph representing the structure of the classification or regression tree, respectively. This process exploits the graphical capabilities of the rpart package. Internal structures of CoreModel are converted to an rpart.object and then visualized by calling plot.rpart and text.rpart with some sensible values of graphical parameters. For a more versatile picture, use getRpartModel and call these two functions with different parameters. An alternative is to use the package rpart.plot and plot the rpart.object with it; note, however, that rpart.plot can only display a single value in a leaf, which is not appropriate for model trees using e.g., linear regression in the leaves. For these cases the function display is a better alternative.

For random forest models (models "rf" and "rfNear") different types of visualizations can be produced depending on the rfGraphType parameter:

"attrEval": the attributes are evaluated with the random forest model and the importance scores are then visualized. For details see rfAttrEval.
"attrEvalCluster": similarly to "attrEval", the attributes are evaluated with the random forest model and the importance scores are then visualized, but the importance scores are generated for each cluster separately. The parameter clustering provides clustering information on the trainSet. If the clustering parameter is set to NULL, the class values are used as clustering information and a visualization of attribute importance is generated for each class separately. For details see rfAttrEvalClustering.
"outliers": the random forest proximity measure of training instances in trainSet is visualized, so that outliers can be detected for each class separately. For details see rfProximity and rfOutliers.
"prototypes": typical instances are found based on predicted class probabilities and their values are visualized (see classPrototypes).
"scaling": produces a scaling plot of training instances in a two-dimensional space using random forest based proximity as the distance (see rfProximity and the scaling function cmdscale).
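
The idea behind the "scaling" plot can be sketched with base R alone. The snippet below is only an illustration, not the package's implementation: it uses plain Euclidean distances between iris instances as a stand-in for the random forest proximity-based distance (which in practice would come from rfProximity), and projects them to two dimensions with cmdscale. The variable names d and coords are chosen here for illustration.

```r
# Sketch only: Euclidean distance is an assumed stand-in for the
# random forest proximity-based distance used by the "scaling" plot.
d <- dist(iris[, 1:4])          # pairwise distances between instances
coords <- cmdscale(d, k = 2)    # classical multidimensional scaling to 2-D

# One point per training instance, colored by class
plot(coords, col = as.integer(iris$Species), pch = 19,
     xlab = "Dimension 1", ylab = "Dimension 2",
     main = "Scaling plot (illustrative distances)")
legend("topright", legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)), pch = 19)
```

Instances that lie close together in the resulting plane are those the distance measure considers similar; with a forest-based proximity, this means instances that frequently end up in the same leaves.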

References

Leo Breiman: Random Forests. Machine Learning Journal, 45:5-32, 2001

Examples

# NOT RUN {
# decision tree
dataset <- CO2
md <- CoreModel(Plant ~ ., dataset, model="tree")
plot(md, dataset)

# more versatile graph can be obtained by explicit conversion to rpart.object 
rpm <- getRpartModel(md,dataset)
# and then setting additional graphical parameters in plot.rpart and text.rpart.
# E.g., branch=0.5 draws branches halfway between V-shaped (0) and square-shouldered (1),
# minbranch=5 sets the minimum branch length to 5 times the average branch length,
# and compress=TRUE tries to make the dendrogram more compact
plot(rpm, branch=0.5, minbranch=5, compress=TRUE)
# pretty=0 prints factor levels without abbreviation, digits=3 uses 3 significant digits
text(rpm, pretty=0, digits=3)

# an alternative is to use fancier rpart.plot package
# rpart.plot(rpm) # rpart.plot has many parameters controlling the output
# but it cannot plot models in leaves 

destroyModels(md) # clean up

# regression tree
dataset <- CO2
mdr <- CoreModel(uptake ~ ., dataset, model="regTree")
plot(mdr, dataset)
destroyModels(mdr) # clean up

# random forests
dataset <- iris
mdRF <- CoreModel(Species ~ ., dataset, model="rf", rfNoTrees=30, maxThreads=1)
plot(mdRF, dataset, rfGraphType="attrEval")
plot(mdRF, dataset, rfGraphType="outliers")
plot(mdRF, dataset, rfGraphType="scaling")
plot(mdRF, dataset, rfGraphType="prototypes")
plot(mdRF, dataset, rfGraphType="attrEvalCluster", clustering=NULL)
destroyModels(mdRF) # clean up

# }
