ada (version 2.0-5)

pairs.ada: Pairwise Plots and Variable Importance Plot for Ada

Description

This command produces pairwise plots of the data. The upper panels of the pairwise plots color the observations by observed class membership (if membership is provided). The lower panels color the observations by predicted class. In addition, the plotting symbol is scaled by the class probability estimate from adaboost. The varplot command produces a variable importance plot using the improvement criterion given in the reference (Hastie et al., 2001, p. 332). This is a rather standard measure for determining variable importance.

Usage

"pairs"(x, train.data = NULL, vars = NULL, maxvar = 10, test.x = NULL, test.y = NULL, test.only = FALSE,col=c(2,4),pch=c(1,2), ...)
varplot(x, plot.it = TRUE, type = c("none","scores"),max.var.show=30, ...)

Arguments

x
object generated by ‘ada’.
train.data
the ‘data.frame’ of the original data used to train the classifier. The names of this ‘data.frame’ must match the variable names in the object generated by ‘ada’. ‘train.data’ is only used for the ‘pairs’ command. Default = NULL.
vars
a vector of variables to include for this plot. Each variable number must correspond to a specific column in ‘train.data’. For example, vars=c(1,2) generates a plot for the first two columns of ‘train.data’. Note: ‘vars’ is only used for the ‘pairs’ command. Default = NULL.
maxvar
the maximum number of variables for the pairwise plot. If maxvar = 5, then ‘varplot’ chooses the five most important variables and places these in descending order in the plot. ‘maxvar’ is only used for the ‘pairs’ command. Default = 10.
test.x
an option to plot pairwise descriptors for a test data set. ‘test.x’ should be of type ‘data.frame’. ‘test.x’ is only used for the ‘pairs’ command. Default = NULL.
test.y
the corresponding response for the test data set. If ‘test.y’ is not specified, then the symbols for the test data in the pairwise plots are colored black; training data are colored by class. ‘test.y’ is only used for the ‘pairs’ command. Default = NULL.
test.only
provides pairwise plots for test data only (test.only = TRUE). If ‘test.y’ is not specified, then ‘test.only’ is ignored. ‘test.only’ is only used for the ‘pairs’ command. Default = FALSE.
col
colors for the plot symbols, one for each class. Default col=c(2,4) (i.e. red and blue).
pch
plotting symbols (pch), one for each class. Default pch=c(1,2) (i.e. circle and triangle).
...
arguments to be passed to ‘pairs.default’. Do not set the upper and lower panels. These are only used for the ‘pairs’ command.
plot.it
provides a plot of frequencies for each variable (plot.it = TRUE). ‘plot.it’ is only used for the ‘varplot’ command. Default = TRUE.
type
if type=“none” then nothing is returned. Default = “none”. If type=“scores”, the frequencies are returned.
max.var.show
if plot.it is TRUE, then this controls the number of variables shown in the plot. ‘max.var.show’ is only used for the ‘varplot’ command. Default = 30.

Value

scores
If type=“scores”, then the frequencies for each variable are returned by the varplot command.
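
As a minimal sketch of retrieving the scores, assuming the ‘ada’ package is installed. The two-class subset of the built-in iris data, the object names (iris2, fit, scores), the seed, and iter = 20 are illustrative choices and are not part of the original documentation:

```r
library(ada)

# Illustrative data: ada fits two-class problems, so keep only two iris species.
iris2 <- droplevels(iris[iris$Species != "setosa", ])

set.seed(1)  # boosting subsamples the data, so fix the seed for repeatability
fit <- ada(Species ~ ., data = iris2, iter = 20)

# Ask for the scores and suppress the plot.
scores <- varplot(fit, plot.it = FALSE, type = "scores")
print(scores)
```

With the default type = "none", the same call only draws the plot and returns nothing.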

Details

The ‘varplot’ command provides a sense of variable importance: the more frequently a variable is selected for boosting, the more likely it is that the variable contains useful information for classification. Pairwise interactions of important variables can then be visualized using ‘pairs’. Note: The ‘pairs’ command calls the ‘varplot’ command.
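
The workflow described in this section might look like the following sketch, assuming the ‘ada’ package is installed. The iris subset, the object names, the seed, and the values iter = 20 and maxvar = 2 are assumptions made for illustration only:

```r
library(ada)

# Illustrative two-class data set (not from the original documentation).
iris2 <- droplevels(iris[iris$Species != "setosa", ])

set.seed(1)  # fix the seed because boosting subsamples the data
fit <- ada(Species ~ ., data = iris2, iter = 20)

# Step 1: variable importance plot.
varplot(fit)

# Step 2: pairwise plot of the two most important variables.
# train.data holds the predictors only, so the response column (5) is dropped.
pairs(fit, train.data = iris2[, -5], maxvar = 2)
```

Because ‘pairs’ calls ‘varplot’ itself to rank the variables, step 1 is optional and is shown here only to make the ranking visible.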

References

Culp, M., Johnson, K., Michailidis, G. (200X). ada: an R Package for Boosting. Journal of Statistical Software, (XX)XX.