xyplot.resamples: Lattice Functions for Visualizing Resampling Results

Description

Lattice and ggplot functions for visualizing resampling results across models

Usage

# S3 method for resamples
xyplot(x, data = NULL, what = "scatter",
  models = NULL, metric = x$metric[1], units = "min", ...)
# S3 method for resamples
parallelplot(x, data = NULL, models = x$models,
  metric = x$metric[1], ...)
# S3 method for resamples
splom(x, data = NULL, variables = "models",
  models = x$models, metric = NULL, panelRange = NULL, ...)
# S3 method for resamples
densityplot(x, data = NULL, models = x$models,
  metric = x$metric, ...)
# S3 method for resamples
bwplot(x, data = NULL, models = x$models,
  metric = x$metric, ...)
# S3 method for resamples
dotplot(x, data = NULL, models = x$models,
  metric = x$metric, conf.level = 0.95, ...)
# S3 method for resamples
ggplot(data = NULL, mapping = NULL,
  environment = NULL, models = data$models, metric = data$metric[1],
  conf.level = 0.95, ...)

Arguments

x

an object generated by resamples

data

Only used for the ggplot method; an object generated by resamples

what

for xyplot, the type of plot. Valid options are: "scatter" (for a plot of the resampled results between two models), "BlandAltman" (a Bland-Altman, or MA, plot between two models), "tTime" (for the total time to run train versus the metric), "mTime" (for the time to build the final model) or "pTime" (the time to predict samples; see the timingSamps options in trainControl, rfeControl, or sbfControl)
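
As a rough illustration of what a Bland-Altman display contains, the sketch below computes the per-resample average and difference for two models and plots one against the other with lattice. The RMSE vectors are made-up values, and the axes and panel details that xyplot.resamples actually uses are an assumption, not taken from the caret source.

```r
library(lattice)

# Hypothetical per-resample RMSE values for two models (illustrative only)
rmse_cart <- c(2.1, 2.4, 1.9, 2.6, 2.2)
rmse_mars <- c(1.8, 2.0, 1.7, 2.5, 1.9)

# Bland-Altman quantities: per-resample average and difference
ba <- data.frame(
  average    = (rmse_cart + rmse_mars) / 2,
  difference = rmse_cart - rmse_mars
)

# Difference against average, with a dashed reference line at zero
p <- xyplot(difference ~ average, data = ba,
            xlab = "Average RMSE",
            ylab = "Difference (CART - MARS)",
            panel = function(...) {
              panel.abline(h = 0, lty = 2)
              panel.xyplot(...)
            })
```

Calling print(p) draws the plot. Points far from the zero line mark resamples where the two models disagree most.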

models

a character vector naming the models to plot. Note: xyplot requires one or two models, whereas the other methods can plot more than two.

metric

a character string for which metrics to use as conditioning variables in the plot. splom requires exactly one metric when variables = "models" and at least two when variables = "metrics".

units

either "sec", "min" or "hour"; when what is either "tTime", "mTime" or "pTime", how should the timings be scaled?

…

further arguments to pass to either histogram, densityplot, xyplot, dotplot or splom

variables

either "models" or "metrics"; which variable should be treated as the scatter plot variables?

panelRange

a common range for the panels. If NULL, the panel ranges are derived from the values across all the models

conf.level

the confidence level for intervals about the mean (obtained using t.test)
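
A minimal base R sketch of the kind of interval this argument controls: a two-sided t-based confidence interval about the mean of one model's resampled values. The RMSE vector is made up for illustration, and how dotplot and ggplot arrange these values internally is not shown here.

```r
# Hypothetical resampled RMSE values for one model (illustrative only)
rmse <- c(2.1, 2.4, 1.9, 2.6, 2.2, 2.0, 2.3, 2.5, 2.2, 2.1)

# Two-sided confidence interval about the mean via t.test
tt <- t.test(rmse, conf.level = 0.95)

estimate <- unname(tt$estimate)       # sample mean
limits   <- as.numeric(tt$conf.int)   # lower and upper limits

# A higher confidence level gives a wider interval
limits99 <- as.numeric(t.test(rmse, conf.level = 0.99)$conf.int)
```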

mapping, environment

Not used.

Value

a lattice object; the ggplot method returns a ggplot object

Details

The ideas and methods here are based on Hothorn et al. (2005) and Eugster et al. (2008).

dotplot and ggplot plot the average performance value (with two-sided confidence limits) for each model and metric.

densityplot and bwplot display univariate visualizations of the resampling distributions while splom shows the pair-wise relationships.
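
The following sketch shows one way to try these methods without pre-fitted models. It assumes caret is installed, uses the built-in mtcars data, and fits two linear models chosen purely for illustration (the names fitSmall, fitLarge, and folds are hypothetical). Both fits share the same resampling indices so that their results are comparable.

```r
library(caret)  # also attaches lattice and ggplot2

set.seed(1)
# Shared resampling indices so both models are evaluated on the same folds
folds <- createFolds(mtcars$mpg, k = 5, returnTrain = TRUE)
ctrl  <- trainControl(method = "cv", number = 5, index = folds)

# Two illustrative linear models that differ only in their predictors
fitSmall <- train(mpg ~ wt, data = mtcars, method = "lm", trControl = ctrl)
fitLarge <- train(mpg ~ wt + hp, data = mtcars, method = "lm", trControl = ctrl)

resamps <- resamples(list(Small = fitSmall, Large = fitLarge))

# Mean RMSE with 90% confidence limits, via lattice and via ggplot2
p_lattice <- dotplot(resamps, metric = "RMSE", conf.level = 0.90)
p_gg      <- ggplot(resamps, metric = "RMSE", conf.level = 0.90)

# Univariate view of the resampling distributions
p_box <- bwplot(resamps, metric = "RMSE")
```

Each object can be drawn with print(), for example print(p_gg).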

References

Hothorn et al. The design and analysis of benchmark experiments. Journal of Computational and Graphical Statistics (2005) vol. 14 (3) pp. 675-699

Eugster et al. Exploratory and inferential analysis of benchmark experiments. Ludwig-Maximilians-Universität München, Department of Statistics, Tech. Rep (2008) vol. 30

Examples

# NOT RUN {
#load(url("http://topepo.github.io/caret/exampleModels.RData"))

resamps <- resamples(list(CART = rpartFit,
                          CondInfTree = ctreeFit,
                          MARS = earthFit))

dotplot(resamps,
        scales = list(x = list(relation = "free")),
        between = list(x = 2))

bwplot(resamps,
       metric = "RMSE")

densityplot(resamps,
            auto.key = list(columns = 3),
            pch = "|")

xyplot(resamps,
       models = c("CART", "MARS"),
       metric = "RMSE")

splom(resamps, metric = "RMSE")
splom(resamps, variables = "metrics")

parallelplot(resamps, metric = "RMSE")


# }