ggRandomForests: Visually Exploring Random Forests

ggRandomForests helps uncover variable associations in random forest models. The package is designed for use with the randomForest package (Liaw and Wiener 2002) or the randomForestSRC package (Ishwaran et al. 2014, 2008, 2007) for survival, regression and classification random forests, and uses the ggplot2 package (Wickham 2009) to plot diagnostic and variable association results. ggRandomForests extracts data objects from randomForestSRC or randomForest objects and provides S3 functions for printing and plotting them.

The randomForestSRC package provides a unified treatment of Breiman's (2001) random forests for a variety of data settings. Regression forests are grown when the response is numeric and classification forests when it is categorical (a factor), while survival and competing risk forests (Ishwaran et al. 2008, 2012) are grown for right-censored survival data. Support for regression and classification forests from the randomForest package (Liaw and Wiener 2002) has also been added recently.

Many of the figures created by ggRandomForests are also available directly from the randomForestSRC or randomForest packages. However, ggRandomForests offers the following advantages:

  • Separation of data and figures: ggRandomForests contains functions that operate either on the forest object directly or on the output of the randomForestSRC and randomForest post-processing functions (i.e. plot.variable, var.select, find.interaction) to generate intermediate ggRandomForests data objects. S3 functions are provided to further process these objects and plot the results using the ggplot2 graphics package. Alternatively, users can use these data objects for additional custom plotting or analysis.

  • Each data object/figure is a single, self-contained object. This allows simple modification and manipulation of the data or ggplot2 objects to meet users' specific needs and requirements.

  • The use of ggplot2 for plotting. Each S3 plot function returns either a single ggplot2 object or a list of ggplot2 objects, so users can apply additional ggplot2 functions or themes to modify and customize the figures to their liking.
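The separation of data and figures described above can be sketched as a three-step workflow: grow a forest, extract a data object, then plot and customize it. This is a minimal illustration, assuming randomForestSRC, ggRandomForests and ggplot2 are installed; the mtcars data set, the object names and the chosen theme are illustrative assumptions, not part of the package documentation.

```r
library(randomForestSRC)
library(ggRandomForests)
library(ggplot2)

set.seed(42)

# Grow a small regression forest on a built-in data set (illustrative choice).
# importance = TRUE asks rfsrc to compute variable importance (VIMP).
rf_mtcars <- rfsrc(mpg ~ ., data = mtcars, ntree = 200, importance = TRUE)

# Step 1: extract an intermediate data object from the forest.
# The result also behaves as a data frame, so it can be reused for custom analysis.
vimp_data <- gg_vimp(rf_mtcars)
head(vimp_data)

# Step 2: the S3 plot method returns a ggplot2 object rather than drawing directly.
vimp_plot <- plot(vimp_data)

# Step 3: customize the figure with ordinary ggplot2 layers and themes.
vimp_plot +
  theme_bw() +
  labs(title = "Variable importance, mtcars regression forest")
```

Because the data object and the figure are separate, either one can be saved, modified or passed to other code independently of the forest that produced it.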

Where possible, the package has recently been extended to support randomForest, Breiman and Cutler's Random Forests for Classification and Regression package. Methods are provided for all gg_* functions, but those not yet supported for randomForest objects return an error message indicating where support is still lacking.
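The following sketch shows how this might look with a randomForest object. It assumes the randomForest package is installed and that gg_error is among the functions supported for randomForest forests; gg_minimal_depth is used only as a guess at a function that may not be supported, so the call is wrapped in tryCatch and no particular message is assumed. The iris data set and object names are illustrative.

```r
library(randomForest)
library(ggRandomForests)

set.seed(42)

# Grow a classification forest with the randomForest package (illustrative data).
rf_iris <- randomForest(Species ~ ., data = iris, ntree = 200)

# Extract the error rate as a function of the number of trees, then plot it.
err_data <- gg_error(rf_iris)
err_plot <- plot(err_data)
err_plot

# Assumption: this function may not yet support randomForest objects.
# tryCatch captures the error message instead of stopping the script.
md_result <- tryCatch(
  gg_minimal_depth(rf_iris),
  error = function(e) conditionMessage(e)
)
print(md_result)
```

Wrapping calls this way lets a script that must handle both randomForestSRC and randomForest forests degrade gracefully when a particular gg_* function is not available for one of them.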

References

Breiman L. (2001). Random forests. Machine Learning 45(1), 5--32.

Ishwaran H. and Kogalur U.B. (2014). Random Forests for Survival, Regression and Classification (RF-SRC). R package version 1.5.5.

Ishwaran H. and Kogalur U.B. (2007). Random survival forests for R. R News 7(2), 25--31.

Ishwaran H., Kogalur U.B., Blackstone E.H. and Lauer M.S. (2008). Random survival forests. Ann. Appl. Statist. 2(3), 841--860.

Liaw A. and Wiener M. (2002). Classification and Regression by randomForest. R News 2(3), 18--22.

Wickham H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer, New York.

Install

install.packages('ggRandomForests')

Monthly Downloads

947

Version

2.2.1

License

GPL (>= 3)

Last Published

September 1st, 2022

Functions in ggRandomForests (2.2.1)

gg_interaction

Minimal Depth Variable Interaction data object (find.interaction).
gg_error

randomForest error rate data object
calc_auc

Area Under the ROC Curve calculator
combine.gg_partial

Combine two gg_partial objects
ggRandomForests-package

ggRandomForests: Visually Exploring Random Forests
calc_roc.rfsrc

Receiver Operating Characteristic (ROC) calculator
gg_minimal_depth

Minimal depth data object (randomForestSRC::var.select)
gg_vimp

Variable Importance (VIMP) data object
gg_rfsrc.rfsrc

Predicted response data object
plot.gg_error

Plot a gg_error object
nelson

Nonparametric Nelson-Aalen estimates
plot.gg_variable

Plot a gg_variable object.
gg_roc.rfsrc

ROC (receiver operating characteristic) curve data from a classification random forest.
plot.gg_vimp

Plot a gg_vimp object, extracted variable importance of a rfsrc object
gg_minimal_vimp

Minimal depth vs VIMP comparison by variable rankings.
gg_partial

Partial variable dependence object
gg_partial_coplot.rfsrc

Data structures for stratified partial coplots
plot.gg_minimal_vimp

Plot a gg_minimal_vimp object for comparing the Minimal Depth and VIMP variable rankings.
plot.gg_partial

Partial variable dependence plot, operates on a gg_partial object.
plot.gg_minimal_depth

Plot a gg_minimal_depth object for random forest variable ranking.
plot.gg_interaction

Plot a gg_interaction object.
plot.gg_partial_list

Partial variable dependence plot, operates on a gg_partial_list object.
plot.gg_rfsrc

Predicted response plot from a gg_rfsrc object.
gg_survival

Nonparametric survival estimates.
plot.gg_roc

ROC plot generic function for a gg_roc object.
kaplan

Nonparametric Kaplan-Meier estimates
print.gg_minimal_depth

Print a gg_minimal_depth object.
quantile_pts

Find points evenly distributed along a vector's values.
shift

Lead function to shift a vector by one (or more) positions.
surface_matrix

Construct a set of (x, y, z) matrices for surface plotting a gg_partial_coplot object
gg_variable

Marginal variable dependence data object.
plot.gg_survival

Plot a gg_survival object.
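
As an illustration of how the nonparametric survival entries in this index fit together, the sketch below builds a gg_survival data object and plots it. It assumes randomForestSRC is installed so that its veteran data set (with columns time, status and trt) is available; the data set and object names are illustrative choices, and the argument names follow the gg_survival interface as the editor understands it.

```r
library(randomForestSRC)
library(ggRandomForests)

# Load the veteran lung cancer data shipped with randomForestSRC (illustrative).
data(veteran, package = "randomForestSRC")

# Kaplan-Meier estimates, stratified by treatment arm.
# interval: name of the follow-up time column; censor: name of the event indicator.
km_data <- gg_survival(interval = "time", censor = "status",
                       by = "trt", data = veteran, type = "kaplan")

# Nelson-Aalen estimates for the whole cohort, using the same interface.
na_data <- gg_survival(interval = "time", censor = "status",
                       data = veteran, type = "nelson")

# The plot method returns a ggplot2 object that can be customized further.
km_plot <- plot(km_data)
km_plot
```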