ggRandomForests (version 2.2.0)

partial_surface_data: Cached plot.variable objects for examples, diagnostics and vignettes

Description

Cached plot.variable objects for examples, diagnostics and vignettes.

Data sets storing plot.variable objects corresponding to training data according to the following naming convention:

  • partial_boston_surf - from a randomForestS[R]C for the Boston housing data set (MASS package).

  • partial_pbc_surf - from a randomForest[S]RC for the pbc data set (randomForestSRC package)

  • partial_pbc_time - from a randomForest[S]RC for the pbc data set (randomForestSRC package)

Format

list of plot.variable objects

Details

Constructing partial plot data with the randomForestSRC::plot.variable function is computationally expensive. We cache plot.variable objects to shorten the run times of the ggRandomForests examples, diagnostics and vignettes. (See cache_rfsrc_datasets to rebuild the complete set of these data sets.)
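To make the cost concrete, here is a minimal base R sketch of how partial dependence data can be computed. It is an illustration under stated assumptions, not the randomForestSRC implementation: a linear model on the built-in mtcars data stands in for the random forest, and partial_dependence is a hypothetical helper name.

```r
# Stand-in model: a linear regression instead of a random forest
# (assumption, chosen so the sketch runs with base R only).
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Hypothetical helper: partial dependence of the prediction on one covariate.
partial_dependence <- function(model, data, xvar, npts = 25) {
  # Grid of values at which the curve is evaluated.
  grid <- seq(min(data[[xvar]]), max(data[[xvar]]), length.out = npts)
  yhat <- vapply(grid, function(value) {
    modified <- data
    modified[[xvar]] <- value                 # fix xvar for every observation
    mean(predict(model, newdata = modified))  # average over the other covariates
  }, numeric(1))
  data.frame(x = grid, yhat = yhat)
}

pd_wt <- partial_dependence(fit, mtcars, "wt")
head(pd_wt)
```

Each grid point needs a full prediction pass over the training data, so one curve costs npts times n predictions. With a forest in place of the linear model, and one curve per covariate or per time point, the total grows quickly.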

For each data set listed, we build an rfsrc object (see rfsrc_data), then calculate the partial plot data with the plot.variable function, setting partial=TRUE. Each data set is built by the cache_rfsrc_datasets function, using the randomForestSRC version listed in the ggRandomForests DESCRIPTION file.

  • partial_boston - Housing values in suburbs of Boston, from the MASS package. We build a regression random forest to predict medv (median home value) from 13 covariates and 506 observations.

  • partial_pbc - The pbc data from the Mayo Clinic trial in primary biliary cirrhosis (PBC) of the liver, conducted between 1974 and 1984. A total of 424 PBC patients, referred to Mayo Clinic during that ten-year interval, met eligibility criteria for the randomized placebo-controlled trial of the drug D-penicillamine. Of these, 312 participated in the randomized trial and have largely complete data. The data come from the randomForestSRC package. We build a survival random forest for time-to-death data with 17 covariates and 312 observations (the remaining 106 observations in the data set are held out).
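The caching idea described above can be sketched generically in base R. This illustrates the compute-once, reuse-later pattern only; it is not how cache_rfsrc_datasets is implemented. The helper cached, the function slow_step and the file name are all hypothetical.

```r
# Hypothetical helper: return the saved result if it exists,
# otherwise compute it, save it, and return it.
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

# Hypothetical cache location in the session's temporary directory.
cache_file <- file.path(tempdir(), "slow_result.rds")
if (file.exists(cache_file)) file.remove(cache_file)

# Stand-in for an expensive step such as a plot.variable call.
slow_step <- function() {
  Sys.sleep(0.2)
  list(x = 1:10, yhat = (1:10)^2)
}

first  <- cached(cache_file, slow_step)  # computes and writes the file
second <- cached(cache_file, slow_step)  # reads the file, skips the computation
identical(first, second)
```

In the package itself the cached objects ship as data sets and are loaded with data(), as the examples on this page show.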

References

#--------------------- randomForestSRC ---------------------

Ishwaran H. and Kogalur U.B. (2014). Random Forests for Survival, Regression and Classification (RF-SRC), R package version 1.5.5.

Ishwaran H. and Kogalur U.B. (2007). Random survival forests for R. R News 7(2), 25-31.

Ishwaran H., Kogalur U.B., Blackstone E.H. and Lauer M.S. (2008). Random survival forests. Ann. Appl. Statist. 2(3), 841-860.

#--------------------- Boston data set ---------------------

Belsley, D.A., E. Kuh, and R.E. Welsch. 1980. Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.

Harrison, D., and D.L. Rubinfeld. 1978. "Hedonic Prices and the Demand for Clean Air." J. Environ. Economics and Management 5: 81-102.

#--------------------- pbc data set ---------------------

Fleming T.R. and Harrington D.P. (1991). Counting Processes and Survival Analysis. New York: Wiley.

Therneau T. and Grambsch P. (2000). Modeling Survival Data: Extending the Cox Model. New York: Springer-Verlag. ISBN: 0-387-98784-3.

See Also

Boston, pbc, plot.variable, rfsrc_data, cache_rfsrc_datasets, gg_partial, plot.gg_partial

Examples

if (FALSE) {
#---------------------------------------------------------------------
# MASS::Boston data - regression random forest 
#---------------------------------------------------------------------
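# load the packages that provide plot.variable, gg_partial and quantile_pts
library(randomForestSRC)
library(ggRandomForests)

# load the rfsrc object from the cached data
data(rfsrc_boston, package="ggRandomForests")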

# The plot.variable call
partial_boston <- plot.variable(rfsrc_boston,
                                partial=TRUE, show.plots = FALSE )

# plot the forest partial plots
gg_dta <- gg_partial(partial_boston)
plot(gg_dta, panel=TRUE)

#---------------------------------------------------------------------
# randomForestSRC::pbc data - survival random forest
#---------------------------------------------------------------------
# load the rfsrc object from the cached data
data(rfsrc_pbc, package="ggRandomForests")

# Restrict the times of interest to the first 5 years.
time_pts <- rfsrc_pbc$time.interest[rfsrc_pbc$time.interest <= 5]

# Find 50 points in time, evenly spaced along the distribution of
# event times, for a series of partial dependence curves
time_cts <- quantile_pts(time_pts, groups = 50)

# Generate the partial dependence data: one plot.variable object per
# time point (slow, see the timing below)
system.time(partial_pbc_time <- lapply(time_cts, function(ct) {
   plot.variable(rfsrc_pbc, xvar = "bili", time = ct,
                 npts = 50, show.plots = FALSE, 
                 partial = TRUE, surv.type="surv")
   }))
#     user   system  elapsed 
# 2561.313   81.446 2641.707 

# Find the quantile points of albumin to create 50 cut points
alb_partial_pts <- quantile_pts(rfsrc_pbc$xvar$albumin, groups = 50)

# For each cut point, fix albumin at that value (in a local copy of the
# forest object) and compute the partial dependence on bili at 1 year
system.time(partial_pbc_surf <- lapply(alb_partial_pts, function(ct) {
  rfsrc_pbc$xvar$albumin <- ct
  plot.variable(rfsrc_pbc, xvar = "bili", time = 1,
                npts = 50, show.plots = FALSE,
                partial = TRUE, surv.type="surv")
  }))
#     user   system  elapsed 
# 2547.482   91.978 2671.870 

}
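Both loops above follow the same pattern: choose cut points spread along the distribution of a conditioning variable, then compute one partial dependence curve per cut point. Below is a minimal base R sketch of that pattern. It assumes a linear model on mtcars as a stand-in for the survival forest, and uses quantile() for the cut points, which is not necessarily identical to what quantile_pts returns.

```r
# Stand-in model with an interaction, so the curve changes with the
# conditioning variable (assumption: lm replaces the survival forest).
fit <- lm(mpg ~ wt * hp, data = mtcars)

# Cut points evenly spaced along the distribution of hp.
hp_pts <- quantile(mtcars$hp, probs = seq(0, 1, length.out = 5),
                   names = FALSE)

# Grid for the variable shown on the x axis.
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 10)

# One partial dependence curve per cut point: fix hp, then sweep wt.
curves <- lapply(hp_pts, function(hp_value) {
  conditioned <- mtcars
  conditioned$hp <- hp_value
  vapply(wt_grid, function(wt_value) {
    conditioned$wt <- wt_value
    mean(predict(fit, newdata = conditioned))
  }, numeric(1))
})

# Stack the curves into a matrix: rows follow hp_pts, columns follow wt_grid.
surface <- do.call(rbind, curves)
dim(surface)  # 5 10
```

Quantile-based cut points concentrate the curves where the data are dense, which avoids spending computation on regions with few observations.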
