plot: Plot effect estimates of boosting models

Description

Draw coefficient paths for glmboost models and partial effect plots for all other mboost models.

Usage

# S3 method for glmboost
plot(x, main = deparse(x$call), col = NULL,
     off2int = FALSE, ...)
# S3 method for mboost
plot(x, which = NULL, newdata = NULL,
     type = "b", rug = TRUE, rugcol = "black",
     ylim = NULL, xlab = NULL, ylab = expression(f[partial]),
     add = FALSE, ...)
# S3 method for mboost
lines(x, which = NULL, type = "l", rug = FALSE, ...)

Value

A plot of the fitted model.

Arguments

x: object of class glmboost or an object inheriting from mboost for plotting.
main: a title for the plot.
col: (a vector of) colors for plotting the lines representing the coefficient paths.
off2int: logical indicating whether the offset should be added to the intercept (if there is one) or neglected for plotting (default).
which: a subset of base-learners used for plotting. If which is given (as an integer vector or as characters corresponding to base-learners), only the corresponding partial effect plots are depicted. By default, all selected base-learners are plotted.
newdata: optionally, a data frame in which to look for variables with which to make predictions that are then plotted. This is especially useful if the data used to fit the model contain larger gaps, because effect plots are linearly interpolated between observations. For an example using newdata see below.
type: character string giving the type of plot desired. By default, points and lines are plotted ("b"). Other useful options are points ("p") or lines ("l"). See plot.default for details.
rug: logical. Should a rug be added to the x-axis?
rugcol: color for the rug.
ylim: the y limits of the plot.
xlab: a label for the x axis.
ylab: a label for the y axis.
add: logical. Should the plot be added to the previous plot?
...: additional arguments to the plot functions. For example, one can specify the x limits via xlim or the color of the plot via col.

Details

The coefficient paths for glmboost models show how the coefficient estimates evolve with increasing mstop. Each line represents one parameter estimate. A parameter estimate is only depicted if it has been selected in at least one boosting iteration; parameters that are never selected are dropped from the figure (see example).
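To see which parameters will appear in a coefficient path plot, it can help to inspect the fitted model before plotting. The following is a minimal sketch, assuming the mboost package is installed; the object names d and mod are chosen here for illustration only.

```r
library(mboost)

## illustrative data: cars plus one pure noise covariate
set.seed(1234)
d <- cars
d$z <- rnorm(nrow(d))

## linear model fitted by boosting, stopped early
mod <- glmboost(dist ~ speed + z, data = d)
mstop(mod) <- 15

## coef() reports only coefficients selected up to the current mstop
names(coef(mod))

## selected() gives the index of the component updated in each iteration
table(selected(mod))

## only the coefficients listed above get a line in the plot
plot(mod, main = "Coefficient paths for mstop = 15")
```

Increasing mstop again and repeating the calls shows when additional coefficients enter the model.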

Models specified with gamboost or mboost are plotted as partial effects. Unlike the coefficient paths shown for linear models, only the effect at the current boosting iteration is depicted. The function lines is just a wrapper for plot(..., add = TRUE), where by default the effect is plotted as a line and rug is set to FALSE.
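The relation between lines and plot described above can be sketched as follows. This is an illustration under the assumption that the mboost package is installed; the model and the colors are arbitrary choices, and the two overlay calls are expected to draw the same curve.

```r
library(mboost)

## additive model with a single smooth effect
mod <- gamboost(dist ~ speed, data = cars)

## base plot: partial effect of speed drawn as points
plot(mod, which = "speed", type = "p")

## overlay the effect as a line using the lines method ...
lines(mod, which = "speed", col = "red")

## ... which, per the description above, corresponds to this plot call
plot(mod, which = "speed", type = "l", rug = FALSE, add = TRUE,
     col = "blue", lty = "dashed")
```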

Spatial effects can also be plotted using the function plot for mboost models (using lattice graphics). More complex effects require manual plotting: one needs to predict the effects on a desired grid and plot the effect estimates.
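The manual approach (predict on a grid, then plot) can be sketched with a simple univariate effect. This is a hedged illustration, assuming the mboost package is installed; the names d, mod, grid and eff are hypothetical, and a genuinely complex effect would need a grid over all covariates of the base-learner in question.

```r
library(mboost)

## illustrative data and model
set.seed(1234)
d <- cars
d$z <- rnorm(nrow(d))
mod <- gamboost(dist ~ speed + z, data = d)

## grid of covariate values at which to evaluate the effect
grid <- data.frame(speed = seq(min(d$speed), max(d$speed),
                               length.out = 200))

## partial effect of the base-learner for speed on the grid
eff <- as.numeric(predict(mod, newdata = grid, which = "speed"))

## plot the effect estimates and mark the observed values
plot(grid$speed, eff, type = "l",
     xlab = "speed", ylab = expression(f[partial]))
rug(d$speed)
```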

References

Benjamin Hofner, Andreas Mayr, Nikolay Robinzonov and Matthias Schmid (2014). Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost. Computational Statistics, 29, 3–35. doi:10.1007/s00180-012-0382-5

Examples

### a simple example: cars data with one random variable
set.seed(1234)
cars$z <- rnorm(50)

########################################
## Plot linear models
########################################

## fit a linear model
cars.lm <- glmboost(dist ~ speed + z, data = cars)

## plot coefficient paths of glmboost
par(mfrow = c(3, 1), mar = c(4, 4, 4, 8))
plot(cars.lm,
     main = "Coefficient paths (offset not included)")
plot(cars.lm, off2int = TRUE,
     main = "Coefficient paths (offset included in intercept)")

## plot coefficient paths only for the first 15 steps,
## i.e., before z is selected
mstop(cars.lm) <- 15
plot(cars.lm, off2int = TRUE, main = "z is not yet selected")


########################################
## Plot additive models; basics
########################################

## fit an additive model
cars.gam <- gamboost(dist ~ speed + z, data = cars)

## plot effects
par(mfrow = c(1, 2), mar = c(4, 4, 0.1, 0.1))
plot(cars.gam)

## use same y-lims
plot(cars.gam, ylim = c(-50, 50))

## plot only the effect of speed
plot(cars.gam, which = "speed")
## as partial matching is used we could also use
plot(cars.gam, which = "sp")


########################################
## More complex plots
########################################

## Let us use more boosting iterations and compare the effects.

## We change the plot type and plot both effects in one figure:
par(mfrow = c(1, 1), mar = c(4, 4, 4, 0.1))
mstop(cars.gam) <- 100
plot(cars.gam, which = 1, col = "red", type = "l", rug = FALSE,
     main = "Compare effect for various models")

## Now the same model with 1000 iterations
mstop(cars.gam) <- 1000
lines(cars.gam, which = 1, col = "grey", lty = "dotted")

## There are some gaps in the data. Use newdata to get a smoother curve:
newdata <- data.frame(speed = seq(min(cars$speed), max(cars$speed),
                                  length.out = 200))
lines(cars.gam, which = 1, col = "grey", lty = "dashed",
      newdata = newdata)

## The model with 1000 steps seems to overfit the data.
## Usually one should use e.g. cross-validation to tune the model.

## Finally we refit the model using linear effects as comparison
cars.glm <- gamboost(dist ~ speed + z, baselearner = bols, data = cars)
lines(cars.glm, which = 1, col = "black")
## We see that all effects are more or less linear.

## Add a legend
legend("topleft", title = "Model",
       legend = c("... with mstop = 100", "... with mstop = 1000",
         "... with linear effects"),
       lty = c("solid", "dashed", "solid"),
       col = c("red", "grey", "black"))
