mvtboost (version 0.5.0)

mvtb.nonlin: Detect departures from linearity in a multivariate tree boosting model.

Description

Detect departures from linearity in a multivariate tree boosting model.

Usage

mvtb.nonlin(object, Y, X, n.trees = NULL, detect = "grid", scale = TRUE)

Arguments

object
object of class mvtb
Y
matrix of responses
X
matrix of predictors
n.trees
number of trees. Defaults to the smallest of the numbers of trees that minimize the CV, test, and training error.
detect
method for testing possible non-linear effects or interactions. Possible values are "grid", "influence", and "lm". See details.
scale
For method "influence", whether the resulting influences are scaled to sum to 100.

Value

For each outcome, a list is produced showing the interactions in two forms. The first, $rank.list, shows the nonlinear effect for each pair of predictors, ranked by the size of the departure from linearity. The second, $interactions, shows the departure from linearity for all pairs of predictors.

Details

This function provides a statistic to detect departures from linearity in the multivariate boosting model for any outcome as a function of pairs of predictors. These departures could be interactions between pairs of variables, or more general non-linear effects. Please note that these methods should be interpreted as exploratory only.

Several methods are provided for detecting departures from linearity in pairs of predictors. The "grid" method computes a grid of the model implied predictions as a function of two predictors, averaging over the others. A linear model predicting the observed outcomes from the predicted values is fit, and the mean squared residuals (times 1000) are reported. Large residuals indicate deviations from linearity.
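The "grid" idea can be sketched in base R as follows. This is a toy illustration, not the mvtboost implementation: the prediction functions `f_interaction` and `f_additive` and the helper `grid_stat` are hypothetical stand-ins for a fitted boosting model, and the sketch assumes the linear model regresses the grid predictions additively on the two predictors.

```r
# Toy illustration of the "grid" idea in base R; not the mvtboost code.
set.seed(1)
n <- 200
X <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))

# Hypothetical prediction functions standing in for a fitted boosting model.
f_interaction <- function(d) d$x1 * d$x2 + 0.5 * d$x3
f_additive    <- function(d) d$x1 + d$x2 + 0.5 * d$x3

# Hypothetical helper: nonlinearity statistic for one pair of predictors.
grid_stat <- function(f_hat, X, v1, v2, n.grid = 10) {
  grid <- expand.grid(
    a = seq(min(X[[v1]]), max(X[[v1]]), length.out = n.grid),
    b = seq(min(X[[v2]]), max(X[[v2]]), length.out = n.grid)
  )
  # For each grid point, average predictions over the observed values
  # of the remaining predictors.
  grid$pred <- vapply(seq_len(nrow(grid)), function(i) {
    d <- X
    d[[v1]] <- grid$a[i]
    d[[v2]] <- grid$b[i]
    mean(f_hat(d))
  }, numeric(1))
  # Additive linear fit (an assumption); report mean squared residual x 1000.
  fit <- lm(pred ~ a + b, data = grid)
  mean(resid(fit)^2) * 1000
}

stat_interaction <- grid_stat(f_interaction, X, "x1", "x2")
stat_additive    <- grid_stat(f_additive, X, "x1", "x2")
```

Under these assumptions the statistic for the purely additive function is numerically zero, while the function containing an x1 by x2 product yields a clearly positive value.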

The "influence" method computes the reductions in SSE attributable to predictors after the first split on the tree. These reductions in sums of squared error (or influences) indicate to what extent individual predictors capture deviations from linear, main effects.
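One way to read this description is sketched below in base R. The `splits` table is entirely hypothetical (made-up values, not mvtboost or gbm output), and the per-variable aggregation is an assumption about how the reductions might be summarized. The final step mirrors the `scale` argument by rescaling the influences to sum to 100.

```r
# Hypothetical table of splits from two small trees (illustrative values only).
splits <- data.frame(
  tree          = c(1, 1, 1, 2, 2, 2),
  var           = c("x1", "x2", "x3", "x1", "x2", "x2"),
  is_root       = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  sse_reduction = c(5.0, 2.0, 1.0, 4.0, 1.5, 0.5),
  stringsAsFactors = FALSE
)

# Keep only splits made after the first (root) split of each tree.
non_root <- splits[!splits$is_root, ]

# Sum the SSE reductions per predictor.
infl <- vapply(split(non_root$sse_reduction, non_root$var), sum, numeric(1))

# Analogue of scale = TRUE: rescale so the influences sum to 100.
infl_scaled <- 100 * infl / sum(infl)
```

In this toy table only x2 and x3 appear below the root, so they are the only predictors credited with capturing structure beyond the first split.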

The "lm" method is the same as the "grid" method, but produces the grid of predicted values by conditioning on the average values of the other predictors rather than averaging over the values of the other predictors (see Elith et al., 2008). As in the "grid" approach, large residuals from a linear model (times 1000) indicate departures from linearity.
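The difference between averaging over the other predictors and conditioning on their averages can be illustrated with a small base R sketch. The prediction function `f_hat` and the helper `stat` are hypothetical, and the additive linear fit is an assumption rather than the package's code.

```r
set.seed(2)
n <- 200
X <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))

# Hypothetical prediction function standing in for a fitted model.
f_hat <- function(d) d$x1 * d$x2 * d$x3^2

grid <- expand.grid(x1 = seq(0, 1, length.out = 10),
                    x2 = seq(0, 1, length.out = 10))

# "grid"-style: average the predictions over the observed values of x3.
pred_avg <- vapply(seq_len(nrow(grid)), function(i) {
  mean(f_hat(data.frame(x1 = grid$x1[i], x2 = grid$x2[i], x3 = X$x3)))
}, numeric(1))

# "lm"-style: predict with x3 fixed at its average value.
pred_cond <- f_hat(data.frame(grid, x3 = mean(X$x3)))

# Mean squared residual (x 1000) from an additive linear fit (an assumption).
stat <- function(pred) mean(resid(lm(pred ~ x1 + x2, data = grid))^2) * 1000

stat_avg  <- stat(pred_avg)
stat_cond <- stat(pred_cond)
```

Because the mean of x3 squared exceeds the square of the mean of x3 whenever x3 varies, the two grids differ in scale here, and so do the resulting statistics.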

A final option is to use interact.gbm from the gbm package to detect interactions. It can be applied directly to the individual models stored in object$models.

These methods do not necessarily agree and can produce different results. We suggest using several approaches, followed by plotting the model implied effects of the two predictors.

References

Miller, P. J., Lubke, G. H., McArtor, D. B., & Bergeman, C. S. (Submitted). Finding structure in data: A data mining alternative to multivariate multiple regression. Psychological Methods.

Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77(4), 802-813.

Friedman, J. H., & Meulman, J. J. (2003). Multiple additive regression trees with application in epidemiology. Statistics in Medicine, 22(9), 1365-1381.

See Also

interact.gbm, mvtb.perspec, plot.gbm