best.r.sq: Use R^2 to find the variables that best explain a multivariate response.

Description

Finds the subset of explanatory variables in a formula that best explains the variation in a multivariate response, as measured by a chosen definition of R^2. Modifications are included for high-dimensional data, such as multivariate abundance data in ecology.

Usage

best.r.sq(formula, data = parent.frame(), subset, var.subset,
  n.xvars = min(3, length(xn)), R2 = "h", ...)

Value

This function returns a list consisting of:

xs: a vector of indices of the independent variables with the greatest explanatory power.
r2Step: a vector of total R^2 from sequential model fits including each of the model terms identified in xs.
r2Matrix: a matrix containing the total R^2 for each term in the model at each addition step (steps in columns and model terms in rows).

Arguments

formula: an mvformula, i.e. a multivariate formula.
data: optional, the data.frame (or list) from which the variables in formula should be taken.
subset: an optional vector specifying a subset of observations to be used in the fitting process.
var.subset: an optional vector specifying the subset of the responses to be used.
n.xvars: the number of independent variables with the highest average R^2 that should be found.
R2: the type of R^2 (coefficient of determination) that should be used. Possible values are:
"h" = Hooper's R^2 = tr(SST^(-1) SSR)/p
"v" = vector R^2 = det(SSR)/det(SST)
"n" = none
Note that for a univariate response, "h" and "v" both reduce to the ordinary R^2, i.e. the squared multiple correlation coefficient.
...: further arguments that are passed on to lm.
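
The two R^2 definitions listed under the R2 argument can be reproduced by hand in base R. The following is only a sketch and not the code used inside best.r.sq: it assumes SST and SSR are the total and regression sums-of-squares-and-cross-products matrices of the centred responses and fitted values, and that p is the number of response variables. The helper name r2_sketch and the simulated data are hypothetical.

```r
# Simulated data (hypothetical): 2 responses, 3 predictors
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 3), n, 3)
B <- matrix(c(1, 0.5, 0, 0, 1, -1), 3, 2)
Y <- X %*% B + matrix(rnorm(n * 2), n, 2)

# Hypothetical helper: Hooper's and vector R^2 from an lm fit
r2_sketch <- function(Y, X) {
  Y   <- as.matrix(Y)
  fit <- lm(Y ~ X)                       # intercept is included
  Yc  <- scale(Y, center = TRUE, scale = FALSE)
  Fc  <- scale(as.matrix(fitted(fit)), center = TRUE, scale = FALSE)
  SST <- crossprod(Yc)                   # total SSCP matrix
  SSR <- crossprod(Fc)                   # regression SSCP matrix
  p   <- ncol(Y)
  c(h = sum(diag(solve(SST, SSR))) / p,  # tr(SST^(-1) SSR) / p
    v = det(SSR) / det(SST))             # det(SSR) / det(SST)
}

r2 <- r2_sketch(Y, X)
print(r2)

# Univariate check: both definitions reduce to the ordinary R^2
print(r2_sketch(Y[, 1], X))
print(summary(lm(Y[, 1] ~ X))$r.squared)
```

Note that det(SSR) is zero whenever there are fewer predictors than responses, so under this definition the vector R^2 is only informative when the model has at least as many explanatory variables as responses.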

Author

Ulrike Naumann and David Warton <David.Warton@unsw.edu.au>.

Details

best.r.sq finds the n.xvars most influential explanatory variables by forward selection in the multivariate linear model given by formula.
Only the response variables given by var.subset are considered. If var.subset is NULL, all response variables are considered.
Interactions are excluded from the search mechanism. However, the indices that are returned correspond to the indices of the terms in the model. This function is intended as an exploratory tool that can be used, for example, in plotting; it is not intended as a tool for formal model selection.
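
The forward selection described above can be illustrated with a small base R sketch. This is an assumption about the general technique, not the mvabund implementation: at each step it adds the column of X that gives the largest total Hooper's R^2. The names hooper_r2 and forward_r2 and the simulated data are hypothetical; the returned components only mimic xs and r2Step.

```r
# Hypothetical helper: Hooper's R^2 for a matrix response Y and predictors X
hooper_r2 <- function(Y, X) {
  Yc <- scale(Y, scale = FALSE)
  Fc <- scale(fitted(lm(Y ~ X)), scale = FALSE)
  sum(diag(solve(crossprod(Yc), crossprod(Fc)))) / ncol(Y)
}

# Hypothetical forward search: greedily add the best column at each step
forward_r2 <- function(Y, X, n.xvars = min(3, ncol(X))) {
  chosen  <- integer(0)
  r2_step <- numeric(0)
  for (step in seq_len(n.xvars)) {
    candidates <- setdiff(seq_len(ncol(X)), chosen)
    r2 <- sapply(candidates, function(j)
      hooper_r2(Y, X[, c(chosen, j), drop = FALSE]))
    best    <- which.max(r2)
    chosen  <- c(chosen, candidates[best])
    r2_step <- c(r2_step, r2[best])      # total R^2 after this step
  }
  list(xs = chosen, r2Step = r2_step)
}

# Simulated data: only columns 2 and 4 of X affect the responses
set.seed(42)
n <- 100
X <- matrix(rnorm(n * 4), n, 4)
B <- rbind(c(0, 0), c(3, -2), c(0, 0), c(1, 1))
Y <- X %*% B + matrix(rnorm(n * 2), n, 2)

res <- forward_r2(Y, X)
print(res$xs)
print(res$r2Step)
```

Because each step refits nested models, the total R^2 in r2Step cannot decrease from one step to the next. A greedy search of this kind is not guaranteed to find the best subset overall, which is consistent with the function being described as exploratory.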

Examples

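library(mvabund)  # provides mvabund(), best.r.sq() and the spider data

data(spider)
spiddat <- mvabund(spider$abund)  # multivariate abundance response
X <- as.matrix(spider$x)          # matrix of explanatory variables

best.r.sq(spiddat ~ X)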
