influence.cca: Linear Model Diagnostics for Constrained Ordination

Description

This set of functions extracts influence statistics and some other linear model statistics directly from a constrained ordination result object from cca, rda, capscale or dbrda. The constraints are linear model functions, and these support functions return the same results as the corresponding functions for linear models (lm), so you can consult their documentation. The main functions for normal usage give leverage values (hatvalues), standardized residuals (rstandard), studentized or leave-one-out residuals (rstudent), and Cook's distance (cooks.distance). In addition, vcov returns the variance-covariance matrix of coefficients, whose diagonal values are the variances of the coefficients. Other functions are mainly support functions for these, but they can be used directly.

Usage

# S3 method for cca
hatvalues(model, ...)
# S3 method for cca
rstandard(model, type = c("response", "canoco"), ...)
# S3 method for cca
rstudent(model, type = c("response", "canoco"), ...)
# S3 method for cca
cooks.distance(model, type = c("response", "canoco"), ...)
# S3 method for cca
sigma(object, type = c("response", "canoco"), ...)
# S3 method for cca
vcov(object, type = "canoco", ...)
# S3 method for cca
SSD(object, type = "canoco", ...)
# S3 method for cca
qr(x, ...)
# S3 method for cca
df.residual(object, ...)

Arguments

model, object, x: A constrained ordination result object.
type: Type of statistics used for extracting raw residuals and residual standard deviation (sigma). Either "response" for species data, or "canoco" for the differences of WA and LC scores.
...: Other arguments to functions (ignored).

Author

Jari Oksanen

Details

The vegan algorithm for constrained ordination uses a linear model (or a weighted linear model in cca) to find the fitted values of dependent community data, and constrained ordination is based on this fitted response (Legendre & Legendre 2012). The hatvalues give the leverage values of these constraints, and the leverage is independent of the response data. Other influence statistics (rstandard, rstudent, cooks.distance) are based on the leverage, and on the raw residuals and residual standard deviation (sigma). With type = "response" the raw residuals are given by the unconstrained component of the constrained ordination, and the influence statistics are a matrix with dimensions no. of observations times no. of species. For cca the statistics are the same as those obtained from an lm model using Chi-square standardized species data (see decostand) as the dependent variable and row sums of community data as weights, and for rda the lm model uses non-modified community data and no weights.
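The rda case described above can be checked with a minimal sketch. It assumes the varespec and varechem data sets shipped with vegan; the object names mod_rda, lmod and mod_sqrt are only illustrative. One species column is compared against an unweighted lm fit, and the leverage is shown to stay the same when the response is transformed.

```r
library(vegan)
data(varespec, varechem)

## rda: unmodified community data, no weights
mod_rda <- rda(varespec ~ Al + P + K, varechem)
## the same constraints fitted to a single species with lm
lmod <- lm(varespec$Cladstel ~ Al + P + K, data = varechem)

## leverage and standardized residuals should agree up to rounding error
range(hatvalues(lmod) - hatvalues(mod_rda))
range(rstandard(lmod) - rstandard(mod_rda)[, "Cladstel"])

## leverage depends only on the constraints, not on the response
mod_sqrt <- rda(sqrt(varespec) ~ Al + P + K, varechem)
all.equal(hatvalues(mod_rda), hatvalues(mod_sqrt))

## with type = "response" the result has one column per species
dim(rstandard(mod_rda))
```

The comparison is made one species at a time because lm is used here with a single response column, whereas the ordination methods return all species in one matrix.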

The algorithm in the CANOCO software constrains the results during iteration by performing a linear regression of weighted average (WA) scores on the constraints and taking the fitted values of this regression as linear combination (LC) scores (ter Braak 1984). The WA scores are found directly from species scores, but LC scores are linear combinations of constraints in the regression. With type = "canoco" the raw residuals are the differences of WA and LC scores, and the residual standard deviation (sigma) is taken to be the axis sum of squared WA scores minus one. These quantities have no relationship to the residual component of ordination; rather, they are methodological artefacts of an algorithm that is not used in vegan. The result is a matrix with dimensions no. of observations times no. of constrained axes.
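As a small illustration of how the two types differ in shape, the following sketch fits the same model as in the Examples and inspects the dimensions of the results. The object name rs is only illustrative, and the values themselves are not interpreted here.

```r
library(vegan)
data(varespec, varechem)
mod <- cca(varespec ~ Al + P + K, varechem)

## residual standard deviation: one value per constrained axis
sigma(mod, type = "canoco")

## standardized WA - LC residuals: observations x constrained axes
rs <- rstandard(mod, type = "canoco")
dim(rs)
head(rs)

## default type = "response" for comparison: observations x species
dim(rstandard(mod, type = "response"))
```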

Function vcov returns the matrix of variances and covariances of regression coefficients. The diagonal values of this matrix are the variances, and their square roots give the standard errors of regression coefficients. The function is based on SSD, which extracts the sums of squares and crossproducts of residuals. The residuals are defined in the same way as in the influence measures: with each type they have the same properties and limitations, and they define the dimensions of the result matrix.
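The standard errors can be arranged to match the coefficient matrix, as in the following sketch. It assumes that the diagonal of vcov follows the column-major order of coef(mod), which is also what the element-wise division in the Examples relies on; the names b, v and se are only illustrative.

```r
library(vegan)
data(varespec, varechem)
mod <- cca(varespec ~ Al + P + K, varechem)

b <- coef(mod)                      # constraints x constrained axes
v <- vcov(mod, type = "canoco")     # variance-covariance of coefficients

## standard errors laid out like the coefficient matrix
## (assumption: diag(v) is ordered like as.vector(b))
se <- matrix(sqrt(diag(v)), nrow = nrow(b), dimnames = dimnames(b))
se

## t-like ratios of coefficients to their standard errors
b / se
```

Because these ratios are based on type = "canoco" residuals, they share the limitations described above for that type.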

References

Legendre, P. and Legendre, L. (2012) Numerical Ecology. 3rd English ed. Elsevier.

ter Braak, C.J.F. (1984--): CANOCO -- a FORTRAN program for canonical community ordination by [partial] [detrended] [canonical] correspondence analysis, principal components analysis and redundancy analysis. TNO Inst. of Applied Computer Sci., Stat. Dept. Wageningen, The Netherlands.

Examples

library(vegan)
data(varespec, varechem)
mod <- cca(varespec ~ Al + P + K, varechem)
## leverage
hatvalues(mod)
plot(hatvalues(mod), type = "h")
## ordination plot with leverages: points with high leverage have
## similar LC and WA scores
plot(mod, type = "n")
ordispider(mod)       # segment from LC to WA scores
points(mod, display = "sites", cex = 5 * hatvalues(mod), pch = 21, bg = 2) # WA scores
text(mod, display = "bp", col = 4)

## deviation and influence
head(rstandard(mod))
head(cooks.distance(mod))

## Influence measures from lm
y <- decostand(varespec, "chi.square") # needed in cca
y1 <- with(y, Cladstel)         # take one species for lm
lmod1 <- lm(y1 ~ Al + P + K, varechem, weights = rowSums(varespec))
## numerically identical within 2e-15
range(cooks.distance(lmod1) - cooks.distance(mod)[, "Cladstel"])

## t-values of regression coefficients based on type = "canoco"
## residuals
coef(mod)
coef(mod)/sqrt(diag(vcov(mod, type = "canoco")))
