cca: [Partial] [Constrained] Correspondence Analysis and Redundancy Analysis

Description

Function cca performs correspondence analysis, or optionally constrained correspondence analysis (a.k.a. canonical correspondence analysis), or optionally partial constrained correspondence analysis. Function rda performs redundancy analysis, or optionally principal components analysis. These are all very popular ordination techniques in community ecology.

Usage

## S3 method for class 'formula':
cca(formula, data)
## S3 method for class 'default':
cca(X, Y, Z, ...)
## S3 method for class 'formula':
rda(formula, data, scale=FALSE)
## S3 method for class 'default':
rda(X, Y, Z, scale=FALSE, ...)
## S3 method for class 'cca':
summary(object, scaling=2, axes=6, digits, ...)

Arguments

formula

Model formula, where the left hand side gives the community data matrix, right hand side gives the constraining variables, and conditioning variables can be given within a special function Condition.

data

Data frame containing the variables on the right hand side of the model formula.

X

Community data matrix.

Y

Constraining matrix, typically of environmental variables. Can be missing.

Z

Conditioning matrix, the effect of which is removed ('partialled out') before the next step. Can be missing.

object

A cca result object.

scaling

Scaling for species and site scores. Either species (2) or site (1) scores are scaled by eigenvalues, and the other set of scores is left unscaled, or with 3 both are scaled symmetrically by square root of eigenvalues.

axes

Number of axes in summaries.

digits

Number of digits in output.

scale

Scale species to unit variance (like correlations do).

...

Other parameters for print or plot functions.

Value

Function cca returns a large object of class cca. Its elements include separate lists for pCCA, CCA and CA. Each list holds the total Chi-square and the estimated rank of that stage. Lists CCA and CA contain scores for species (v) and sites (u). These site scores are linear constraints in CCA and weighted averages in CA. In addition, list CCA has item wa for weighted-average site scores and biplot for endpoints of biplot arrows. All these scores are unscaled (actually, their weighted sum of squares is one). The result object can be accessed with functions summary and scores.cca, which know how to scale the results for display.

The traditional alternative in correspondence analysis (see decorana) was to scale sites by eigenvalues and leave species unscaled (scaling=1), so that the configuration of sites would reflect the real structure in the data (longer axes for higher eigenvalues). Species scores would not reflect axis lengths, and they would have larger variation than site scores, which was motivated by some species having their optima outside the studied range. Later the common practice was to leave sites unscaled (scaling=2), so that they would have a better relation with biplot arrows.

Function rda returns an object of class rda which inherits from class cca. Function rda is really only a spin-off from cca, and the object uses the same item names as cca, which are misleading in this case. The only specific function is summary.rda; otherwise the rda object is accessed with cca methods (print, plot.cca, scores.cca, anova.cca). The analysis stores unscaled results named as in cca, but summary.rda (and hence the plot and scores functions) scales them so that site and species scores are scaled relative to each other as in Canoco. However, the summary.rda scores differ from Canoco by a constant multiplier so that they define a real biplot or an approximation of the data.

Details

Since their introduction (ter Braak 1986), constrained or canonical correspondence analysis and its spin-off, redundancy analysis, have been the most popular ordination methods in community ecology. Functions cca and rda are similar to the popular proprietary software Canoco, although the implementation is completely different. The functions are based on Legendre & Legendre's (1998) algorithm: in cca the Chi-square transformed data matrix is subjected to weighted linear regression on constraining variables, and the fitted values are submitted to correspondence analysis performed via singular value decomposition (SVD). Function rda is similar, but uses ordinary, unweighted linear regression and unweighted SVD.
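
The regression-then-SVD idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by cca: the toy matrices X and Y and all variable names are hypothetical, and the Chi-square transformation is assumed to be (p_ij - r_i c_j) / sqrt(r_i c_j), with row sums r_i used as regression weights.

```r
# Hypothetical toy data: 6 sites x 4 species, and two made-up constraints
X <- matrix(c(10, 0, 3, 1,
               4, 6, 0, 2,
               0, 8, 5, 1,
               2, 1, 9, 4,
               7, 3, 1, 0,
               1, 5, 2, 6), nrow = 6, byrow = TRUE)
Y <- cbind(env1 = c(1.0, 2.5, 3.1, 4.8, 0.7, 3.9),
           env2 = c(0.2, 0.9, 0.4, 0.1, 0.8, 0.5))

# Chi-square transformation of the community data (assumed form, see lead-in)
P    <- X / sum(X)          # proportions
rw   <- rowSums(P)          # site weights
cw   <- colSums(P)          # species weights
E    <- outer(rw, cw)       # expected proportions under independence
Xbar <- (P - E) / sqrt(E)

# Weighted regression: centre constraints by weighted means, weight rows by sqrt(rw)
Yc  <- sweep(Y, 2, colSums(Y * rw))
Yw  <- Yc * sqrt(rw)
Q   <- qr(Yw)
fit <- qr.fitted(Q, Xbar)   # part explained by the constraints
res <- qr.resid(Q, Xbar)    # part left for the unconstrained stage

# SVD of each part; squared singular values are the eigenvalues
eig_con <- svd(fit)$d^2
eig_res <- svd(res)$d^2

c(total = sum(Xbar^2), constrained = sum(eig_con), unconstrained = sum(eig_res))
```

In this sketch the constrained and unconstrained eigenvalues add up to the total inertia, and at most ncol(Y) constrained eigenvalues are non-zero. An rda-like variant would centre the columns of X instead of applying the Chi-square transformation and drop the weights.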

The functions can be called either with matrix entries for community data and constraints, or with the formula interface. In general, the formula interface is preferred, because it allows better control of the model and allows factor constraints.

In the matrix interface, the community data matrix X must be given, but any other data matrix can be omitted, and the corresponding stage of analysis is skipped. If matrix Z is supplied, its effects are removed from the community matrix, and the residual matrix is submitted to the next stage. This is called 'partial' correspondence or redundancy analysis. If matrix Y is supplied, it is used to constrain the ordination, resulting in constrained or canonical correspondence analysis, or redundancy analysis. Finally, the residual is submitted to ordinary correspondence analysis (or principal components analysis). If both matrices Z and Y are missing, the data matrix is analysed by ordinary correspondence analysis (or principal components analysis).
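
The three stages can be illustrated with a base R sketch of the unweighted (rda-like) case. It is a simplified illustration, not the package code: the random matrices X, Y and Z and the helper inertia are hypothetical, and Y is assumed to be residualised on Z before it is used as a constraint.

```r
set.seed(1)                                   # made-up data for illustration
n <- 8
X <- matrix(rpois(n * 4, lambda = 5), nrow = n)   # community data
Z <- cbind(z  = rnorm(n))                         # conditioning variable
Y <- cbind(y1 = rnorm(n), y2 = rnorm(n))          # constraining variables

# Centre all matrices by column means (unweighted case)
Xc <- scale(X, center = TRUE, scale = FALSE)
Zc <- scale(Z, center = TRUE, scale = FALSE)
Yc <- scale(Y, center = TRUE, scale = FALSE)

# Stage 1: 'partial out' Z from the community data
qZ    <- qr(Zc)
fit_Z <- qr.fitted(qZ, Xc)
X1    <- qr.resid(qZ, Xc)

# Stage 2: constrain the residual by Y (Y residualised on Z: an assumption)
qY    <- qr(qr.resid(qZ, Yc))
fit_Y <- qr.fitted(qY, X1)

# Stage 3: the remaining residual goes to the unconstrained analysis
X2 <- qr.resid(qY, X1)

# Variance attributed to each stage (hypothetical helper)
inertia <- function(m) sum(m^2) / (n - 1)
c(total = inertia(Xc), conditional = inertia(fit_Z),
  constrained = inertia(fit_Y), unconstrained = inertia(X2))

# Eigenvalues of the constrained and unconstrained stages
eig_con <- svd(fit_Y)$d^2 / (n - 1)
eig_res <- svd(X2)$d^2 / (n - 1)
```

Omitting Z skips stage 1, and omitting Y skips stage 2, so that the centred data go straight to the unconstrained analysis.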

Instead of separate matrices, the model can be defined using a model formula. The left hand side must be the community data matrix (X). The right hand side defines the constraining model. The constraints can contain ordered or unordered factors, interactions among variables and functions of variables. The defined contrasts are honoured in factor variables. The formula can include a special term Condition for conditioning variables ("covariables") that are "partialled out" before analysis. So the following commands are equivalent: cca(X, y, z), cca(X ~ y + Condition(z)), where y and z refer to single variable constraints and conditions.
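
As a quick check of this equivalence, the two calls can be compared on the varespec and varechem data used in the Examples. This sketch assumes that package vegan is installed and that the result stores its constrained eigenvalues in item CCA$eig; the object names m_mat and m_form are only illustrative.

```r
library(vegan)   # assumption: vegan is installed
data(varespec)
data(varechem)

# Matrix interface: community data, one constraint, one conditioning variable
m_mat <- cca(varespec,
             varechem[, "Ca", drop = FALSE],
             varechem[, "pH", drop = FALSE])

# Formula interface with the same constraint and condition
m_form <- cca(varespec ~ Ca + Condition(pH), data = varechem)

# The constrained eigenvalues should agree between the two interfaces
all.equal(unname(m_mat$CCA$eig), unname(m_form$CCA$eig))
```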

Constrained correspondence analysis is indeed a constrained method: CCA does not try to display all variation in the data, but only the part that can be explained by the used constraints. Consequently, the results are strongly dependent on the set of constraints and their transformations or interactions among the constraints. The shotgun method is to use all environmental variables as constraints. However, such exploratory problems are better analysed with unconstrained methods such as correspondence analysis (decorana, ca) or non-metric multidimensional scaling (isoMDS) and environmental interpretation after analysis (envfit, ordisurf). CCA is a good choice if the user has clear and strong a priori hypotheses on constraints and is not interested in the major structure in the data set.

CCA is able to correct a common curve artefact in correspondence analysis by forcing the configuration into linear constraints. However, the curve artefact can be avoided only with a low number of constraints that do not have a curvilinear relation with each other. The curve can reappear even with two badly chosen constraints or a single factor. Although the formula interface makes it easy to include polynomial or interaction terms, such terms often allow the curve artefact to reappear (and are difficult to interpret), and should probably be avoided.

According to folklore, rda should be used with "short gradients" rather than cca. However, this advice is not supported by research, which finds methods based on the Euclidean metric uniformly weaker than those based on the Chi-squared metric.

Partial CCA (pCCA; or alternatively partial RDA) can be used to remove the effect of some conditioning or "background" or "random" variables or "covariables" before CCA proper. In fact, pCCA compares models cca(X ~ z) and cca(X ~ y + z) and attributes their difference to the effect of y cleansed of the effect of z. Some people have used the method for extracting "components of variance" in CCA. However, if the effect of the variables together is stronger than the sum of both separately, this can increase total Chi-square after "partialling out" some variation, and give negative "components of variance". In general, such components of "variance" are not to be trusted due to interactions between the two sets of variables.

The functions have summary and plot methods. The summary method lists all species and site scores, and the results may be very long. Palmer (1993) suggested using linear constraints ("LC scores") in ordination diagrams, because these gave better results in simulations and site scores ("WA scores") are a step from constrained to unconstrained analysis. However, McCune (1997) showed that noisy environmental variables (and all environmental measurements are noisy) destroy "LC scores" whereas "WA scores" were little affected. Therefore the plot function uses site scores ("WA scores") as the default. This is consistent with the usage in statistics and other functions in R (lda, cancor).
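
The two kinds of site scores can be extracted and compared directly. The sketch below assumes that package vegan is installed and that its scores method accepts display = "wa" and display = "lc"; the model and the object names are only illustrative.

```r
library(vegan)   # assumption: vegan is installed
data(varespec)
data(varechem)

mod <- cca(varespec ~ Al + P + K, data = varechem)

# Site scores as weighted averages of species scores ("WA scores")
wa <- scores(mod, display = "wa", choices = 1:2)
# Site scores as linear combinations of the constraints ("LC scores")
lc <- scores(mod, display = "lc", choices = 1:2)

# Unweighted correlation between the two sets of scores, axis by axis
diag(cor(wa, lc))
```

A low correlation on an axis suggests that the constraints explain the site configuration poorly on that axis, so the choice between the two sets of scores matters more there.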

References

The original method was by ter Braak, but the current implementation follows Legendre and Legendre.

Legendre, P. and Legendre, L. (1998) Numerical Ecology. 2nd English ed. Elsevier.

McCune, B. (1997) Influence of noisy environmental data on canonical correspondence analysis. Ecology 78, 2617-2623.

Palmer, M. W. (1993) Putting things in even better order: The advantages of canonical correspondence analysis. Ecology 74, 2215-2230.

Ter Braak, C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167-1179.

Examples

library(vegan)
data(varespec)
data(varechem)
## Common but bad way: use all variables you happen to have in your
## environmental data matrix
vare.cca <- cca(varespec, varechem)
vare.cca
plot(vare.cca)
## Formula interface and a better model
vare.cca <- cca(varespec ~ Al + P*(K + Baresoil), data=varechem)
vare.cca
plot(vare.cca)
## 'Partialling out' and 'negative components of variance'
cca(varespec ~ Ca, varechem)
cca(varespec ~ Ca + Condition(pH), varechem)
## RDA
data(dune)
data(dune.env)
dune.Manure <- rda(dune ~ Manure, dune.env)
plot(dune.Manure)
