Function cca performs correspondence analysis, or optionally constrained correspondence analysis (a.k.a. canonical correspondence analysis), or optionally partial constrained correspondence analysis. Function rda performs redundancy analysis, or optionally principal components analysis. These are all very popular ordination techniques in community ecology.
## S3 method for class 'formula':
cca(formula, data)
## S3 method for class 'default':
cca(X, Y, Z, ...)
## S3 method for class 'formula':
rda(formula, data, scale=FALSE)
## S3 method for class 'default':
rda(X, Y, Z, scale=FALSE, ...)
## S3 method for class 'cca':
summary(object, scaling=2, axes=6, digits, ...)
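formula: Model formula; conditioning variables can be given within the special function Condition.
object: A cca result object.
scaling: Scaling for species and site scores. Either species (2) or site (1) scores are scaled by eigenvalues, and the other set of scores is left unscaled, or with 3 both are scaled symmetrically by square root of eigenvalues.
...: Other parameters for print or plot functions.
Function cca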
returns a big object of class cca. It has as elements separate lists for pCCA, CCA and CA. These lists have information on total Chi-square and estimated rank of the stage. Lists CCA and CA contain scores for species (v) and sites (u). These site scores are linear constraints in CCA and weighted averages in CA. In addition, list CCA has item wa for site scores and biplot for endpoints of biplot arrows. All these scores are unscaled (actually, their weighted sum of squares is one). The result object can be accessed with functions summary and scores.cca, which know how to scale the results for display. The traditional alternative in correspondence analysis (see decorana) was to scale sites by eigenvalues and leave species unscaled (scaling=1), so that the configuration of sites would reflect the real structure in the data (longer axes for higher eigenvalues). Species scores would not reflect axis lengths, and they would have larger variation than site scores, which was motivated by some species having their optima outside the studied range. Later the common practice was to leave sites unscaled (scaling=2), so that they would have a better relation with biplot arrows.
Function rda returns an object of class rda, which inherits from class cca. Function rda is really only a spin-off from cca, and the object uses the same item names as cca, which are misleading in this case. The only specific function is summary.rda, but otherwise the rda object is accessed with cca methods (print, plot.cca, scores.cca, anova.cca). The analysis stores unscaled results named similarly as in cca, but summary.rda (and hence the plot and scores functions) scales these so that site and species scores are scaled relative to each other as in Canoco. However, the summary.rda scores differ from Canoco by a constant multiplier so that they define a real biplot or an approximation of the data.
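The effect of the scaling argument on site scores can be inspected directly. The following is a minimal sketch, assuming the vegan package is installed and that its scores function accepts the numeric scaling values 1 and 2 described above; the object names (mod, site1, site2, w) are only illustrative.

```r
library(vegan)
data(varespec)

# Unconstrained correspondence analysis of the lichen pasture data
mod <- cca(varespec)

# Site scores on the first two axes under the two classical scalings
site1 <- scores(mod, display = "sites", choices = 1:2, scaling = 1)
site2 <- scores(mod, display = "sites", choices = 1:2, scaling = 2)

# Site weights are the row totals as a proportion of the grand total
w <- rowSums(varespec) / sum(varespec)

# With scaling = 2 the site scores are left unscaled, so their
# weighted sum of squares should be one on each axis
wss <- colSums(w * site2^2)
print(wss)

# With scaling = 1 every site score on an axis is multiplied by the
# same axis-specific constant, so this ratio should not vary among sites
ratio <- site1[, 1] / site2[, 1]
print(range(ratio))
```

Only the relative stretching of axes changes between the two scalings; the ordering of sites along each axis is the same.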
Functions cca and rda are similar to the popular proprietary software Canoco, although the implementation is completely different. The functions are based on Legendre & Legendre's (1998) algorithm: in cca the Chi-square transformed data matrix is subjected to weighted linear regression on constraining variables, and the fitted values are submitted to correspondence analysis performed via singular value decomposition (svd). Function rda is similar, but uses ordinary, unweighted linear regression and unweighted SVD.
The functions can be called either with matrix entries for community data and constraints, or with the formula interface. In general, the formula interface is preferred, because it allows better control of the model and allows factor constraints.
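The algorithm described above (Chi-square transformation, weighted regression on the constraints, SVD of the fitted values) can be sketched in base R on a small made-up data set. The matrix X, the constraint moisture and all object names are hypothetical, and this is not the code used inside cca; it only assumes the standard definitions of the Chi-square transformation and of a weighted projection.

```r
# Hypothetical community matrix (5 sites x 4 species) and one constraint
X <- matrix(c(10, 4, 0, 1,
               6, 8, 2, 0,
               1, 5, 9, 3,
               0, 2, 7, 8,
               2, 0, 3, 9), nrow = 5, byrow = TRUE)
env <- cbind(moisture = c(1, 2, 3, 4, 5))

P  <- X / sum(X)   # proportions
rw <- rowSums(P)   # site weights
cw <- colSums(P)   # species weights

# Chi-square transformation: standardized departures from independence
Qbar <- (P - outer(rw, cw)) / sqrt(outer(rw, cw))
total_inertia <- sum(Qbar^2)

# Unconstrained CA: eigenvalues are squared singular values
ca_eig <- svd(Qbar)$d^2

# Weighted regression: centre the constraint by its weighted mean and
# weight rows by sqrt(rw), then project Qbar onto the constraint
env_c <- sweep(env, 2, colSums(env * rw), "-")
Zw    <- env_c * sqrt(rw)
fit   <- qr.fitted(qr(Zw), Qbar)   # constrained part
res   <- Qbar - fit                # residual, unconstrained part

# Constrained CA: SVD of the fitted values
cca_eig <- svd(fit)$d^2

print(c(total = total_inertia,
        constrained = sum(fit^2),
        residual = sum(res^2)))
```

Because the fitted values are an orthogonal projection, the constrained and residual parts add up to the total inertia.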
In the matrix interface, the community data matrix X must be given, but any other data matrix can be omitted, and the corresponding stage of analysis is skipped. If matrix Z is supplied, its effects are removed from the community matrix, and the residual matrix is submitted to the next stage. This is called `partial' correspondence or redundancy analysis. If matrix Y is supplied, it is used to constrain the ordination, resulting in constrained or canonical correspondence analysis, or redundancy analysis. Finally, the residual is submitted to ordinary correspondence analysis (or principal components analysis). If both matrices Z and Y are missing, the data matrix is analysed by ordinary correspondence analysis (or principal components analysis).
Instead of separate matrices, the model can be defined using a model formula. The left hand side must be the community data matrix (X). The right hand side defines the constraining model. The constraints can contain ordered or unordered factors, interactions among variables and functions of variables. The defined contrasts are honoured in factor variables. The formula can include a special term Condition for conditioning variables (``covariables'') ``partialled out'' before analysis. So the following commands are equivalent: cca(X, y, z) and cca(X ~ y + Condition(z)), where y and z refer to single variable constraints and conditions.
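A short sketch of the formula interface follows, assuming the vegan package and its bundled data sets dune, dune.env, varespec and varechem are available. The particular choice of variables is only for illustration and is not a recommended model.

```r
library(vegan)
data(dune)
data(dune.env)
data(varespec)
data(varechem)

# Unordered factor (Management) as constraint, with an ordered
# factor (Moisture) partialled out as a conditioning variable
m_factor <- rda(dune ~ Management + Condition(Moisture), data = dune.env)
m_factor

# A function of a variable (log of P) as the single constraint,
# with pH partialled out
m_fun <- cca(varespec ~ log(P) + Condition(pH), data = varechem)
m_fun
```

Printing either object shows how the total variation is split among the conditional, constrained and unconstrained stages.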
Constrained correspondence analysis is indeed a constrained method: CCA does not try to display all variation in the data, but only the part that can be explained by the used constraints. Consequently, the results are strongly dependent on the set of constraints and their transformations or interactions among the constraints. The shotgun method is to use all environmental variables as constraints. However, such exploratory problems are better analysed with unconstrained methods such as correspondence analysis (decorana, ca) or non-metric multidimensional scaling (isoMDS) and environmental interpretation after analysis (envfit, ordisurf). CCA is a good choice if the user has clear and strong a priori hypotheses on constraints and is not interested in the major structure in the data set.
CCA is able to correct a common curve artefact in correspondence analysis by forcing the configuration into linear constraints. However, the curve artefact can be avoided only with a low number of constraints that do not have a curvilinear relation with each other. The curve can reappear even with two badly chosen constraints or a single factor. Although the formula interface makes it easy to include polynomial or interaction terms, such terms often allow the curve artefact (and are difficult to interpret), and should probably be avoided.
According to folklore, rda should be used with ``short gradients'' rather than cca. However, this is not based on research, which finds methods based on the Euclidean metric uniformly weaker than those based on the Chi-squared metric.
Partial CCA (pCCA; or alternatively partial RDA) can be used to remove the effect of some conditioning or ``background'' or ``random'' variables or ``covariables'' before CCA proper. In fact, pCCA compares models cca(X ~ z) and cca(X ~ y + z) and attributes their difference to the effect of y cleansed of the effect of z. Some people have used the method for extracting ``components of variance'' in CCA. However, if the effect of the variables together is stronger than the sum of both separately, this can increase total Chi-square after ``partialling out'' some variation, and give negative ``components of variance''. In general, such components of ``variance'' are not to be trusted due to interactions between two sets of variables.
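The comparison of models described above can be checked numerically. This is a minimal sketch, assuming the vegan package is installed and that the result object stores the constrained Chi-square of each stage in item tot.chi, as described for the result lists; the variables Ca and pH are chosen only to match the package examples.

```r
library(vegan)
data(varespec)
data(varechem)

# Model with the conditioning variable only
m_z  <- cca(varespec ~ pH, data = varechem)
# Model with the constraint and the conditioning variable together
m_yz <- cca(varespec ~ Ca + pH, data = varechem)
# Partial CCA: effect of Ca after removing the effect of pH
m_p  <- cca(varespec ~ Ca + Condition(pH), data = varechem)

# Difference in constrained Chi-square between the two full models
chi_diff <- m_yz$CCA$tot.chi - m_z$CCA$tot.chi
# Constrained Chi-square reported by the partial model
chi_partial <- m_p$CCA$tot.chi
print(c(difference = chi_diff, partial = chi_partial))

# Effect of Ca alone, for comparison with the partial effect
chi_alone <- cca(varespec ~ Ca, data = varechem)$CCA$tot.chi
print(c(alone = chi_alone, partial = chi_partial))
```

Comparing the last two values shows how much the apparent effect of Ca changes once pH is partialled out, which is the quantity that ``components of variance'' calculations rely on.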
The functions have summary and plot methods. The summary method lists all species and site scores, and results may be very long. Palmer (1993) suggested using linear constraints (``LC scores'') in ordination diagrams, because these gave better results in simulations and site scores (``WA scores'') are a step from constrained to unconstrained analysis. However, McCune (1997) showed that noisy environmental variables (and all environmental measurements are noisy) destroy ``LC scores'' whereas ``WA scores'' were little affected. Therefore the plot function uses site scores (``WA scores'') as the default. This is consistent with the usage in statistics and other functions in R (lda, cancor).
Legendre, P. and Legendre, L. (1998) Numerical Ecology. 2nd English ed. Elsevier.
McCune, B. (1997) Influence of noisy environmental data on canonical correspondence analysis. Ecology 78, 2617-2623.
Palmer, M. W. (1993) Putting things in even better order: The advantages of canonical correspondence analysis. Ecology 74, 2215-2230.
Ter Braak, C. J. F. (1986) Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67, 1167-1179.
See function plot.cca with its helper functions (text.cca, points.cca, scores.cca). Function anova.cca provides an ANOVA-like permutation test for the ``significance'' of constraints. Functions CAIV (library CoCoAn) and cca (library ade4) provide alternative implementations of CCA (these are internally quite different). Function capscale is a non-Euclidean generalization of rda.
library(vegan)
data(varespec)
data(varechem)
## Common but bad way: use all variables you happen to have in your
## environmental data matrix
vare.cca <- cca(varespec, varechem)
vare.cca
plot(vare.cca)
## Formula interface and a better model
vare.cca <- cca(varespec ~ Al + P*(K + Baresoil), data=varechem)
vare.cca
plot(vare.cca)
## `Partialling out' and `negative components of variance'
cca(varespec ~ Ca, varechem)
cca(varespec ~ Ca + Condition(pH), varechem)
## RDA
data(dune)
data(dune.env)
dune.Manure <- rda(dune ~ Manure, dune.env)
plot(dune.Manure)