The arguments of cao are a mixture of those from vgam and cqo, but
with some extras in cao.control. Currently, not all of the arguments
work properly.
CAO can loosely be thought of as the result of fitting generalized
additive models (GAMs) to several responses (e.g., species) against
a very small number of latent variables. Each latent variable is a
linear combination of the explanatory variables; the coefficients
$C$ are called constrained
coefficients or canonical coefficients, and are interpreted as
weights or loadings. The $C$ are estimated by maximum likelihood
estimation. It is often a good idea to standardize each explanatory
variable with scale first.
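As a minimal sketch of the two ideas above (standardizing the explanatory variables and forming a latent variable as a linear combination of them), the following uses base R only. The data, the variable names and the values in C are made up for illustration; in a real fit C would be estimated by cao, not chosen by hand.

```r
# Toy data: 50 sites, 3 environmental variables on very different scales.
set.seed(1)
X <- cbind(temp  = rnorm(50, mean = 15,  sd = 5),
           depth = rnorm(50, mean = 200, sd = 80),
           ph    = rnorm(50, mean = 7,   sd = 0.5))

# Standardize each column to mean 0 and standard deviation 1.
Xs <- scale(X)

# Hypothetical constrained coefficients for a rank-1 model (a 3 x 1 matrix).
# These values are assumptions, not estimates.
C <- matrix(c(0.6, -0.3, 0.8), ncol = 1)

# Latent variable (site score) for each site: nu_i = C^T x_i.
nu <- drop(Xs %*% C)
head(nu)
```

After standardization the entries of C are comparable across variables, which is what makes reading them as weights or loadings meaningful.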
For each response (e.g., species), each latent variable is smoothed
by a cubic smoothing spline, so CAO is data-driven. If each smooth
were a quadratic then CAO would simplify to constrained quadratic
ordination (CQO; formerly called canonical Gaussian ordination
or CGO).
If each smooth were linear then CAO would simplify to constrained
linear ordination (CLO). CLO can theoretically be fitted with cao
by specifying df1.nl=0; however, it is more efficient to use rrvglm.
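The nesting of linear, quadratic and smooth fits can be illustrated on a single made-up curve. The sketch below works on one response and one known latent variable, using lm and smooth.spline from the stats package as stand-ins; it is not how cao, cqo or rrvglm fit their models, and the simulated data and the choice df = 5 are assumptions.

```r
set.seed(2)
nu <- sort(runif(100, -2, 2))              # hypothetical latent variable values
y  <- exp(-nu^2) + rnorm(100, sd = 0.05)   # toy bell-shaped curve plus noise

fit_lin    <- lm(y ~ nu)                   # linear in nu (CLO-like)
fit_quad   <- lm(y ~ nu + I(nu^2))         # quadratic in nu (CQO-like)
fit_smooth <- smooth.spline(nu, y, df = 5) # cubic smoothing spline (CAO-like)

# Residual sums of squares for the three fits.
rss <- c(linear    = sum(resid(fit_lin)^2),
         quadratic = sum(resid(fit_quad)^2),
         spline    = sum((y - predict(fit_smooth, nu)$y)^2))
print(rss)
```

The quadratic fit can never have a larger residual sum of squares than the linear one because the models are nested; the spline lets the data determine the shape instead of fixing it in advance.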
Currently, only Rank=1 is implemented, and only Norrr = ~1 models
are handled.
With binomial data, the default formula is
$$\mathrm{logit}(P[Y_s=1]) = \eta_s = f_s(\nu), \quad s=1,2,\ldots,S$$
where $x_2$ is a vector of environmental variables, and
$\nu=C^T x_2$ is an $R$-vector of latent variables.
The $\eta_s$ is an additive predictor for species $s$,
and it models the probabilities of presence as an additive model on
the logit scale. The matrix $C$ is estimated from the data, as
well as the smooth functions $f_s$. The argument Norrr = ~1
specifies that the vector $x_1$, defined for RR-VGLMs
and QRR-VGLMs, is simply a 1 for an intercept.
Here, the intercept in the model is absorbed into the functions.
A cloglog link may be preferable to a logit link.
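To make the difference between the two links concrete, the sketch below maps a few made-up values of the additive predictor to presence probabilities under each inverse link, using base R only. One commonly cited motivation for the complementary log-log link, not stated in the text above, is that it arises when presence/absence is a thresholded Poisson count with a log-linear mean.

```r
eta <- c(-2, 0, 2)   # hypothetical values of the additive predictor

# Inverse logit link: P = 1 / (1 + exp(-eta)).
p_logit <- plogis(eta)

# Inverse complementary log-log link: P = 1 - exp(-exp(eta)).
p_cloglog <- 1 - exp(-exp(eta))

print(cbind(eta, p_logit, p_cloglog))
```

The logit curve is symmetric about eta = 0, whereas the cloglog curve is not: it approaches 1 faster than it approaches 0.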
With Poisson count data, the formula is
$$\log(E[Y_s]) = \eta_s = f_s(\nu)$$
which models the mean response as an additive model on the log scale.
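The Poisson formula can be read as a recipe for generating data. The following base R sketch simulates counts for two species from a single latent variable; the functions f1 and f2 and all numeric values are hypothetical and stand in for the smooths that a fit would estimate.

```r
set.seed(3)
nu <- rnorm(200)                           # hypothetical site scores

# Hypothetical smooth functions f_s(nu) for S = 2 species.
f1 <- function(nu) 2 - 0.5 * (nu - 1)^2    # quadratic: symmetric bell-shaped mean
f2 <- function(nu) 1 + sin(nu)             # non-quadratic shape

eta <- cbind(sp1 = f1(nu), sp2 = f2(nu))   # additive predictors, one column per species
mu  <- exp(eta)                            # inverse log link gives the means E[Y_s]

# Poisson counts with those means; rows are sites, columns are species.
Y <- matrix(rpois(length(mu), lambda = mu),
            nrow = nrow(mu), dimnames = dimnames(mu))
head(Y)
```

Because the log link is used, the means are positive for any shape of f_s, so the smooths are unconstrained on the predictor scale.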
The fitted latent variables (site scores) are scaled to have
unit variance. The concept of a tolerance is undefined for
CAO models, but the optima and maxima are defined. The generic
functions Max and Opt should work for CAO objects, but note that if
the maximum occurs at the boundary then Max will return an NA.
Inference for CAO models is currently undeveloped.