cao(formula, family, data = list(),
weights = NULL, subset = NULL, na.action = na.fail,
etastart = NULL, mustart = NULL, coefstart = NULL,
control = cao.control(...), offset = NULL,
method = "cao.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE,
contrasts = NULL, constraints = NULL,
extra = NULL, qr.arg = FALSE, smart = TRUE, ...)
"vglmff"
(see vglmff-class
)
describing what statistical model is to be fitted. This is called a
``
environment(formula)
,
typically the environment from which cao
is called.cao
, this argument currently should
not be used.NA
s. The default is set by the na.action
setting of
options
, and is na.fail
if that iscao
,
this argument currently should not be used.cao
, this argument currently should not be used.cao
, this
argument currently should not be used.cao.control
for details.cao
, this argument currently should not be used.cao.fit
uses iteratively
reweighted least squares (IRLS) within FORTRAN code called from
optim
model
slot.x
and y
slots. Note the model matrix is the linear
model (LM) matrix.contrasts.arg
of
model.matrix.default
.cao
, this
argument currently should not be used. The components of the list
must be named with the term it corresponds to (and it must match
in character format). Each constraint macao
, this argument currently should
not be used.cao
, this argument currently should not be used.smartpred
) will be used.cao.control
.
Use set.seed
just prior to calling
cao()
to make your results reproducible.
The reason for this is finding the optimal
CAO model presents a difficult optimization problem, partly because the
log-likelihood function contains many local solutions. To obtain the
(global) solution the user is advised to try many initial values.
This can be done by setting Bestof
some appropriate value
(see cao.control
). Trying many initial values becomes
progressively more important as the nonlinear degrees of freedom of
the smooths increase.
Currently the dispersion parameter for a
gaussianff
CAO model is estimated slightly differently
and may be slightly biassed downwards (usually a little too small).
cao
are a mixture of those from
vgam
and cqo
, but with some extras
in cao.control
. Currently, not all of the
arguments work properly.
CAO can be loosely be thought of as the result of fitting generalized
additive models (GAMs) to several responses (e.g., species) against
a very small number of latent variables. Each latent variable is a
linear combination of the explanatory variables; the coefficients
C (called $C$ below) are called constrained
coefficients or canonical coefficients, and are interpreted as
weights or loadings. The C are estimated by maximum likelihood
estimation. It is often a good idea to apply scale
to each explanatory variable first.
For each response (e.g., species), each latent variable is smoothed
by a cubic smoothing spline, thus CAO is data-driven. If each smooth
were a quadratic then CAO would simplify to constrained quadratic
ordination (CQO; formerly called canonical Gaussian ordination
or CGO).
If each smooth were linear then CAO would simplify to constrained
linear ordination (CLO). CLO can theoretically be fitted with
cao
by specifying df1.nl=0
, however it is more efficient
to use rrvglm
.
Currently, only Rank=1
is implemented, and only
noRRR = ~1
models are handled.
With binomial data, the default formula is
$$logit(P[Y_s=1]) = \eta_s = f_s(\nu), \ \ \ s=1,2,\ldots,S$$
where $x_2$ is a vector of environmental variables, and
$\nu=C^T x_2$ is a $R$-vector of latent variables.
The $\eta_s$ is an additive predictor for species $s$,
and it models the probabilities of presence as an additive model on
the logit scale. The matrix $C$ is estimated from the data, as
well as the smooth functions $f_s$. The argument noRRR = ~
1
specifies that the vector $x_1$, defined for RR-VGLMs
and QRR-VGLMs, is simply a 1 for an intercept.
Here, the intercept in the model is absorbed into the functions.
A cloglog
link may be preferable over a
logit
link.
With Poisson count data, the formula is $$\log(E[Y_s]) = \eta_s = f_s(\nu)$$ which models the mean response as an additive models on the log scale.
The fitted latent variables (site scores) are scaled to have
unit variance. The concept of a tolerance is undefined for
CAO models, but the optima and maxima are defined. The generic
functions Max
and Opt
should work for
CAO objects, but note that if the maximum occurs at the boundary then
Max
will return a NA
. Inference for CAO models
is currently undeveloped.
Documentation accompanying the
cao.control
,
Coef.cao
,
cqo
,
lv
,
Opt
,
Max
,
lv
,
persp.cao
,
poissonff
,
binomialff
,
negbinomial
,
gamma2
,
gaussianff
,
set.seed
,
gam
.hspider[,1:6] <- scale(hspider[,1:6]) # Standardized environmental vars
set.seed(149) # For reproducible results
ap1 <- cao(cbind(Pardlugu, Pardmont, Pardnigr, Pardpull) ~
WaterCon + BareSand + FallTwig +
CoveMoss + CoveHerb + ReflLux,
family = poissonff, data = hspider, Rank = 1,
df1.nl = c(Pardpull=2.7, 2.5),
Bestof = 7, Crow1positive = FALSE)
sort(ap1@misc$deviance.Bestof) # A history of all the iterations
Coef(ap1)
ccoef(ap1)
par(mfrow = c(2, 2))
plot(ap1) # All the curves are unimodal; some quite symmetric
par(mfrow = c(1, 1), las = 1)
index <- 1:ncol(depvar(ap1))
lvplot(ap1, lcol = index, pcol = index, y = TRUE)
trplot(ap1, label = TRUE, col = index)
abline(a=0, b = 1, lty = 2)
trplot(ap1, label = TRUE, col = "blue", log = "xy", whichSp = c(1,3))
abline(a=0, b = 1, lty = 2)
persp(ap1, col = index, lwd = 2, label = TRUE)
abline(v = Opt(ap1), lty = 2, col = index)
abline(h = Max(ap1), lty = 2, col = index)
Run the code above in your browser using DataLab