
A constrained quadratic ordination (CQO; formerly called canonical Gaussian ordination or CGO) model is fitted using the quadratic reduced-rank vector generalized linear model (QRR-VGLM) framework.
cqo(formula, family = stop("argument 'family' needs to be assigned"),
data = list(), weights = NULL, subset = NULL,
na.action = na.fail, etastart = NULL, mustart = NULL,
coefstart = NULL, control = qrrvglm.control(...), offset = NULL,
method = "cqo.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE,
contrasts = NULL, constraints = NULL, extra = NULL,
smart = TRUE, ...)
a symbolic description of the model to be fit. The RHS of the formula is applied to each linear predictor. Different variables in each linear predictor can be chosen by specifying constraint matrices.
a function of class "vglmff"
(see vglmff-class
)
describing what statistical model is to be fitted. This is called a
``VGAM family function''. See CommonVGAMffArguments
for general information about many types of arguments found in this
type of function.
Currently the following families are supported:
poissonff
,
binomialff
(logitlink
and clogloglink
links available),
negbinomial
,
gamma2
.
Sometimes special arguments are required for cqo()
, e.g.,
binomialff(multiple.responses = TRUE)
.
an optional data frame containing the variables in the model.
By default the variables are taken from environment(formula)
,
typically the environment from which cqo
is called.
an optional vector or matrix of (prior) weights to be used in the fitting process. Currently, this argument should not be used.
an optional logical vector specifying a subset of observations to be used in the fitting process.
a function which indicates what should happen when the data contain
NA
s. The default is set by the na.action
setting of
options
, and is na.fail
if that is unset.
The ``factory-fresh'' default is na.omit
.
starting values for the linear predictors.
It is a
starting values for the fitted values. It can be a vector or a matrix. Some family functions do not make use of this argument. Currently, this argument probably should not be used.
starting values for the coefficient vector. Currently, this argument probably should not be used.
a list of parameters for controlling the fitting process.
See qrrvglm.control
for details.
This argument must not be used.
the method to be used in fitting the model.
The default (and presently only) method cqo.fit
uses iteratively reweighted least squares (IRLS).
a logical value indicating whether the model frame
should be assigned in the model
slot.
logical values indicating whether
the model matrix and response matrix used in the fitting
process should be assigned in the x
and y
slots.
Note the model matrix is the LM model matrix.
an optional list. See the contrasts.arg
of model.matrix.default
.
an optional list of constraint matrices.
The components of the list must be named with the term it corresponds
to (and it must match in character format).
Each constraint matrix must have constraints
is used it must contain all the
terms; an incomplete list is not accepted.
Constraint matrices for
an optional list with any extra information that might be needed by the family function.
logical value indicating whether smart prediction
(smartpred
) will be used.
further arguments passed into qrrvglm.control
.
An object of class "qrrvglm"
.
Local solutions are not uncommon when fitting CQO models. To increase
the chances of obtaining the global solution, increase the value
of the argument Bestof
in qrrvglm.control
.
For reproducibility of the results, it pays to set a different
random number seed before calling cqo
(the function
set.seed
does this). The function cqo
chooses initial values for C using .Init.Poisson.QO()
if Use.Init.Poisson.QO = TRUE
, else random numbers.
Unless I.tolerances = TRUE
or eq.tolerances = FALSE
,
CQO is computationally expensive with memory and time.
It pays to keep the rank down to 1
or 2. If eq.tolerances = TRUE
and I.tolerances = FALSE
then
the cost grows quickly with the number of species and sites (in terms of
memory requirements and time). The data needs to conform quite closely
to the statistical model, and the environmental range of the data should
be wide in order for the quadratics to fit the data well (bell-shaped
response surfaces). If not, RR-VGLMs will be more appropriate because
the response is linear on the transformed scale (e.g., log or logit)
and the ordination is called constrained linear ordination or CLO.
Like many regression models, CQO is sensitive to outliers (in the environmental and species data), sparse data, high leverage points, multicollinearity etc. For these reasons, it is necessary to examine the data carefully for these features and take corrective action (e.g., omitting certain species, sites, environmental variables from the analysis, transforming certain environmental variables, etc.). Any optimum lying outside the convex hull of the site scores should not be trusted. Fitting a CAO is recommended first, then upon transformations etc., possibly a CQO can be fitted.
For binary data, it is necessary to have `enough' data. In general,
the number of sites Rank = 1
and if the response data for each species is a string of all absences,
then all presences, then all absences (when enumerated along the latent
variable) then infinite parameter estimates will occur. In general,
setting I.tolerances = TRUE
may help.
This function was formerly called cgo
. It has been renamed to
reinforce a new nomenclature described in Yee (2006).
QRR-VGLMs or constrained quadratic ordination (CQO) models
are estimated here by maximum likelihood estimation. Optimal linear
combinations of the environmental variables are computed, called
latent variables (these appear as latvar
for latvar1
, latvar2
, etc. in the output). Here,
The central formula (for Poisson and binomial species data) is
given by
qrrvglm.control
, e.g., the argument noRRR
specifies which variables comprise
Theoretically, the four most popular VGAM family functions
to be used with cqo
correspond to the Poisson, binomial,
normal, and negative binomial distributions. The latter is a
2-parameter model. All of these are implemented, as well as the
2-parameter gamma.
For initial values, the function .Init.Poisson.QO
should
work reasonably well if the data is Poisson with species having equal
tolerances. It can be quite good on binary data too. Otherwise the
Cinit
argument in qrrvglm.control
can be used.
It is possible to relax the quadratic form to an additive model. The
result is a data-driven approach rather than a model-driven approach,
so that CQO is extended to constrained additive ordination
(CAO) when cao
for more details.
In this documentation,
Incidentally,
Unconstrained quadratic ordination (UQO)
may be performed by, e.g., fitting a Goodman's RC association model;
see uqo
and the Yee and Hadi (2014) referenced there.
For UQO, the response is the usual site-by-species matrix and
there are no environmental variables;
the site scores are free parameters.
UQO can be performed under the assumption that all species
have the same tolerance matrices.
Yee, T. W. (2004) A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685--701.
ter Braak, C. J. F. and Prentice, I. C. (1988) A theory of gradient analysis. Advances in Ecological Research, 18, 271--317.
Yee, T. W. (2006) Constrained additive ordination. Ecology, 87, 203--213.
qrrvglm.control
,
Coef.qrrvglm
,
predictqrrvglm
,
calibrate.qrrvglm
,
model.matrixqrrvglm
,
vcovqrrvglm
,
rcqo
,
cao
,
uqo
,
rrvglm
,
poissonff
,
binomialff
,
negbinomial
,
gamma2
,
lvplot.qrrvglm
,
perspqrrvglm
,
trplot.qrrvglm
,
vglm
,
set.seed
,
hspider
,
trapO
.
# NOT RUN {
# Example 1; Fit an unequal tolerances model to the hunting spiders data
hspider[,1:6] <- scale(hspider[,1:6]) # Standardized environmental variables
set.seed(1234) # For reproducibility of the results
p1ut <- cqo(cbind(Alopacce, Alopcune, Alopfabr, Arctlute, Arctperi,
Auloalbi, Pardlugu, Pardmont, Pardnigr, Pardpull,
Trocterr, Zoraspin) ~
WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
fam = poissonff, data = hspider, Crow1positive = FALSE,
eq.tol = FALSE)
sort(deviance(p1ut, history = TRUE)) # A history of all the iterations
if (deviance(p1ut) > 1177) warning("suboptimal fit obtained")
S <- ncol(depvar(p1ut)) # Number of species
clr <- (1:(S+1))[-7] # Omits yellow
lvplot(p1ut, y = TRUE, lcol = clr, pch = 1:S, pcol = clr,
las = 1) # Ordination diagram
legend("topright", leg = colnames(depvar(p1ut)), col = clr,
pch = 1:S, merge = TRUE, bty = "n", lty = 1:S, lwd = 2)
(cp <- Coef(p1ut))
(a <- latvar(cp)[cp@latvar.order]) # Ordered site scores along the gradient
# Names of the ordered sites along the gradient:
rownames(latvar(cp))[cp@latvar.order]
(aa <- Opt(cp)[, cp@Optimum.order]) # Ordered optimums along the gradient
aa <- aa[!is.na(aa)] # Delete the species that is not unimodal
names(aa) # Names of the ordered optimums along the gradient
trplot(p1ut, which.species = 1:3, log = "xy", type = "b", lty = 1, lwd = 2,
col = c("blue","red","green"), label = TRUE) -> ii # Trajectory plot
legend(0.00005, 0.3, paste(ii$species[, 1], ii$species[, 2], sep = " and "),
lwd = 2, lty = 1, col = c("blue", "red", "green"))
abline(a = 0, b = 1, lty = "dashed")
S <- ncol(depvar(p1ut)) # Number of species
clr <- (1:(S+1))[-7] # Omits yellow
persp(p1ut, col = clr, label = TRUE, las = 1) # Perspective plot
# Example 2; Fit an equal tolerances model. Less numerically fraught.
set.seed(1234)
p1et <- cqo(cbind(Alopacce, Alopcune, Alopfabr, Arctlute, Arctperi,
Auloalbi, Pardlugu, Pardmont, Pardnigr, Pardpull,
Trocterr, Zoraspin) ~
WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
poissonff, data = hspider, Crow1positive = FALSE)
sort(deviance(p1et, history = TRUE)) # A history of all the iterations
if (deviance(p1et) > 1586) warning("suboptimal fit obtained")
S <- ncol(depvar(p1et)) # Number of species
clr <- (1:(S+1))[-7] # Omits yellow
persp(p1et, col = clr, label = TRUE, las = 1)
# Example 3: A rank-2 equal tolerances CQO model with Poisson data
# This example is numerically fraught... need I.toler = TRUE too.
set.seed(555)
p2 <- cqo(cbind(Alopacce, Alopcune, Alopfabr, Arctlute, Arctperi,
Auloalbi, Pardlugu, Pardmont, Pardnigr, Pardpull,
Trocterr, Zoraspin) ~
WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
poissonff, data = hspider, Crow1positive = FALSE,
I.toler = TRUE, Rank = 2, Bestof = 3, isd.latvar = c(2.1, 0.9))
sort(deviance(p2, history = TRUE)) # A history of all the iterations
if (deviance(p2) > 1127) warning("suboptimal fit obtained")
lvplot(p2, ellips = FALSE, label = TRUE, xlim = c(-3,4),
C = TRUE, Ccol = "brown", sites = TRUE, scol = "grey",
pcol = "blue", pch = "+", chull = TRUE, ccol = "grey")
# Example 4: species packing model with presence/absence data
set.seed(2345)
n <- 200; p <- 5; S <- 5
mydata <- rcqo(n, p, S, fam = "binomial", hi.abundance = 4,
eq.tol = TRUE, es.opt = TRUE, eq.max = TRUE)
myform <- attr(mydata, "formula")
set.seed(1234)
b1et <- cqo(myform, binomialff(multiple.responses = TRUE, link = "clogloglink"),
data = mydata)
sort(deviance(b1et, history = TRUE)) # A history of all the iterations
lvplot(b1et, y = TRUE, lcol = 1:S, pch = 1:S, pcol = 1:S, las = 1)
Coef(b1et)
# Compare the fitted model with the 'truth'
cbind(truth = attr(mydata, "concoefficients"), fitted = concoef(b1et))
# Example 5: Plot the deviance residuals for diagnostic purposes
set.seed(1234)
p1et <- cqo(cbind(Alopacce, Alopcune, Alopfabr, Arctlute, Arctperi,
Auloalbi, Pardlugu, Pardmont, Pardnigr, Pardpull,
Trocterr, Zoraspin) ~
WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
poissonff, data = hspider, eq.tol = TRUE, trace = FALSE)
sort(deviance(p1et, history = TRUE)) # A history of all the iterations
if (deviance(p1et) > 1586) warning("suboptimal fit obtained")
S <- ncol(depvar(p1et))
par(mfrow = c(3, 4))
for (ii in 1:S) {
tempdata <- data.frame(latvar1 = c(latvar(p1et)),
sppCounts = depvar(p1et)[, ii])
tempdata <- transform(tempdata, myOffset = -0.5 * latvar1^2)
# For species ii, refit the model to get the deviance residuals
fit1 <- vglm(sppCounts ~ offset(myOffset) + latvar1, poissonff,
data = tempdata, trace = FALSE)
# For checking: this should be 0
# print("max(abs(c(Coef(p1et)@B1[1,ii],Coef(p1et)@A[ii,1])-coef(fit1)))")
# print( max(abs(c(Coef(p1et)@B1[1,ii],Coef(p1et)@A[ii,1])-coef(fit1))) )
# Plot the deviance residuals
devresid <- resid(fit1, type = "deviance")
predvalues <- predict(fit1) + fit1@offset
ooo <- with(tempdata, order(latvar1))
plot(predvalues + devresid ~ latvar1, data = tempdata, col = "red",
xlab = "latvar1", ylab = "", main = colnames(depvar(p1et))[ii])
with(tempdata, lines(latvar1[ooo], predvalues[ooo], col = "blue"))
}
# }
Run the code above in your browser using DataLab