cqo: Fitting Constrained Quadratic Ordination (CQO)

Description

A constrained quadratic ordination (CQO; formerly called canonical Gaussian ordination or CGO) model is fitted using the quadratic reduced-rank vector generalized linear model (QRR-VGLM) framework.

Usage

cqo(formula, family, data = list(), weights = NULL, subset = NULL,
    na.action = na.fail, etastart = NULL, mustart = NULL,
    coefstart = NULL, control = qrrvglm.control(...), offset = NULL,
    method = "cqo.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE,
    contrasts = NULL, constraints = NULL, extra = NULL,
    smart = TRUE, ...)

Arguments

Value

An object of class "qrrvglm". Note that the slot misc has a list component called deviance.Bestof which gives the history of deviances over all the iterations.

Warning

Local solutions are not uncommon when fitting CQO models. To increase the chances of obtaining the global solution, increase the value of the argument Bestof in qrrvglm.control. For reproducibility of the results, it pays to set the random number seed before calling cqo (the function set.seed does this). The function cqo chooses initial values for C using .Init.Poisson.QO() if Use.Init.Poisson.QO=TRUE, and random numbers otherwise.
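The effect of Bestof and set.seed can be illustrated with a conceptual base-R sketch. It does not call cqo and is not VGAM's internal algorithm: optim() with method = "BFGS" stands in for a single fit from one set of initial values, and toy_deviance() and best_of() are hypothetical names for a made-up objective with two local minima and a multi-start wrapper. Keeping the smallest deviance over several random starts raises the chance of reaching the global minimum, and resetting the seed reproduces the whole history of deviances.

```r
# Conceptual sketch only: mimics the idea behind Bestof, not VGAM's code.
# toy_deviance() is a hypothetical objective with two local minima
# (near -2 and near +2); the one near -2 is the global minimum.
toy_deviance <- function(c1) (c1^2 - 4)^2 + c1

# Hypothetical helper: run a local optimiser from n_starts random initial
# values and keep the fit with the smallest deviance.
best_of <- function(objective, n_starts) {
  history <- numeric(n_starts)
  best <- NULL
  for (i in seq_len(n_starts)) {
    start <- runif(1, min = -4, max = 4)              # random initial value
    fit <- optim(start, objective, method = "BFGS")  # local search only
    history[i] <- fit$value
    if (is.null(best) || fit$value < best$value) best <- fit
  }
  list(par = best$par, value = best$value, history = history)
}

set.seed(1234)
res1 <- best_of(toy_deviance, n_starts = 10)
set.seed(1234)
res2 <- best_of(toy_deviance, n_starts = 10)

sort(res1$history)                      # deviances from all ten starts
identical(res1$history, res2$history)   # same seed, same history
```

A single start can stop at the local minimum near +2; taking the best of several starts is what protects against that.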

Unless ITolerances=TRUE or EqualTolerances=FALSE, CQO is computationally expensive, so it pays to keep the rank down to 1 or 2. If EqualTolerances=TRUE and ITolerances=FALSE, then the cost (in memory and time) grows quickly with the number of species and sites. The data need to conform quite closely to the statistical model, and the environmental range of the data should be wide so that the quadratics (bell-shaped response surfaces) fit the data well. If not, an RR-VGLM will be more appropriate: there the response is linear on the transformed scale (e.g., log or logit), and the ordination is called constrained linear ordination (CLO).

Like many regression models, CQO is sensitive to outliers (in the environmental and species data), sparse data, high-leverage points, multicollinearity, etc. For these reasons, it is necessary to examine the data carefully for these features and take corrective action (e.g., omitting certain species, sites or environmental variables from the analysis, or transforming certain environmental variables). Any optimum lying outside the convex hull of the site scores should not be trusted. Fitting a CAO first is recommended; after any necessary transformations, a CQO can then possibly be fitted.
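The convex-hull check for a rank-2 ordination can be sketched in base R. This assumes the site scores are available as an n x 2 matrix (one row per site) and the estimated optimum as a length-2 vector; in_convex_hull() is a hypothetical helper, not a VGAM function. It takes the hull vertices from grDevices::chull() and tests whether the point lies on the same side of every hull edge. For rank 1 the same idea reduces to checking that the optimum lies within the range of the site scores.

```r
# Hypothetical helper: does a rank-2 optimum lie inside (or on) the convex
# hull of the site scores? site_scores is an n x 2 matrix of latent
# variable values, one row per site; point is a length-2 optimum.
in_convex_hull <- function(site_scores, point) {
  hull <- site_scores[chull(site_scores), , drop = FALSE]  # hull vertices, in order
  n <- nrow(hull)
  cross <- numeric(n)
  for (i in seq_len(n)) {
    a <- hull[i, ]
    b <- hull[if (i == n) 1 else i + 1, ]
    # z-component of (b - a) x (point - a): its sign says which side of edge a->b
    cross[i] <- (b[1] - a[1]) * (point[2] - a[2]) - (b[2] - a[2]) * (point[1] - a[1])
  }
  # Inside a convex polygon, the point is on the same side of every edge.
  all(cross >= 0) || all(cross <= 0)
}

# Made-up site scores for illustration
sites <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0.5, 0.5))
in_convex_hull(sites, c(0.4, 0.6))   # optimum among the sites
in_convex_hull(sites, c(2.0, 2.0))   # extrapolated optimum: do not trust
```

Checking only the sign pattern, rather than assuming a clockwise or counter-clockwise vertex order, keeps the test independent of the orientation chull() returns.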

For binary data, it is necessary to have "enough" data. In general, the number of sites $n$ ought to be much larger than the number of species $S$, e.g., at least 100 sites for two species. Compared to count (Poisson) data, numerical problems occur more frequently with presence/absence (binary) data. For example, if Rank=1 and the response data for each species is a string of all absences, then all presences, then all absences (when enumerated along the latent variable), then infinite parameter estimates will occur. In general, setting ITolerances=TRUE may help.
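Why the absence-presence-absence pattern gives infinite estimates can be seen in a single-species analogue. The sketch below swaps in base R's glm() with a quadratic logistic predictor instead of cqo, and the data are made up: because some quadratic in x is positive exactly where the species is present, the presences and absences are perfectly separated, the maximum likelihood estimate does not exist, and glm() stops with very large coefficients and a deviance near zero.

```r
# Single-species analogue (swapped in for illustration): a quadratic logistic
# regression fitted with base R's glm(), not cqo(). Along one gradient the
# species is absent, then present, then absent (made-up data).
x <- 1:20
y <- c(rep(0, 7), rep(1, 6), rep(0, 7))

# A quadratic in x can separate the presences from the absences perfectly, so
# the MLE does not exist: glm() warns that fitted probabilities are numerically
# 0 or 1, and the coefficients become very large before the iterations stop.
fit <- suppressWarnings(glm(y ~ x + I(x^2), family = binomial))
coef(fit)
deviance(fit)   # essentially 0
```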

This function was formerly called cgo. It has been renamed to reinforce a new nomenclature described in Yee (2006).

Details

QRR-VGLMs or constrained quadratic ordination (CQO) models are estimated here by maximum likelihood estimation. Optimal linear combinations of the environmental variables are computed, called latent variables (these appear as lv if $R=1$, else as lv1, lv2, etc., in the output). Here, $R$ is the rank or the number of ordination axes. Each species' response is then a regression on these latent variables using quadratic polynomials on a transformed scale (e.g., log for Poisson counts, logit for presence/absence responses). The solution is obtained iteratively in order to maximize the log-likelihood function, or equivalently, minimize the deviance.
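For rank 1 and a single Poisson species, the structure described above can be written out in base R with made-up coefficients (hypothetical values, not estimates from cqo). The latent variable is a linear combination of two environmental variables, the log mean is quadratic in it, and completing the square shows the same curve in its Gaussian form, with optimum $u = -a/(2d)$, tolerance $t = \sqrt{-1/(2d)}$ (the rank-1 case of the tolerance relation $T_s = -\frac12 D_s^{-1}$) and maximum $\exp(\beta_0 - a^2/(4d))$.

```r
# Rank-1 illustration with made-up coefficients (hypothetical values, not
# estimates): two environmental variables, one Poisson species.
set.seed(1)
n  <- 50
x2 <- cbind(runif(n, -2, 2), runif(n, -2, 2))  # environmental variables, one row per site
C  <- c(0.8, -0.6)                              # hypothetical constrained coefficients
nu <- drop(x2 %*% C)                            # latent variable: nu = C^T x2 for each site

beta0 <- 1; a <- 0.5; d <- -0.4   # intercept, linear and quadratic terms; d < 0 gives a bell
eta <- beta0 + a * nu + d * nu^2  # quadratic on the log scale
mu  <- exp(eta)                   # expected counts

optimum   <- -a / (2 * d)                # latent-variable value where mu peaks
tolerance <- sqrt(-1 / (2 * d))          # t = sqrt(T), where T = -1/(2 d)
maximum   <- exp(beta0 - a^2 / (4 * d))  # expected count at the optimum

# Equivalent Gaussian (bell-shaped) form of the same response curve
mu_gauss <- maximum * exp(-(nu - optimum)^2 / (2 * tolerance^2))
all.equal(mu, mu_gauss)
```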

The central formula (for Poisson and binomial species data) is given by $$\eta = B_1^T x_1 + A \nu + \sum_{m=1}^M (\nu^T D_m \nu) e_m$$ where $x_1$ is a vector (usually just a 1 for an intercept), $x_2$ is a vector of environmental variables, $\nu=C^T x_2$ is an $R$-vector of latent variables, and $e_m$ is a vector of 0s but with a 1 in the $m$th position. Here $\eta$ is the vector of linear/additive predictors, e.g., the $m$th element is $\eta_m = \log(E[Y_m])$ for the $m$th species. The matrices $B_1$, $A$, $C$ and $D_m$ are estimated from the data, i.e., they contain the regression coefficients. The tolerance matrices satisfy $T_s = -\frac12 D_s^{-1}$. Many important CQO details are directly related to arguments in qrrvglm.control, e.g., the argument Norrr specifies which variables comprise $x_1$.
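The central formula can be evaluated term by term for one site in base R. All sizes and coefficient values below are hypothetical, chosen only for illustration ($R = 2$, $M = S = 3$ Poisson species, three environmental variables, intercept-only $x_1$). The sketch also computes the tolerance matrices $T_m = -\frac12 D_m^{-1}$ and each species' optimum $T_m a_m$, where $a_m$ is the $m$th row of $A$: setting the gradient $a_m + 2 D_m \nu$ of $\eta_m$ to zero gives exactly that point when $D_m$ is symmetric.

```r
# Hypothetical sizes and coefficients, chosen only for illustration:
# R = 2 latent variables, M = S = 3 Poisson species, 3 environmental
# variables in x2, and x1 = 1 (intercept only).
M  <- 3
x1 <- 1
x2 <- c(0.3, -1.2, 0.7)                          # environmental variables at one site
B1 <- matrix(c(1.0, 0.5, -0.2), nrow = 1)        # 1 x M
A  <- matrix(c( 0.4, -0.3,
                0.1,  0.6,
               -0.5,  0.2), nrow = M, byrow = TRUE)   # M x R
C  <- matrix(c( 0.8,  0.1,
               -0.4,  0.7,
                0.2, -0.5), nrow = 3, byrow = TRUE)   # length(x2) x R
D  <- list(diag(c(-0.5, -0.3)),                  # one symmetric R x R matrix D_m per species;
           matrix(c(-0.6, 0.1, 0.1, -0.4), 2),   # negative definite => bell-shaped surface
           diag(c(-0.2, -0.8)))

nu   <- drop(t(C) %*% x2)                                    # nu = C^T x2
quad <- vapply(D, function(Dm) drop(t(nu) %*% Dm %*% nu), numeric(1))
eta  <- drop(t(B1) %*% x1) + drop(A %*% nu) + quad           # one eta_m per species

# Tolerance matrices T_m = -(1/2) D_m^{-1}; the optimum of species m
# (where the gradient A[m, ] + 2 D_m nu vanishes) is T_m %*% A[m, ].
Tol    <- lapply(D, function(Dm) -0.5 * solve(Dm))
optima <- lapply(seq_len(M), function(m) drop(Tol[[m]] %*% A[m, ]))
```

The vector `quad` holds the $\nu^T D_m \nu$ values, which is what the sum over $e_m$ places into the $m$th element of $\eta$.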

Theoretically, the four most popular VGAM family functions to be used with cqo correspond to the Poisson, binomial, normal, and negative binomial distributions; the last of these is a 2-parameter model. All of these are implemented, as well as the 2-parameter gamma. The Poisson is catered for by quasipoissonff and poissonff, and the binomial by quasibinomialff and binomialff. Those beginning with "quasi" have dispersion parameters that are estimated for each species.

For initial values, the function .Init.Poisson.QO should work reasonably well if the data is Poisson with species having equal tolerances. It can be quite good on binary data too. Otherwise the Cinit argument in qrrvglm.control can be used. It is possible to relax the quadratic form to an additive model. The result is a data-driven approach rather than a model-driven approach, so that CQO is extended to constrained additive ordination (CAO) when $R=1$. See cao for more details.

In this documentation, $M$ is the number of linear predictors, $S$ is the number of responses (species). Then $M=S$ for Poisson and binomial species data, and $M=2S$ for negative binomial and gamma distributed species data.

References

Yee, T. W. (2004) A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685–701.

ter Braak, C. J. F. and Prentice, I. C. (1988) A theory of gradient analysis. Advances in Ecological Research, 18, 271–317.

Yee, T. W. (2006) Constrained additive ordination. Ecology, 87, 203–213.

Examples


# Example 1: fit an unequal-tolerances model to the hunting spiders data
library(VGAM)
data(hspider)
hspider[, 1:6] <- scale(hspider[, 1:6])  # Standardize the environmental variables
set.seed(1234)  # For reproducibility, since initial values may be random