uqo: Fitting Unconstrained Quadratic Ordination (UQO)

Description

An unconstrained quadratic ordination (UQO) (equivalently, noncanonical Gaussian ordination) model is fitted using the quadratic unconstrained vector generalized linear model (QU-VGLM) framework. In this documentation, $M$ is the number of linear predictors or species.

Usage

uqo(formula, family, data = list(), weights = NULL, subset = NULL,
    na.action = na.fail, etastart = NULL, mustart = NULL,
    coefstart = NULL, control = uqo.control(...), offset = NULL,
    method = "uqo.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE,
    contrasts = NULL, constraints = NULL, extra = NULL,
    qr.arg = FALSE, ...)

Arguments

formula

a symbolic description of the model to be fit. Since there is no $x_2$ vector by definition, the RHS of the formula has all terms belonging to the $x_1$ vector.

family

a function of class "vglmff" describing what statistical model is to be fitted. Currently two families are supported: Poisson and binomial.

data

an optional data frame containing the variables in the model. By default the variables are taken from environment(formula), typically the environment from which uqo is called.

weights

an optional vector or matrix of (prior) weights to be used in the fitting process. This argument should not be used.

subset

an optional logical vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail

etastart

starting values for the linear predictors. It is a $M$-column matrix. If $M = 1$ then it may be a vector.

mustart

starting values for the fitted values. It can be a vector or a matrix. Some family functions do not make use of this argument.

coefstart

starting values for the coefficient vector.

control

a list of parameters for controlling the fitting process. See uqo.control for details.

offset

a vector or $M$-column matrix of offset values. This argument should not be used.

method

the method to be used in fitting the model. The default (and presently only) method uqo.fit uses iteratively reweighted least squares (IRLS).

model

a logical value indicating whether the model frame should be assigned in the model slot.

x.arg, y.arg

logical values indicating whether the model matrix and response matrix used in the fitting process should be assigned in the x and y slots. Note the model matrix is the LM model matrix.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

constraints

an optional list of constraint matrices. This argument should not be used.

extra

an optional list with any extra information that might be needed by the family function.

qr.arg

logical value indicating whether the slot qr, which returns the QR decomposition of the VLM model matrix, is returned on the object. This argument should not be set TRUE.

...

further arguments passed into uqo.control.

Value

An object of class "uqo" (this may change to "quvglm" in the future).

Warning

Local solutions are not uncommon when fitting UQO models. To increase the chances of obtaining the global solution, set ITolerances = TRUE or EqualTolerances = TRUE and increase the value of the argument Bestof in uqo.control. For reproducibility of the results, it pays to set a different random number seed before calling uqo (the function set.seed does this).

The function uqo is very sensitive to initial values, and there is a lot of room for improvement here.

UQO is computationally expensive. It pays to keep the rank to no more than 2, and 1 is much preferred over 2. The data needs to conform closely to the statistical model.

Currently there is a bug with the argument Crow1positive in uqo.control. This argument might be interpreted as controlling the sign of the first site score, but currently this is not done.

Details

Unconstrained quadratic ordination models fit symmetric bell-shaped response curves/surfaces to response data, but the latent variables are largely free parameters and are not constrained to be linear combinations of the environmental variables. This poses a difficult optimization problem. The current algorithm is very simple and will often fail (even for Rank = 1) but hopefully this will be improved in the future.

The central formula is given by $$\eta = B_1^T x_1 + A \nu + \sum_{m = 1}^M (\nu^T D_m \nu) e_m$$ where $x_1$ is a vector (usually just a 1 for an intercept), $\nu$ is a $R$-vector of latent variables, $e_m$ is a vector of 0s but with a 1 in the $m$th position. The $\eta$ are a vector of linear/additive predictors, e.g., the $m$th element is $\eta_m = \log(E[Y_m])$ for the $m$th species. The matrices $B_1$, $A$, and $D_m$ are estimated from the data, i.e., contain the regression coefficients. Also, $\nu$ is estimated. The tolerance matrices satisfy $T_s = -\frac12 D_s^{-1}$. Many important UQO details are directly related to arguments in uqo.control; see also cqo and qrrvglm.control.

Currently, only Poisson and binomial VGAM family functions are implemented for this function, and dispersion parameters for these are assumed known. Thus the Poisson is catered for by poissonff, and the binomial by binomialff. Those beginning with "quasi" have dispersion parameters that are estimated for each species, hence will give an error message here.

References

Yee, T. W. (2004) A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685--701.

Yee, T. W. (2006) Constrained additive ordination. Ecology, 87, 203--213.

Examples

Run this code

set.seed(123) # This leads to the global solution
hspider[,1:6] <- scale(hspider[,1:6]) # Standardized environmental vars
p1 <- cqo(cbind(Alopacce, Alopcune, Alopfabr, Arctlute, Arctperi,
                Auloalbi, Pardlugu, Pardmont, Pardnigr, Pardpull,
                Trocterr, Zoraspin) ~
          WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
          ITolerances = TRUE, fam = poissonff, data = hspider, 
          Crow1positive = TRUE, Bestof=3, trace = FALSE)
if (deviance(p1) > 1589.0) stop("suboptimal fit obtained")

set.seed(111)
up1 <- uqo(cbind(Alopacce, Alopcune, Alopfabr, Arctlute, Arctperi,
                Auloalbi, Pardlugu, Pardmont, Pardnigr, Pardpull,
                Trocterr, Zoraspin) ~ 1,
          family = poissonff, data = hspider,
          ITolerances = TRUE,
          Crow1positive = TRUE, lvstart = -lv(p1))
if (deviance(up1) > 1310.0) stop("suboptimal fit obtained")

nos <- ncol(depvar(up1))  # Number of species
clr <- (1:(nos+1))[-7] # Omit yellow
lvplot(up1, las = 1, y = TRUE, pch = 1:nos, scol = clr, lcol = clr, 
       pcol = clr, llty = 1:nos, llwd = 2)
legend(x = 2, y = 135, colnames(up1@y), col = clr, lty = 1:nos,
       lwd = 2, merge = FALSE, ncol = 1, x.inter = 4.0, bty = "l", cex = 0.9)

# Compare the site scores between the two models
plot(lv(p1), lv(up1), xlim = c(-3, 4), ylim = c(-3, 4), las = 1)
abline(a = 0, b = -1, lty = 2, col = "blue", xpd = FALSE)
cor(lv(p1, ITol = TRUE), lv(up1))

# Another comparison between the constrained and unconstrained models
# The signs are not right so they are similar when reflected about 0 
par(mfrow = c(2, 1))
persp(up1, main = "Red/Blue are the constrained/unconstrained models",
      label = TRUE, col = "blue", las = 1)
persp(p1, add = FALSE, col = "red")
pchisq(deviance(p1) - deviance(up1), df = 52-30, lower.tail = FALSE)

Run the code above in your browser using DataLab