rcqo: Constrained Quadratic Ordination

Description

Random generation for constrained quadratic ordination (CQO).

Usage

rcqo(n, p, S, Rank = 1,
     family = c("poisson", "negbinomial", "binomial-poisson",
                "Binomial-negbinomial", "ordinal-poisson",
                "Ordinal-negbinomial", "gamma2"),
     EqualMaxima = FALSE, EqualTolerances = TRUE, ESOptima = FALSE,
     loabundance = if (EqualMaxima) hiabundance else 10,
     hiabundance = 100, sdlv = head(1.5/2^(0:3), Rank),
     sdOptima = ifelse(ESOptima, 1.5/Rank, 1) * ifelse(scalelv, sdlv, 1),
     sdTolerances = 0.25, Kvector = 1, Shape = 1,
     sqrt = FALSE, Log = FALSE, rhox = 0.5, breaks = 4,
     seed = NULL, Crow1positive = TRUE, xmat = NULL, scalelv = TRUE)

Arguments

Number of sites. It is denoted by $n$ below.

Number of environmental variables, including an intercept term. It is denoted by $p$ below. Must be no less than $1+R$ in value.

Number of species. It is denoted by $S$ below.

Rank

The rank or the number of latent variables or true dimension of the data on the reduced space. This must be either 1, 2, 3 or 4. It is denoted by $R$.

family

What type of species data is to be returned. The first choice is the default. If binomial then a 0 means absence and 1 means presence. If ordinal then the breaks argument is passed into the breaks argument of

EqualMaxima

Logical. Does each species have the same maxima? See arguments loabundance and hiabundance.

EqualTolerances

Logical. Does each species have the same tolerance? If TRUE then the common value is 1 along every latent variable, i.e., all species' tolerance matrices are the order-$R$ identity matrix.

ESOptima

Logical. Do the species have equally spaced optima? If TRUE then the quantity $S^{1/R}$ must be an integer with value 2 or more. That is, there has to be an appropriate number of species in total. This is so that a grid of

loabundance, hiabundance

Numeric. These are recycled to a vector of length $S$. The species have a maximum between loabundance and hiabundance. That is, at their optimal environment, the mean abundance of each species is between the two c

sdlv

Numeric, of length $R$ (recycled if necessary). Site scores along each latent variable have these standard deviation values. This must be a decreasing sequence of values because the first ordination axis contains the greatest spread of the

sdOptima

Numeric, of length $R$ (recycled if necessary). If ESOptima = FALSE then, for the $r$th latent variable axis, the optima of the species are generated from a normal distribution centered about 0. If ESOptima = TRUE

sdTolerances

Logical. If EqualTolerances = FALSE then, for the $r$th latent variable, the species' tolerances are chosen from a normal distribution with mean 1 and standard deviation sdTolerances[r]. However, the first

Kvector

A vector of positive $k$ values (recycled to length $S$ if necessary) for the negative binomial distribution (see negbinomial for details). Note that a natural default value does not exist,

Shape

A vector of positive $\lambda$ values (recycled to length $S$ if necessary) for the 2-parameter gamma distribution (see gamma2 for details). Note that a natural default value does not exist, how

sqrt

Logical. Take the square-root of the negative binomial counts? Assigning sqrt = TRUE when family="negbinomial" means that the resulting species data can be considered very crudely to be approximately Poisson distribut

Log

Logical. Take the logarithm of the gamma random variates? Assigning Log = TRUE when family="gamma2" means that the resulting species data can be considered very crudely to be approximately Gaussian distributed about i

rhox

Numeric, less than 1 in absolute value. The correlation between the environmental variables. The correlation matrix is a matrix of 1's along the diagonal and rhox in the off-diagonals. Note that each environmental variable is

breaks

If family is assigned an ordinal value then this argument is used to define the cutpoints. It is fed into the breaks argument of cut.

seed

If given, it is passed into set.seed. This argument can be used to obtain reproducible results. If set, the value is saved as the "seed" attribute of the returned value.

Crow1positive

See qrrvglm.control for details.

xmat

The $n$ by $p-1$ environmental matrix can be inputted.

scalelv

Logical. If FALSE the argument sdlv is ignored and no scaling of the latent variable values is performed.

Value

A $n$ by $p-1+S$ data frame with components and attributes. In the following the attributes are labelled with double quotes.
x2, x3, x4, ..., xpThe environmental variables. This makes up the $n$ by $p-1$ $X_2$ matrix. Note that x1 is not present; it is effectively a vector of ones since it corresponds to an intercept term when cqo is applied to the data.
y1, y2, x3, ..., ySThe species data. This makes up the $n$ by $S$ matrix $Y$. This will be of the form described by the argument family.
"ccoefficients"The $p-1$ by $R$ matrix of constrained coefficients (or canonical coefficients). These are also known as weights or loadings.
"formula"The formula involving the species and environmental variable names. This can be used directly in the formula argument of cqo.
"logmaxima"The $S$-vector of species' maxima, on a log scale. These are uniformly distributed between log(loabundance) and log(hiabundance).
"lv"The $n$ by $R$ matrix of site scores. Each successive column (latent variable) has sample standard deviation equal to successive values of sdlv.
"eta"The linear/additive predictor value.
"optima"The $S$ by $R$ matrix of species' optima.
"tolerances"The $S$ by $R$ matrix of species' tolerances. These are the square root of the diagonal elements of the tolerance matrices (recall that all tolerance matrices are restricted to being diagonal in this function).
Other attributes are "break", "family", "Rank", "loabundance", "hiabundance", "EqualTolerances", "EqualMaxima", "seed" as used.

Details

This function generates data coming from a constrained quadratic ordination (CQO) model. In particular, data coming from a species packing model can be generated with this function. The species packing model states that species have equal tolerances, equal maxima, and optima which are uniformly distributed over the latent variable space. This can be achieved by assigning the arguments ESOptima = TRUE, EqualMaxima = TRUE, EqualTolerances = TRUE.

At present, the Poisson and negative binomial abundances are generated first using loabundance and hiabundance, and if family is binomial or ordinal then it is converted into these forms.

In CQO theory the $n$ by $p$ matrix $X$ is partitioned into two parts $X_1$ and $X_2$. The matrix $X_2$ contains the `real' environmental variables whereas the variables in $X_1$ are just for adjustment purposes; they contain the intercept terms and other variables that one wants to adjust for when (primarily) looking at the variables in $X_2$. This function has $X_1$ only being a matrix of ones, i.e., containing an intercept only.

References

Yee, T. W. (2004) A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685--701.

Yee, T. W. (2006) Constrained additive ordination. Ecology, 87, 203--213.

ter Braak, C. J. F. and Prentice, I. C. (1988) A theory of gradient analysis. Advances in Ecological Research, 18, 271--317.

Examples

Run this code

# Example 1: Species packing model:
n <- 100; p <- 5; S <- 5
mydata <- rcqo(n, p, S, ESOpt = TRUE, EqualMax = TRUE)
names(mydata)
(myform <- attr(mydata, "formula"))
fit <- cqo(myform, poissonff, mydata, Bestof = 3) # EqualTol = TRUE 
matplot(attr(mydata, "lv"), mydata[,-(1:(p-1))], col = 1:S)
persp(fit, col = 1:S, add = TRUE)
lvplot(fit, lcol = 1:S, y = TRUE, pcol = 1:S) # The same plot as above

# Compare the fitted model with the 'truth'
ccoef(fit) # The fitted model
attr(mydata, "ccoefficients") # The 'truth'

c(apply(attr(mydata, "lv"), 2, sd), apply(lv(fit), 2, sd)) # Both values should be approx equal


# Example 2: negative binomial data fitted using a Poisson model:
n <- 200; p <- 5; S <- 5
mydata <- rcqo(n, p, S, fam = "negbin", sqrt = TRUE)
myform <- attr(mydata, "formula")
fit <- cqo(myform, fam = poissonff, dat = mydata) # ITol = TRUE,
lvplot(fit, lcol = 1:S, y = TRUE, pcol = 1:S)
# Compare the fitted model with the 'truth'
ccoef(fit) # The fitted model
attr(mydata, "ccoefficients") # The 'truth'


# Example 3: gamma2 data fitted using a Gaussian model:
n <- 200; p <- 5; S <- 3
mydata <- rcqo(n, p, S, fam = "gamma2", Log = TRUE)
fit <- cqo(attr(mydata, "formula"), fam = gaussianff, dat = mydata) # ITol = TRUE,
matplot(attr(mydata, "lv"), exp(mydata[,-(1:(p-1))]), col = 1:S) # 'raw' data
lvplot(fit, lcol = 1:S, y = TRUE, pcol = 1:S) # Fitted model to transformed data
# Compare the fitted model with the 'truth'
ccoef(fit) # The fitted model
attr(mydata, "ccoefficients") # The 'truth'

Run the code above in your browser using DataLab