traitglm: Fits a fourth corner model for abundance as a function of environmental variables and species traits.

Description

Fits a fourth corner model: a model to study how variation in environmental response across taxa can be explained by their traits. Almost any predictive model can be used for fitting. The default is a generalised linear model; another good option is to add a LASSO penalty via glm1path. Overdispersed counts can be handled via family="negative.binomial", which is the default family argument.

Usage


traitglm(L, R, Q = NULL, family="negative.binomial", formula = NULL, method = "manyglm",
            composition = FALSE, col.intercepts = TRUE, ...)

Arguments

L

A data frame (or matrix) containing the abundances for each taxon (columns) across all sites (rows).

R

A data frame (or matrix) of environmental variables (columns) across all sites (rows).

Q

A data frame (or matrix) of traits (columns) across all taxa (rows). If not specified, a different environmental response will be fitted for each taxon.

family

The family of the response variable, see family. Negative binomial with unknown overdispersion and a log-link can be specified as "negative.binomial", and is the default.

formula

A one-sided formula specifying exactly how to model abundance as a function of environmental and trait variables (as found in R and Q respectively). Default is to include all terms additively, with quadratics for quantitative terms, and all environment-by-trait interactions.

method

The function to use to fit the model. Default is manyglm. Other available options include glm1path and cv.glm1path for LASSO-penalised fits, but in principle any model-fitting function that accepts formula input and a family argument should work.

composition

logical. TRUE includes a row effect in the model, adjusting for different sampling intensities across different samples. This can be understood as a compositional term in the sense that all other terms then model relative abundance at a site. FALSE (default) does not include a row effect, hence the model is of absolute abundance.

col.intercepts

logical. TRUE (default) includes a column effect in the model, to adjust for different levels of abundance of different response (column) variables. FALSE removes this column effect.

…

Arguments passed to the function specified at method that will be used to fit the model.

Value

Returns a traitglm object, a list that contains at least the following components:

…

Exactly what is included in output depends on the fitting function - by default, a manyglm object is returned, so all usual manyglm output is included (coefficients, residuals, deviance, etc).

family

A family object matching the final model.

fourth.corner

A matrix of fourth corner coefficients. If formula has been manually entered, this will be a vector not a matrix.

R.des

The reduced-size design matrix for environmental variables, including further arguments:

X: Data frame of (possibly standardised) environmental variables
X.squ: A data frame containing the leading term in a quadratic expression (where appropriate) for environmental variables
var.type: A vector with one entry per column of X, listing the type of each environmental variable ("quantitative" or "factor")
coefs: Coefficients used in transforming variables to orthogonality. These are used later to make predictions.

Q.des

The reduced-size design matrix for traits, set up as for R.des.

spp.penalty

For LASSO fits: a vector of the same length as the final design matrix, indicating which variables had a penalty imposed on them in model fitting.

L

The data frame of abundances specified as input.

any.penalty

Logical: whether any penalty is applied to parameters at all (not the case when using a manyglm fit).

scaling

A list of coefficients describing the standardisations of variables used in analyses. Stored for later use when making predictions.

call

The original traitglm call.

Details

This function fits a fourth corner model, that is, a model to predict abundance across several taxa (stored in L) as a function of environmental variables (R) and traits (Q). The environment-trait interaction can be understood as the fourth corner, giving the set of coefficients that describe how environmental response across taxa varies as traits vary. A species effect is included in the model (i.e. a different intercept term for each species), so that traits are used to explain patterns in relative abundance across taxa, not patterns in absolute abundance.

The actual function used to fit the model is determined by the user through the method argument. The default is to use manyglm to fit a GLM, although for predictive modelling, it might be better to use a LASSO penalty as in glm1path and cv.glm1path. In glm1path, the penalty used for BIC calculation is log(dim(L)[1]), i.e. log(number of sites).

The model is fitted by vectorising L then constructing a big matrix from repeated values of R, Q, their quadratic terms (if required) and interactions. Hence this function will hit memory issues if any of these matrices are large, and can slow down (especially if using cv.glm1path). If formula is left unspecified, the design matrix is constructed using all environmental variables and traits specified in R and Q, and quadratic terms for any of these variables that are quantitative, and all environment-trait interactions, after standardising these variables. Specifying a one-sided formula as a function of the variables in R and Q would instead give the user control over the precise model that is fitted, and drops the internal standardisations. The arguments composition and col.intercepts optionally add terms to the model for row and column total abundance, irrespective of whether a formula has been specified.
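
The vectorisation described above can be sketched in base R. This is only an illustration under stated assumptions, not the internal traitglm code: the toy data, the variable names temp and size, and the use of a Poisson glm in place of manyglm are all hypothetical choices made for this example.

```r
# Toy data (hypothetical): 4 sites, 3 taxa, one environmental variable, one trait
set.seed(1)
n_sites <- 4
n_spp   <- 3
L <- matrix(rpois(n_sites * n_spp, lambda = 3), n_sites, n_spp,
            dimnames = list(paste0("site", 1:n_sites), paste0("sp", 1:n_spp)))
R <- data.frame(temp = c(10, 12, 15, 18))
Q <- data.frame(size = c(1.2, 0.8, 2.0))

# Standardise the predictors, as is done when no formula is supplied
temp_std <- as.vector(scale(R$temp))
size_std <- as.vector(scale(Q$size))

# Vectorise L column by column: all sites for sp1, then sp2, and so on
long <- data.frame(
  abund = as.vector(L),
  spp   = factor(rep(colnames(L), each = n_sites)),
  temp  = rep(temp_std, times = n_spp),  # R repeated once per taxon
  size  = rep(size_std, each = n_sites)  # each row of Q repeated once per site
)

# Species intercepts, environmental main effect and quadratic,
# and the environment-by-trait interaction (the "fourth corner" term)
X <- model.matrix(~ spp + temp + I(temp^2) + temp:size, data = long)
dim(X)  # n_sites * n_spp rows, one column per model term

# A Poisson GLM stands in here for the default negative binomial manyglm fit
fit <- glm(abund ~ spp + temp + I(temp^2) + temp:size, family = poisson, data = long)
coef(fit)["temp:size"]  # the single fourth corner coefficient in this toy model
```

The design matrix has one row per site-by-taxon combination, which is why memory use grows quickly with the sizes of L, R and Q.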

Note: when specifying a formula, if there are no penalties on coefficients (as for manyglm), then main effects for R can be excluded if including row effects (via composition=TRUE), and main effects for Q can be excluded if including column effects (via col.intercepts=TRUE), because those terms are redundant (trying to explain main effects for row/column when these main effects are already in the model). If using penalised likelihood (as in glm1path and cv.glm1path) or a random effects model, by all means include main effects as well as row/column effects, and the penalties will sort out which terms to use.
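
The redundancy mentioned in this note can be checked on a small design matrix. The following base R sketch uses hypothetical toy values and variable names (temp, size); it only shows the rank deficiency and does not call traitglm itself.

```r
# Hypothetical toy layout: 5 sites, 4 taxa, one environmental variable, one trait
n_sites <- 5
n_spp   <- 4
R <- data.frame(temp = c(10, 12, 15, 18, 21))
Q <- data.frame(size = c(1.2, 0.8, 2.0, 1.5))

long <- data.frame(
  site = factor(rep(seq_len(n_sites), times = n_spp)),
  spp  = factor(rep(seq_len(n_spp), each = n_sites)),
  temp = rep(R$temp, times = n_spp),
  size = rep(Q$size, each = n_sites)
)

# Row and column effects plus main effects for R and Q:
# temp is constant within a site and size is constant within a taxon,
# so both main effects are linear combinations of the site/spp indicators
X_with <- model.matrix(~ site + spp + temp + size + temp:size, data = long)

# Row and column effects plus the interaction only
X_without <- model.matrix(~ site + spp + temp:size, data = long)

rank_of <- function(X) qr(X)$rank
c(columns = ncol(X_with),    rank = rank_of(X_with))     # rank falls short of columns
c(columns = ncol(X_without), rank = rank_of(X_without))  # full column rank
```

In an unpenalised fit the two redundant columns cannot be estimated, which is why they can be dropped from the formula; a penalised fit can instead keep them and let the penalty choose.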

If trait matrix Q is not specified, default behaviour will fit a different environmental response for each taxon (and the outcome will be very similar to manyglm(L~R)). This can be understood as a fourth corner model where species identities are used as the species traits (i.e. no attempt is made to explain differences across species).
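
To see why using species identity as the "trait" amounts to a separate environmental response per taxon, here is a base R sketch. It is an illustration under assumptions: the data are simulated, the names are hypothetical, and Poisson glm fits stand in for manyglm (whose negative binomial fits would agree only approximately, as noted above).

```r
# Simulated data (hypothetical): 30 sites, 3 taxa with different slopes
set.seed(42)
n_sites    <- 30
temp       <- as.vector(scale(runif(n_sites, 5, 25)))
spp_names  <- c("sp1", "sp2", "sp3")
true_slope <- c(sp1 = 0.5, sp2 = -0.3, sp3 = 0)
L <- sapply(spp_names, function(s) rpois(n_sites, exp(1 + true_slope[s] * temp)))

# Vectorised data, with species identity playing the role of the trait
long <- data.frame(
  abund = as.vector(L),
  spp   = factor(rep(spp_names, each = n_sites)),
  temp  = rep(temp, times = length(spp_names))
)

# One joint model: species intercepts plus a species-by-environment interaction
joint <- glm(abund ~ 0 + spp + spp:temp, family = poisson, data = long)
joint_slopes <- coef(joint)[paste0("spp", spp_names, ":temp")]

# Separate models, one per taxon
separate_slopes <- sapply(spp_names, function(s) {
  coef(glm(L[, s] ~ temp, family = poisson))["temp"]
})

# The two sets of slopes agree up to numerical tolerance
all.equal(unname(joint_slopes), unname(separate_slopes), tolerance = 1e-4)
```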

Fitted objects inherit default behaviour from their fitting functions. For example, use plot for a Dunn-Smyth residual plot from a traits model fitted using manyglm or glm1path.

References

Brown AM, Warton DI, Andrew NR, Binns M, Cassis G and Gibb H (2014) The fourth-corner solution - using predictive models to understand how species traits interact with the environment, Methods in Ecology and Evolution 5, 344-352.

Warton DI, Shipley B & Hastie T (2015) CATS regression - a model-based approach to studying trait-based community assembly, Methods in Ecology and Evolution 6, 389-398.

Examples


# NOT RUN {
data(antTraits)

ft = traitglm(antTraits$abund, antTraits$env, antTraits$traits, method="manyglm")
ft$fourth.corner # print fourth corner terms

# for a pretty picture of fourth corner coefficients, uncomment the following lines:
# library(lattice)
# a        = max( abs(ft$fourth.corner) )
# colort   = colorRampPalette(c("blue","white","red")) 
# plot.4th = levelplot(t(as.matrix(ft$fourth.corner)), xlab="Environmental Variables",
#   ylab="Species traits", col.regions=colort(100), at=seq(-a, a, length=100),
#   scales = list( x= list(rot = 45)))
# print(plot.4th)

plot(ft) # for a Dunn-Smyth residual plot
qqnorm(residuals(ft)); abline(c(0,1),col="red") # for a normal quantile plot.

# predict to the first five sites
predict(ft,newR=antTraits$env[1:5,])

# refit using LASSO and fewer variables, including row effects and only two interaction terms:
ft1 = traitglm(antTraits$abund, antTraits$env[,3:4], antTraits$traits[,c(1,3)],
      formula=~Shrub.cover:Femur.length+Shrub.cover:Pilosity, composition=TRUE, method="glm1path")
ft1$fourth.corner # notice the LASSO penalty has set one interaction to zero

# }
