pglmm: Phylogenetic Generalised Linear Mixed Model for Community Data

Description

This function performs Generalized Linear Mixed Models for binary and continuous phylogenetic data, estimating regression coefficients with approximate standard errors. It is a modeled after lmer but is more general by allowing correlation structure within random effects; these correlations can be phylogenetic among species, or any other correlation structure, such as geographical correlations among sites. It is, however, much more specific than lmer in that it can only analyze a subset of1 the types of model designed handled by lmer. It is also much slower than lmer and requires users to specify correlation structures as covariance matrices. communityPGLMM can analyze models in Ives and Helmus (2011). It can also analyze bipartite phylogenetic data, such as that analyzed in Rafferty and Ives (2011), by giving sites phylogenetic correlations.

calculates the statistical significance of the random effects in the generalized linear mixed model from the marginal profile likelihood.

communityPGLMM.binary.LRT tests statistical significance of the phylogenetic random effect on species slopes using a likelihood ratio test

communityPGLMM.matrix.structure produces the entire covariance matrix structure (V) when you specify random effects.

communityPGLMM.predicted.values calculates the predicted values of Y; for the generalized linear mixed model (family = "binomial"), these values are in the logit-1 transformed space.

Usage

communityPGLMM(formula, data = list(), family = "gaussian", sp = NULL,
  site = NULL, random.effects = list(), REML = TRUE, s2.init = NULL,
  B.init = NULL, reltol = 10^-6, maxit = 500, tol.pql = 10^-6,
  maxit.pql = 200, verbose = FALSE)
communityPGLMM.gaussian(formula, data = list(), family = "gaussian",
  sp = NULL, site = NULL, random.effects = list(), REML = TRUE,
  s2.init = NULL, B.init = NULL, reltol = 10^-8, maxit = 500,
  verbose = FALSE)
communityPGLMM.binary(formula, data = list(), family = "binomial",
  sp = NULL, site = NULL, random.effects = list(), REML = TRUE,
  s2.init = 0.25, B.init = NULL, reltol = 10^-5, maxit = 40,
  tol.pql = 10^-6, maxit.pql = 200, verbose = FALSE)
communityPGLMM.binary.LRT(x, re.number = 0, ...)
communityPGLMM.matrix.structure(formula, data = list(), family = "binomial",
  sp = NULL, site = NULL, random.effects = list(), ss = 1)
## S3 method for class 'communityPGLMM':
summary(object, digits = max(3, getOption("digits") -
  3), ...)
## S3 method for class 'communityPGLMM':
plot(x, digits = max(3, getOption("digits") - 3),
  ...)
communityPGLMM.predicted.values(x, show.plot = TRUE, ...)

Arguments

formula

a two-sided linear formula object describing the fixed-effects of the model; for example, Y ~ X.

data

a data.frame containing the variables named in formula. The data frame should have long format with factors specifying species and sites. communityPGLMM will reorder rows of the data frame s

family

either gaussian for a Linear Mixed Model, or binomial for binary dependent data.

a factor variable that identifies species

site

a factor variable that identifies sites

random.effects

a list that contains, for non-nested random effects, lists of triplets of the form list(X, group = group, covar = V). This is modeled after the lmer

REML

whether REML or ML is used for model fitting. For the generalized linear mixed model for binary data, these don't have standard interpretations, and there is no log likelihood function that can be used in likelihood ratio tests.

s2.init

an array of initial estimates of s2 for each random effect that scales the variance. If s2.init is not provided for family="gaussian", these are estimated using in a clunky way using lm assuming no

B.init

initial estimates of $B$, a matrix containing regression coefficients in the model for the fixed effects. This matrix must have dim(B.init)=c(p+1,1), where p is the number of predictor (independent) variables; the first element o

reltol

a control parameter dictating the relative tolerance for convergence in the optimization; see optim.

maxit

a control parameter dictating the maximum number of iterations in the optimization; see optim.

tol.pql

a control parameter dictating the tolerance for convergence in the PQL estimates of the mean components of the binomial GLMM.

maxit.pql

a control parameter dictating the maximum number of iterations in the PQL estimates of the mean components of the binomial GLMM.

verbose

if TRUE, the model deviance and running estimates of s2 and B are plotted each iteration during optimization.

...

additional arguments to summary and plotting functions (currently ignored)

communityPGLMM model to be tested

re.number

which random.effect in x to be tested

which of the random.effects to produce

digits

minimal number of significant digits for printing, as in print.default

object

communityPGLMM object to be summarised

show.plot

if TRUE (default), display plot

Value

an object of class communityPGLMM
formulathe formula for fixed effects
datathe dataset
familyeither gaussian or binomial depending on the model fit
random.effectsthe list of random effects
Bestimates of the regression coefficients
B.seapproximate standard errors of the fixed effects regression coefficients
B.covapproximate covariance matrix for the fixed effects regression coefficients
B.zscoreapproximate Z scores for the fixed effects regression coefficients
B.pvalueapproximate tests for the fixed effects regression coefficients being different from zero
ssrandom effects' standard deviations for the covariance matrix $\sigma^2V$ for each random effect in order. For the linear mixed model, the residual variance is listed last
s2rrandom effects variances for nested random effects
s2nrandom effects variances for non-nested random effects
s2residfor linear mixed models, the residual vairance
logLIKfor linear mixed models, the log-likelihood for either the restricted likelihood (REML=TRUE) or the overall likelihood (REML=FALSE). This is set to NULL for generalised linear mixed models
AICfor linear mixed models, the AIC for either the restricted likelihood (REML=TRUE) or the overall likelihood (REML=FALSE). This is set to NULL for generalised linear mixed models
BICfor linear mixed models, the BIC for either the restricted likelihood (REML=TRUE) or the overall likelihood (REML=FALSE). This is set to NULL for generalised linear mixed models
REMLwhether or not REML is used (TRUE or FALSE)
s2.initthe user-provided initial estimates of s2
B.initthe user-provided initial estimates of B
Ythe response (dependent) variable returned in matrix form
Xthe predictor (independent) variables returned in matrix form (including 1s in the first column)
Hthe residuals. For the generalized linear mixed model, these are the predicted residuals in the $logit^{-1}$ space
iVthe inverse of the covariance matrix for the entire system (of dimension (nsp*nsite) by (nsp*nsite))
mupredicted mean values for the generalized linear mixed model. Set to NULL for linear mixed models
sp, spmatrices used to construct the nested design matrix
Ztthe design matrix for random effects
Stdiagonal matrix that maps the random effects variances onto the design matrix
convcodethe convergence code provided by optim
niternumber of iterations performed by optim

Details

For linear mixed models (family = 'gaussian'), the function estimates parameters for the model of the form, for example,

$$Y = \beta_0 + \beta_1x + b_0 + b_1x$$ $$b_0 ~ Gaussian(0, \sigma_0^2I_{sp})$$ $$b_1 ~ Gaussian(0, \sigma_0^2V_{sp})$$ $$\eta ~ Gaussian(0,\sigma^2)$$

where $\beta_0$ and $\beta_1$ are fixed effects, and $V_{sp}$ is a variance-covariance matrix derived from a phylogeny (typically under the assumption of Brownian motion evolution). Here, the variation in the mean (intercept) for each species is given by the random effect $b_0$ that is assumed to be independent among species. Variation in species' responses to predictor variable $x$ is given by a random effect $b_0$ that is assumed to depend on the phylogenetic relatedness among species given by $V_{sp}$; if species are closely related, their specific responses to $x$ will be similar. This particular model would be specified as

re.1 <- list(1, sp = dat$sp, covar = diag(nspp)) re.2 <- list(dat$X, sp = dat$sp, covar = Vsp) z <- communityPGLMM(Y ~ X, data = data, family = "gaussian", random.effects = list(re.1, re.2))

The covariance matrix covar is standardized to have its determinant equal to 1. This in effect standardizes the interpretation of the scalar $\sigma^2$. Although mathematically this is not required, it is a very good idea to standardize the predictor (independent) variables to have mean 0 and variance 1. This will make the function more robust and improve the interpretation of the regression coefficients. For categorical (factor) predictor variables, you will need to construct 0-1 dummy variables, and these should not be standardized (for obvious reasons).

For binary generalized linear mixed models (family = 'binomial'), the function estimates parameters for the model of the form, for example,

$$y = \beta_0 + \beta_1x + b_0 + b_1x$$ $$Y = logit^{-1}(y)$$ $$b_0 ~ Gaussian(0, \sigma_0^2I_{sp})$$ $$b_1 ~ Gaussian(0, \sigma_0^2V_{sp})$$

where $\beta_0$ and $\beta_1$ are fixed effects, and $V_{sp}$ is a variance-covariance matrix derived from a phylogeny (typically under the assumption of Brownian motion evolution).

z <- communityPGLMM(Y ~ X, data = data, family = 'binomial', random.effects = list(re.1, re.2))

As with the linear mixed model, it is a very good idea to standardize the predictor (independent) variables to have mean 0 and variance 1. This will make the function more robust and improve the interpretation of the regression coefficients. For categorical (factor) predictor variables, you will need to construct 0-1 dummy variables, and these should not be standardized (for obvious reasons).

References

Ives, A. R. and M. R. Helmus. 2011. Generalized linear mixed models for phylogenetic analyses of community structure. Ecological Monographs 81:511-525.

Rafferty, N. E., and A. R. Ives. 2013. Phylogenetic trait-based analyses of ecological networks. Ecology 94:2321-2333.

Examples

Run this code

## Structure of examples:
# First, a (brief) description of model types, and how they are specified
# - these are *not* to be run 'as-is'; they show how models should be organised
# Second, a run-through of how to simulate, and then analyse, data
# - these *are* to be run 'as-is'; they show how to format and work with data

#########################################################
#First section; brief summary of models and their use####
#########################################################
## Model structures from Ives & Helmus (2011)
# dat = data set for regression (note: *not* an comparative.comm object)
# nspp = number of species
# nsite = number of sites
# Vphy = phylogenetic covariance matrix for species
# Vrepul = inverse of Vphy representing phylogenetic repulsion

# Model 1 (Eq. 1)
re.site <- list(1, site = dat$site, covar = diag(nsite))
re.sp.site <- list(1, sp = dat$sp, covar = Vphy, site = dat$site) # note: nested
z <- communityPGLMM(freq ~ sp, data = dat, family = "binomial", sp
= dat$sp, site = dat$site, random.effects = list(re.site,
re.sp.site), REML = TRUE, verbose = TRUE, s2.init=.1)


# Model 2 (Eq. 2)
re.site <- list(1, site = dat$site, covar = diag(nsite))
re.slope <- list(X, sp = dat$sp, covar = diag(nspp))
re.slopephy <- list(X, sp = dat$sp, covar = Vphy)
z <- communityPGLMM(freq ~ sp + X, data = dat, family = "binomial",
sp = dat$sp, site = dat$site, random.effects = list(re.site,
re.slope, re.slopephy), REML = TRUE, verbose = TRUE, s2.init=.1)

# Model 3 (Eq. 3)
re.site <- list(1, site = dat$site, covar = diag(nsite))
re.sp.site <- list(1, sp = dat$sp, covar = Vrepul, site = dat$site) # note: nested
z <- communityPGLMM(freq ~ sp*X, data = dat, family = "binomial",
sp = dat$sp, site = dat$site, random.effects = list(re.site,
re.sp.site), REML = TRUE, verbose = TRUE, s2.init=.1)

## Model structure from Rafferty & Ives (2013) (Eq. 3)
# dat = data set
# npp = number of pollinators (sp)
# nsite = number of plants (site)
# VphyPol = phylogenetic covariance matrix for pollinators
# VphyPlt = phylogenetic covariance matrix for plants

re.a <- list(1, sp = dat$sp, covar = diag(nspp))
re.b <- list(1, sp = dat$sp, covar = VphyPol)
re.c <- list(1, sp = dat$sp, covar = VphyPol, dat$site)
re.d <- list(1, site = dat$site, covar = diag(nsite))
re.f <- list(1, site = dat$site, covar = VphyPlt)
re.g <- list(1, site = dat$site, covar = VphyPlt, dat$sp)
#term h isn't possible in this implementation, but can be done with
available matlab code

z <- communityPGLMM(freq ~ sp*X, data = dat, family = "binomial",
sp = dat$sp, site = dat$site, random.effects = list(re.a, re.b,
re.c, re.d, re.f, re.g), REML = TRUE, verbose = TRUE, s2.init=.1)

#########################################################
#Second section; detailed simulation and analysis #######
#########################################################
# Generate simulated data for nspp species and nsite sites
nspp <- 15
nsite <- 10

# residual variance (set to zero for binary data)
sd.resid <- 0

# fixed effects
beta0 <- 0
beta1 <- 0

# magnitude of random effects
sd.B0 <- 1
sd.B1 <- 1

# whether or not to include phylogenetic signal in B0 and B1
signal.B0 <- TRUE
signal.B1 <- TRUE

# simulate a phylogenetic tree
phy <- rtree(n = nspp)
phy <- compute.brlen(phy, method = "Grafen", power = 0.5)

# standardize the phylogenetic covariance matrix to have determinant 1
Vphy <- vcv(phy)
Vphy <- Vphy/(det(Vphy)^(1/nspp))

# Generate environmental site variable
X <- matrix(1:nsite, nrow = 1, ncol = nsite)
X <- (X - mean(X))/sd(X)

# Perform a Cholesky decomposition of Vphy. This is used to
# generate phylogenetic signal: a vector of independent normal random
# variables, when multiplied by the transpose of the Cholesky
# deposition of Vphy will have covariance matrix equal to Vphy.

iD <- t(chol(Vphy))

# Set up species-specific regression coefficients as random effects
if (signal.B0 == TRUE) {
		b0 <- beta0 + iD %*% rnorm(nspp, sd = sd.B0)
} else {
		b0 <- beta0 + rnorm(nspp, sd = sd.B0)
}
if (signal.B1 == TRUE) {
		b1 <- beta1 + iD %*% rnorm(nspp, sd = sd.B1)
} else {
		b1 <- beta1 + rnorm(nspp, sd = sd.B1)
}

# Simulate species abundances among sites to give matrix Y that
# contains species in rows and sites in columns
y <- matrix(outer(b0, array(1, dim = c(1, nsite))), nrow = nspp,
ncol = nsite) + matrix(outer(b1, X), nrow = nspp, ncol = nsite)
e <- rnorm(nspp * nsite, sd = sd.resid)
y <- y + matrix(e, nrow = nspp, ncol = nsite)
y <- matrix(y, nrow = nspp * nsite, ncol = 1)

Y <- rbinom(n = length(y), size = 1, prob = exp(y)/(1 + exp(y)))
Y <- matrix(Y, nrow = nspp, ncol = nsite)

# name the simulated species 1:nspp and sites 1:nsites
rownames(Y) <- 1:nspp
colnames(Y) <- 1:nsite

par(mfrow = c(3, 1), las = 1, mar = c(2, 4, 2, 2) - 0.1)
matplot(t(X), type = "l", ylab = "X", main = "X among sites")
hist(b0, xlab = "b0", main = "b0 among species")
hist(b1, xlab = "b1", main = "b1 among species")

par(mfrow = c(1, 1), las = 1, mar = c(4, 4, 2, 2) - 0.1)
if(require(plotrix))
    color2D.matplot(Y, ylab = "species", xlab = "sites", main = "abundance")

# Transform data matrices into "long" form, and generate a data frame
YY <- matrix(Y, nrow = nspp * nsite, ncol = 1)

XX <- matrix(kronecker(X, matrix(1, nrow = nspp, ncol = 1)), nrow =
nspp * nsite, ncol = 1)

site <- matrix(kronecker(1:nsite, matrix(1, nrow = nspp, ncol =
1)), nrow = nspp * nsite, ncol = 1)
sp <- matrix(kronecker(matrix(1, nrow = nsite, ncol = 1), 1:nspp),
nrow = nspp * nsite, ncol = 1)

dat <- data.frame(Y = YY, X = XX, site = as.factor(site), sp = as.factor(sp))

# Format input and perform communityPGLMM()
# set up random effects

# random intercept with species independent
re.1 <- list(1, sp = dat$sp, covar = diag(nspp))

# random intercept with species showing phylogenetic covariances
re.2 <- list(1, sp = dat$sp, covar = Vphy)

# random slope with species independent
re.3 <- list(dat$X, sp = dat$sp, covar = diag(nspp))

# random slope with species showing phylogenetic covariances
re.4 <- list(dat$X, sp = dat$sp, covar = Vphy)

# random effect for site
re.site <- list(1, site = dat$site, covar = diag(nsite))

z.binary <- communityPGLMM(Y ~ X, data = dat, family = "binomial",
sp = dat$sp, site = dat$site, random.effects = list(re.1, re.2,
re.3, re.4), REML = TRUE, verbose = FALSE)

# output results
z.binary
plot(z.binary)

# test statistical significance of the phylogenetic random effect
# on species slopes using a likelihood ratio test
# communityPGLMM.binary.LRT(z.binary, re.number = 4)$Pr

# The rest of these tests are not run to save CRAN server time;
# - please take a look at them because they're very useful!
# extract the predicted values of Y
communityPGLMM.predicted.values(z.binary, show.plot = TRUE)

# examine the structure of the overall covariance matrix
communityPGLMM.matrix.structure(Y ~ X, data = dat, family =
"binomial", sp = dat$sp, site = dat$site, random.effects =
list(re.1, re.2, re.3, re.4))

# look at the structure of re.1
communityPGLMM.matrix.structure(Y ~ X, data = dat, family =
"binomial", sp = dat$sp, site = dat$site, random.effects =
list(re.1))

# compare results to glmer() when the model contains no
# phylogenetic covariance among species; the results should be
# similar.
communityPGLMM(Y ~ X, data = dat, family = "binomial", sp = dat$sp,
site = dat$site, random.effects = list(re.1, re.3), REML = FALSE,
verbose = FALSE)

# lmer
if(require(lme4)){
summary(glmer(Y ~ X + (1 | sp) + (0 + X | sp), data=dat, family =
"binomial"))

# compare results to lmer() when the model contains no phylogenetic
# covariance among species; the results should be similar.
communityPGLMM(Y ~ X, data = dat, family = "gaussian", sp = dat$sp,
site = dat$site, random.effects = list(re.1, re.3), REML = FALSE,
verbose = FALSE)

# lmer
summary(lmer(Y ~ X + (1 | sp) + (0 + X | sp), data=dat, REML = FALSE))
}

Run the code above in your browser using DataLab