pglmm: Phylogenetic Generalised Linear Mixed Model for Community Data

Description

This function performs Generalized Linear Mixed Models for binary and continuous phylogenetic data, estimating regression coefficients with approximate standard errors. It is modeled after lmer but is more general by allowing correlation structure within random effects; these correlations can be phylogenetic among species, or any other correlation structure, such as geographical correlations among sites. It is, however, much more specific than lmer in that it can only analyze a subset of1 the types of model designed handled by lmer. It is also much slower than lmer and requires users to specify correlation structures as covariance matrices. communityPGLMM can analyze models in Ives and Helmus (2011). It can also analyze bipartite phylogenetic data, such as that analyzed in Rafferty and Ives (2011), by giving sites phylogenetic correlations.

Usage

communityPGLMM(
  formula,
  data = list(),
  family = "gaussian",
  sp = NULL,
  site = NULL,
  random.effects = list(),
  REML = TRUE,
  s2.init = NULL,
  B.init = NULL,
  reltol = 10^-6,
  maxit = 500,
  tol.pql = 10^-6,
  maxit.pql = 200,
  verbose = FALSE
)
communityPGLMM.gaussian(
  formula,
  data = list(),
  family = "gaussian",
  sp = NULL,
  site = NULL,
  random.effects = list(),
  REML = TRUE,
  s2.init = NULL,
  B.init = NULL,
  reltol = 10^-8,
  maxit = 500,
  verbose = FALSE
)
communityPGLMM.binary(
  formula,
  data = list(),
  family = "binomial",
  sp = NULL,
  site = NULL,
  random.effects = list(),
  REML = TRUE,
  s2.init = 0.25,
  B.init = NULL,
  reltol = 10^-5,
  maxit = 40,
  tol.pql = 10^-6,
  maxit.pql = 200,
  verbose = FALSE
)
communityPGLMM.binary.LRT(x, re.number = 0, ...)
communityPGLMM.matrix.structure(
  formula,
  data = list(),
  family = "binomial",
  sp = NULL,
  site = NULL,
  random.effects = list(),
  ss = 1
)
# S3 method for communityPGLMM
summary(object, digits = max(3, getOption("digits") - 3), ...)
# S3 method for communityPGLMM
plot(x, digits = max(3, getOption("digits") - 3), ...)
communityPGLMM.predicted.values(x, show.plot = TRUE, ...)

Value

an object of class communityPGLMM

formula: the formula for fixed effects
data: the dataset
family: either gaussian or binomial depending on the model fit
random.effects: the list of random effects
B: estimates of the regression coefficients
B.se: approximate standard errors of the fixed effects regression coefficients
B.cov: approximate covariance matrix for the fixed effects regression coefficients
B.zscore: approximate Z scores for the fixed effects regression coefficients
B.pvalue: approximate tests for the fixed effects regression coefficients being different from zero
ss: random effects' standard deviations for the covariance matrix $\sigma^2V$ for each random effect in order. For the linear mixed model, the residual variance is listed last
s2r: random effects variances for non-nested random effects
s2n: random effects variances for nested random effects
s2resid: for linear mixed models, the residual vairance
logLIK: for linear mixed models, the log-likelihood for either the restricted likelihood (REML=TRUE) or the overall likelihood (REML=FALSE). This is set to NULL for generalised linear mixed models
AIC: for linear mixed models, the AIC for either the restricted likelihood (REML=TRUE) or the overall likelihood (REML=FALSE). This is set to NULL for generalised linear mixed models
BIC: for linear mixed models, the BIC for either the restricted likelihood (REML=TRUE) or the overall likelihood (REML=FALSE). This is set to NULL for generalised linear mixed models
REML: whether or not REML is used (TRUE or FALSE)
s2.init: the user-provided initial estimates of s2
B.init: the user-provided initial estimates of B
Y: the response (dependent) variable returned in matrix form
X: the predictor (independent) variables returned in matrix form (including 1s in the first column)
H: the residuals. For the generalized linear mixed model, these are the predicted residuals in the $logit^{-1}$ space
iV: the inverse of the covariance matrix for the entire system (of dimension (nsp*nsite) by (nsp*nsite))
mu: predicted mean values for the generalized linear mixed model. Set to NULL for linear mixed models
sp, sp: matrices used to construct the nested design matrix
Zt: the design matrix for random effects
St: diagonal matrix that maps the random effects variances onto the design matrix
convcode: the convergence code provided by optim
niter: number of iterations performed by optim

Arguments

formula: a two-sided linear formula object describing the fixed-effects of the model; for example, Y ~ X.
data: a data.frame containing the variables named in formula. The data frame should have long format with factors specifying species and sites. communityPGLMM will reorder rows of the data frame so that species are nested within sites. Please note that calling as.data.frame.comparative.comm will return your comparative.comm object into this format for you.
family: either gaussian for a Linear Mixed Model, or binomial for binary dependent data.
sp: a factor variable that identifies species
site: a factor variable that identifies sites
random.effects: a list that contains, for non-nested random effects, lists of triplets of the form list(X, group = group, covar = V). This is modeled after the lmer formula syntax (X | group) where X is a variable and group is a grouping factor. Note that group should be either your sp or site variable specified in sp and site. The additional term V is a covariance matrix of rank equal to the number of levels of group that specifies the covariances among groups in the random effect X. For nested variable random effects, random.effects contains lists of quadruplets of the form list(X, group1 = group1, covar = V, group2 = group2) where group1 is nested within group2.
REML: whether REML or ML is used for model fitting. For the generalized linear mixed model for binary data, these don't have standard interpretations, and there is no log likelihood function that can be used in likelihood ratio tests.
s2.init: an array of initial estimates of s2 for each random effect that scales the variance. If s2.init is not provided for family="gaussian", these are estimated using in a clunky way using lm assuming no phylogenetic signal. A better approach is to run link[lme4:lmer]{lmer} and use the output random effects for s2.init. If s2.init is not provided for family="binomial", these are set to 0.25.
B.init: initial estimates of $B$, a matrix containing regression coefficients in the model for the fixed effects. This matrix must have dim(B.init)=c(p+1,1), where p is the number of predictor (independent) variables; the first element of B corresponds to the intercept, and the remaining elements correspond in order to the predictor (independent) variables in the formula. If B.init is not provided, these are estimated using in a clunky way using lm or glm assuming no phylogenetic signal. A better approach is to run lmer and use the output fixed effects for B.init.
reltol: a control parameter dictating the relative tolerance for convergence in the optimization; see optim.
maxit: a control parameter dictating the maximum number of iterations in the optimization; see optim.
tol.pql: a control parameter dictating the tolerance for convergence in the PQL estimates of the mean components of the binomial GLMM.
maxit.pql: a control parameter dictating the maximum number of iterations in the PQL estimates of the mean components of the binomial GLMM.
verbose: if TRUE, the model deviance and running estimates of s2 and B are plotted each iteration during optimization.
x: communityPGLMM object
re.number: which random.effect in x to be tested
...: additional arguments to summary and plotting functions (currently ignored)
ss: which of the random.effects to produce
object: communityPGLMM object to be summarised
digits: minimal number of significant digits for printing, as in print.default
show.plot: if TRUE (default), display plot

Author

Anthony R. Ives, cosmetic changes by Will Pearse

Details

The vignette 'pez-pglmm-overview' gives a gentle introduction to using PGLMMS. For linear mixed models (family = 'gaussian'), the function estimates parameters for the model of the form, for example,

$$Y = \beta_0 + \beta_1x + b_0 + b_1x$$ $$b_0 ~ Gaussian(0, \sigma_0^2I_{sp})$$ $$b_1 ~ Gaussian(0, \sigma_0^2V_{sp})$$ $$\eta ~ Gaussian(0,\sigma^2)$$

where $\beta_0$ and $\beta_1$ are fixed effects, and $V_{sp}$ is a variance-covariance matrix derived from a phylogeny (typically under the assumption of Brownian motion evolution). Here, the variation in the mean (intercept) for each species is given by the random effect $b_0$ that is assumed to be independent among species. Variation in species' responses to predictor variable $x$ is given by a random effect $b_0$ that is assumed to depend on the phylogenetic relatedness among species given by $V_{sp}$; if species are closely related, their specific responses to $x$ will be similar. This particular model would be specified as

re.1 <- list(1, sp = dat$sp, covar = diag(nspp)) re.2 <- list(dat$X, sp = dat$sp, covar = Vsp) z <- communityPGLMM(Y ~ X, data = data, family = "gaussian", random.effects = list(re.1, re.2))

The covariance matrix covar is standardized to have its determinant equal to 1. This in effect standardizes the interpretation of the scalar $\sigma^2$. Although mathematically this is not required, it is a very good idea to standardize the predictor (independent) variables to have mean 0 and variance 1. This will make the function more robust and improve the interpretation of the regression coefficients. For categorical (factor) predictor variables, you will need to construct 0-1 dummy variables, and these should not be standardized (for obvious reasons).

For binary generalized linear mixed models (family = 'binomial'), the function estimates parameters for the model of the form, for example,

$$y = \beta_0 + \beta_1x + b_0 + b_1x$$ $$Y = logit^{-1}(y)$$ $$b_0 ~ Gaussian(0, \sigma_0^2I_{sp})$$ $$b_1 ~ Gaussian(0, \sigma_0^2V_{sp})$$

where $\beta_0$ and $\beta_1$ are fixed effects, and $V_{sp}$ is a variance-covariance matrix derived from a phylogeny (typically under the assumption of Brownian motion evolution).

z <- communityPGLMM(Y ~ X, data = data, family = 'binomial', random.effects = list(re.1, re.2))

As with the linear mixed model, it is a very good idea to standardize the predictor (independent) variables to have mean 0 and variance 1. This will make the function more robust and improve the interpretation of the regression coefficients. For categorical (factor) predictor variables, you will need to construct 0-1 dummy variables, and these should not be standardized (for obvious reasons).

References

Ives, A. R. and M. R. Helmus. 2011. Generalized linear mixed models for phylogenetic analyses of community structure. Ecological Monographs 81:511-525.

Rafferty, N. E., and A. R. Ives. 2013. Phylogenetic trait-based analyses of ecological networks. Ecology 94:2321-2333.