matrix.p.sig: Association between phylogeny-weighted species composition and environmental predictors

Description

Analyses to relate an environmental gradient to the phylogenetic assembly of species across a metacommunity by means of phylogenetic fuzzy weighting.

Usage

matrix.p.sig(
  comm,
  phylodist,
  envir,
  checkdata = TRUE,
  FUN,
  runs = 999,
  parallel = NULL,
  newname = "pcps",
  ...
)
pcps.sig(
  comm,
  phylodist,
  envir,
  checkdata = TRUE,
  method = "bray",
  squareroot = TRUE,
  FUN,
  choices,
  runs = 999,
  parallel = NULL,
  newname = "pcps",
  ...
)
FUN.ADONIS(x, envir, method.p, sqrt.p = TRUE, formula, return.model = FALSE)
FUN.GLM(x, envir, formula, ..., return.model = FALSE)
FUN.GLS.marginal(x, envir, formula, ..., return.model = FALSE)
FUN.GLS.sequential(x, envir, formula, ..., return.model = FALSE)
FUN.LME.marginal(x, envir, formula, ..., return.model = FALSE)
FUN.LME.sequential(x, envir, formula, ..., return.model = FALSE)
FUN.MANTEL(
  x,
  envir,
  method.p,
  method.envir,
  sqrt.p = TRUE,
  ...,
  return.model = FALSE
)
FUN.RDA(x, envir, return.model = FALSE)
# S3 method for pcpssig
print(x, ...)

Arguments

comm

Community data, with species as columns and sampling units as rows. This matrix can contain either presence/absence or abundance data. Alternatively, comm can be an object of class metacommunity.data, which sets all data.frames/matrices at once. When the class metacommunity.data is used, the arguments phylodist and envir must not be specified. See Details.

phylodist

Matrix containing phylogenetic distances between species.

envir

A matrix or data.frame with environmental variables for each community, with variables as columns and sampling units as rows. See Details and Examples.

checkdata

Logical argument (TRUE or FALSE) to check whether the species sequence in the community data follows the same order as in the phylodist matrix and whether the sampling units in the community data follow the same order as in the environmental data (Default checkdata = TRUE).

FUN

An object of class function to perform the analysis. See Details and Examples.

runs

Number of permutations for assessing significance.

parallel

Number of parallel processes or a predefined socket cluster created with the parallel package. Tip: use detectCores() (Default parallel = NULL).

newname

New name to be replaced in object returned by matrix.p.null (Default newname = "pcps").

...

Other arguments passed to FUN function. See Details and Examples.

method

Dissimilarity index, as accepted by vegdist (Default method = "bray").

squareroot

Logical argument (TRUE or FALSE) to specify whether to use the square root of the dissimilarity index (Default squareroot = TRUE).

choices

Numeric vector to choose the PCPS used in analysis. See Details and Examples.

x

An object of class pcpssig or other object to apply the function passed by FUN. See Details.

method.p

Resemblance index between communities based on P matrix, as accepted by vegdist. Used in FUN.MANTEL, FUN.ADONIS, FUN.ADONIS2.global and FUN.ADONIS2.margin analysis. See Details and Examples.

sqrt.p

Logical argument (TRUE or FALSE) to specify whether to use the square root of the dissimilarity computed from the P matrix. Used in FUN.MANTEL, FUN.ADONIS, FUN.ADONIS2.global and FUN.ADONIS2.margin analysis. See Details and Examples (Default sqrt.p = TRUE).

formula

An object of class formula. Used in FUN.GLM, FUN.ADONIS, FUN.ADONIS2.global, FUN.ADONIS2.margin, FUN.GLS.marginal, FUN.GLS.sequential, FUN.LME.marginal and FUN.LME.sequential analysis. See Details and Examples.

return.model

Must not be specified. See Details.

method.envir

Resemblance index between communities based on environmental variables, as accepted by vegdist. Used in FUN.MANTEL analysis. See Details and Examples.

Value

call

The arguments used.

P.obs

Phylogeny-weighted species composition matrix.

PCPS.obs

The principal coordinates of phylogenetic structure (PCPS).

model

The observed model returned by FUN; for the predefined functions, an object of class glm, gls, lme, rda, adonis, adonis2 or mantel.

fun

The function used.

statistic.null.site

A matrix with null statistic for site shuffle null model.

statistic.null.taxa

A matrix with null statistic for taxa shuffle null model.

obs.statistic

Observed statistic; for the predefined functions, the F value or r value.

p.site.shuffle

The p value for the site shuffle null model.

p.taxa.shuffle

The p value for the taxa shuffle null model.

Details

Each metacommunity is submitted to phylogenetic fuzzy weighting, generating a matrix that describes the phylogeny-weighted species composition of the communities (matrix.p). The function matrix.p.sig directly tests the association of this matrix with the environmental predictors. The pairwise dissimilarities are submitted to a Mantel test (mantel) or an ADONIS test (adonis or adonis2) to evaluate the influence of an environmental gradient on species dispersion across the communities. The function pcps.sig generates principal coordinates of phylogenetic structure (pcps) and uses a single axis to run a generalized linear model (GLM, glm), a linear model using generalized least squares (GLS, gls) or a linear mixed-effects model (LME, lme), or uses a set of axes to run a distance-based redundancy analysis (db-RDA, rda).
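The step from a phylogeny-weighted composition matrix to ordination axes can be illustrated in base R. This is only a sketch under stated assumptions: the matrix P below is invented toy data, the Bray-Curtis helper bray is written by hand for illustration, and cmdscale stands in for the package's own pcps function, whose internals may differ. It mirrors the defaults method = "bray" and squareroot = TRUE.

```r
# Toy phylogeny-weighted composition matrix (rows are communities), invented for illustration
P <- matrix(c(0.5, 0.3, 0.2,
              0.2, 0.5, 0.3,
              0.1, 0.2, 0.7,
              0.4, 0.4, 0.2),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("site", 1:4), paste0("sp", 1:3)))

# Hand-written Bray-Curtis dissimilarity between two rows (illustrative helper)
bray <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Pairwise dissimilarities between communities
n <- nrow(P)
d <- matrix(0, n, n, dimnames = list(rownames(P), rownames(P)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- bray(P[i, ], P[j, ])
  }
}

# Square root of the dissimilarities, then principal coordinates analysis
pcoa <- cmdscale(as.dist(sqrt(d)), k = 2, eig = TRUE)
pcps.axes <- pcoa$points
# Name the axes following the pcps.1, pcps.2, ... convention described in Details
colnames(pcps.axes) <- paste0("pcps.", seq_len(ncol(pcps.axes)))
pcps.axes
```

Each column of pcps.axes plays the role of one axis that a model such as pcps.1 ~ temp would use as its response.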

The sequence in which species appear in the community data matrix must be the same as in the phylogenetic distance matrix and, similarly, the sequence of communities in the community data matrix must be the same as in the environmental data. The function organize.pcps organizes the data, placing the community, phylogenetic distance and environmental data matrices in the same order. Using organize.pcps is not required to run the functions, but it is recommended. The arguments comm and phylodist can therefore be specified as normal arguments, or the object returned by organize.pcps can be passed through the argument comm alone. In the latter case, comm serves as an alternative way to set all data.frames/matrices, and the arguments phylodist and envir must not be specified.
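The ordering requirement can also be checked by hand before calling the functions. The following is a minimal base R sketch that assumes all objects carry row and column names; comm, phylodist and envir here are invented toy objects, and the name-based reordering is an illustration, not the code used by organize.pcps.

```r
# Toy data, invented for illustration only
comm <- matrix(c(1, 0, 2,
                 0, 3, 1),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("site1", "site2"), c("sp1", "sp2", "sp3")))

# Phylogenetic distances deliberately given in a different species order
phylodist <- matrix(c(0, 2, 4,
                      2, 0, 4,
                      4, 4, 0),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("sp3", "sp1", "sp2"), c("sp3", "sp1", "sp2")))

# Environmental data deliberately given in a different site order
envir <- data.frame(temp = c(20, 15), row.names = c("site2", "site1"))

# Reorder by name so that all three objects share the same sequence
phylodist.ord <- phylodist[colnames(comm), colnames(comm)]
envir.ord <- envir[rownames(comm), , drop = FALSE]

# Both checks should now be TRUE
same.species <- identical(colnames(comm), rownames(phylodist.ord))
same.sites <- identical(rownames(comm), rownames(envir.ord))
```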

Significance is obtained via two null models, one that shuffles sites across the environmental gradient and another that shuffles terminal tips (taxa) across the phylogenetic tree. The first null model (site shuffle) shuffles the site positions across the environmental gradient and reruns the same model, generating a null F value (or r value in the Mantel test). The second null model (taxa shuffle) shuffles terminal tips across the phylogenetic tree, generates a null matrix containing phylogeny-weighted species composition and reruns the same model, generating another null F value. In the pcps.sig function a set of null PCPS is generated, and each null PCPS (or set of PCPS in RDA) is submitted to a procrustean adjustment (see procrustes) to obtain the fitted values between the observed PCPS and the null PCPS. The adjusted null PCPS is used to rerun the model, generating another null F value. The observed F value (or r value) is compared independently with both null sets of F values (or r values) to generate the probability of the original F value being generated merely by chance according to each null model.
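The logic of the site shuffle null model can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: the data are simulated, the helper f.value is hypothetical, and the p value formula (counting the observed value among the null values) is one common convention assumed here. The taxa shuffle model is not shown because it requires recomputing the phylogeny-weighted matrix for each permutation.

```r
set.seed(1)
runs <- 99

# Toy data, invented for illustration: one response axis and one predictor
n <- 20
temp <- rnorm(n)
axis <- 0.8 * temp + rnorm(n)

# Hypothetical helper: overall F value of a linear model of y on x
f.value <- function(y, x) {
  unname(summary(lm(y ~ x))$fstatistic[1])
}

# Observed statistic
obs <- f.value(axis, temp)

# Site shuffle: permute site positions along the gradient and refit the model
null.site <- replicate(runs, f.value(axis, sample(temp)))

# Proportion of null values at least as large as the observed one,
# counting the observed value itself (assumed convention)
p.site <- (sum(null.site >= obs) + 1) / (runs + 1)
p.site
```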

The argument FUN

The type of analysis performed by this function is specified using the argument FUN. The current version of the package includes ten predefined functions; however, additional small functions can easily be specified. All these functions use the environmental variables to analyze the association between phylogeny-weighted species composition and environmental predictors. For matrix P analysis, in the matrix.p.sig function, the predefined functions available are FUN.MANTEL, FUN.ADONIS, FUN.ADONIS2.global and FUN.ADONIS2.margin. For PCPS analysis, in the pcps.sig function, the predefined functions available are FUN.GLM, FUN.RDA, FUN.GLS.marginal, FUN.GLS.sequential, FUN.LME.marginal and FUN.LME.sequential. The significance for each null model is computed as described here, NOT using the p value of the underlying functions.

FUN.MANTEL

Mantel test that can be used in matrix P analysis. The arguments method.p and sqrt.p determine the resemblance index between communities based on the P matrix. The argument method.envir determines the resemblance index between communities based on the environmental variables. Significance is assessed using the r value; see mantel for more.

FUN.ADONIS, FUN.ADONIS2.global and FUN.ADONIS2.margin

Multivariate analysis of variance that can be used in matrix P analysis. The arguments method.p and sqrt.p determine the resemblance index between communities based on the P matrix. The argument formula must be specified, where the left-hand side gives the resemblance data and the right-hand side gives the variables. The resemblance data is internally named p.dist, thus formula is an expression of the form p.dist ~ model (see Examples). Significance is assessed using the F value, and the functions differ in the argument by of adonis2. The function FUN.ADONIS2.global uses by = NULL as default to assess the overall significance of all terms together, whereas the function FUN.ADONIS2.margin uses by = "margin" as default to assess the marginal effects of the terms and returns F and p values for each term. See adonis2 for more.

The function adonis2 evaluates the formula argument in the global environment; however, CRAN does not allow assignments to the global environment. As a temporary workaround, copy and run the lines below to make the functions FUN.ADONIS2.global and FUN.ADONIS2.margin available.


FUN.ADONIS2.global <- function(x, envir, method.p, formula, sqrt.p = TRUE, return.model = FALSE){
  # Resemblance between communities based on matrix P
  p.dist <- vegan::vegdist(x, method = method.p)
  if(sqrt.p){
    p.dist <- sqrt(p.dist)
  }
  # adonis2 looks up p.dist in the global environment
  assign("p.dist", p.dist, envir = globalenv())
  mod.obs <- vegan::adonis2(formula, data = data.frame(envir), permutations = 0, by = NULL, parallel = NULL)
  rm(p.dist, envir = globalenv())
  # Overall F value of the model
  statistic.obs <- mod.obs$F[1]
  if(return.model){
    res <- list()
    res$mod.obs <- mod.obs
    res$statistic.obs <- statistic.obs
  } else{
    res <- statistic.obs
  }
  return(res)
}
FUN.ADONIS2.margin <- function(x, envir, method.p, formula, sqrt.p = TRUE, return.model = FALSE){
  # Resemblance between communities based on matrix P
  p.dist <- vegan::vegdist(x, method = method.p)
  if(sqrt.p){
    p.dist <- sqrt(p.dist)
  }
  # adonis2 looks up p.dist in the global environment
  assign("p.dist", p.dist, envir = globalenv())
  mod.obs <- vegan::adonis2(formula, data = data.frame(envir), permutations = 2, by = "margin", parallel = NULL)
  rm(p.dist, envir = globalenv())
  # Drop the last two entries, keeping one F value per term
  nf <- length(mod.obs$F)-2
  statistic.obs <- mod.obs$F[seq_len(nf)]
  if(return.model){
    res <- list()
    res$mod.obs <- mod.obs
    res$statistic.obs <- statistic.obs
  } else{
    res <- statistic.obs
  }
  return(res)
}

FUN.GLM

Generalized linear models that can be used in PCPS analysis. The argument formula must be specified, where the left-hand side gives the PCPS used and the right-hand side gives the variables. The PCPS are internally named sequentially pcps.1, pcps.2, pcps.3 and so on. Thus, formula is an expression of the form pcps.1 ~ model (see Examples). The types of the environmental variables are taken directly from the envir argument, so variables of class factor can be defined either in the envir data.frame or through the formula argument. Significance is assessed using the overall F value; see glm for more.

FUN.RDA

Redundancy analysis that can be used in PCPS analysis. The RDA is performed using all PCPS specified with the choices argument and all environmental variables specified by the envir argument. Significance is assessed using the overall F value; see rda for more.

FUN.GLS.marginal and FUN.GLS.sequential

Linear model using generalized least squares that can be used in PCPS analysis. The argument formula must be specified, where the left-hand side gives the PCPS used and the right-hand side gives the variables. The PCPS are internally named sequentially pcps.1, pcps.2, pcps.3 and so on. Thus, formula is an expression of the form pcps.1 ~ model (see Examples). The types of the environmental variables are taken directly from the envir argument, so variables of class factor can be defined either in the envir data.frame or through the formula argument. Significance is assessed using the F value, and the functions differ in the argument type of anova.gls. The function FUN.GLS.marginal uses type = "marginal" as default to assess the marginal significance of all terms, whereas the function FUN.GLS.sequential uses type = "sequential" as default to assess the sequential effects of the terms. These functions return all F values calculated by anova.gls, including the intercept if it is in the model. Additional arguments such as correlation can be passed through the ... argument. See gls and anova.gls for more.

FUN.LME.marginal and FUN.LME.sequential

Linear mixed-effects models that can be used in PCPS analysis. The argument formula must be specified, where the left-hand side gives the PCPS used and the right-hand side gives the variables. The PCPS are internally named sequentially pcps.1, pcps.2, pcps.3 and so on. Thus, formula is an expression of the form pcps.1 ~ model (see Examples). The types of the environmental variables are taken directly from the envir argument, so variables of class factor can be defined either in the envir data.frame or through the formula argument. Significance is assessed using the F value, and the functions differ in the argument type of anova.lme. The function FUN.LME.marginal uses type = "marginal" as default to assess the marginal significance of all terms, whereas the function FUN.LME.sequential uses type = "sequential" as default to assess the sequential effects of the terms. These functions return all F values calculated by anova.lme, including the intercept if it is in the model. Additional arguments such as correlation and random can be passed through the ... argument. See lme and anova.lme for more.

Additional function

The functions matrix.p.sig and pcps.sig only perform the permutations following the null models and apply the function to all permuted matrices. Additional functions can easily be specified and passed via the FUN argument. A skeleton of such a function is shown below. In this function the argument x will always be the matrix P or a matrix with the chosen PCPS, whereas additional arguments such as envir specify the statistical analysis performed on matrix P or the PCPS. The function must return the observed statistic. In addition, the return.model argument must not be specified by the user, because it controls the return options used for the observed and null statistics.

FUN.X <- function(x, envir, ..., return.model = FALSE){
  # Replace NULL with a call that performs the analysis using x, envir and any additional argument
  mod.obs <- NULL
  # Replace NULL with code that extracts only the numeric values of the observed statistic
  statistic.obs <- NULL
  # Next lines are mandatory
  if(return.model){
    res <- list()
    res$mod.obs <- mod.obs
    res$statistic.obs <- statistic.obs
  } else{
    res <- statistic.obs
  }
  return(res)
}
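As a concrete illustration of the skeleton, the sketch below fills it in with an ordinary linear model. The name FUN.LM, the choice of lm and the use of the overall F value are assumptions made for this example only; the function is not part of the package, and the toy data are invented so that it can be called directly.

```r
# Hypothetical user-defined function: overall F value of a linear model of the
# first column of x on all variables in envir
FUN.LM <- function(x, envir, ..., return.model = FALSE){
  data.used <- data.frame(y = x[, 1], envir)
  mod.obs <- stats::lm(y ~ ., data = data.used)
  statistic.obs <- unname(summary(mod.obs)$fstatistic[1])
  # Mandatory return structure from the skeleton
  if(return.model){
    res <- list()
    res$mod.obs <- mod.obs
    res$statistic.obs <- statistic.obs
  } else{
    res <- statistic.obs
  }
  return(res)
}

# Direct call on toy data, invented for illustration
set.seed(1)
x <- matrix(rnorm(20), ncol = 2)
envir <- data.frame(temp = rnorm(10))
out.stat <- FUN.LM(x, envir)
out.full <- FUN.LM(x, envir, return.model = TRUE)
```

A function written this way could then be supplied as FUN = FUN.LM, leaving return.model to be set internally.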

References

Duarte, L.S. (2011). Phylogenetic habitat filtering influences forest nucleation in grasslands. Oikos, 120, 208-215.

Duarte, L.S. (2016). Dissecting phylogenetic fuzzy weighting: theory and application in metacommunity phylogenetics. Methods in Ecology and Evolution, 7(8), 937-946.

Examples

# NOT RUN {
data(flona)

# MANTEL
res <- matrix.p.sig(flona$community, flona$phylo, FUN = FUN.MANTEL, method.p = "bray",
             method.envir = "euclidean", envir = flona$environment[, 2, drop = FALSE], runs = 99)
res

# ADONIS
res <- matrix.p.sig(flona$community, flona$phylo, FUN = FUN.ADONIS, method.p = "bray",
             formula = p.dist~temp, envir = flona$environment[, 2, drop = FALSE], runs = 99)
res

# ADONIS2
res <- matrix.p.sig(flona$community, flona$phylo, FUN = FUN.ADONIS2.global,
             envir = flona$environment, formula = p.dist~temp+alt,
             method.p = "bray", runs = 99)
res
res <- matrix.p.sig(flona$community, flona$phylo, FUN = FUN.ADONIS2.margin,
             envir = flona$environment, formula = p.dist~temp+alt,
             method.p = "bray", runs = 99)
res

# GLM
res <- pcps.sig(flona$community, flona$phylo, FUN = FUN.GLM, method = "bray", 
         formula = pcps.1~temp, envir = flona$environment, choices = 1, runs = 99)
res
summary.lm(res$model)

# RDA
res <- pcps.sig(flona$community, flona$phylo, FUN = FUN.RDA, envir = flona$environment, 
         choices = 1:2, runs = 99)
res

# GLS
res <- pcps.sig(flona$community, flona$phylo, FUN = FUN.GLS.marginal, 
         formula = pcps.1~temp, envir = flona$environment, choices = 1, runs = 99)
res
anova(res$model, type = "marginal")

res <- pcps.sig(flona$community, flona$phylo, FUN = FUN.GLS.marginal, 
         formula = pcps.1~temp, envir = flona$environment, 
         correlation = nlme::corCAR1(form = ~1:39), choices = 1, runs = 99)
res
anova(res$model, type = "marginal")

# LME
res <- pcps.sig(flona$community, flona$phylo, FUN = FUN.LME.marginal, formula = pcps.1~alt, 
         envir = flona$environment, random = ~1|temp, choices = 1, runs = 99)
res
anova(res$model, type = "marginal")

res <- pcps.sig(flona$community, flona$phylo, FUN = FUN.LME.sequential, formula = pcps.1~alt,
         envir = flona$environment, random = ~1|temp, choices = 1, runs = 99)
res
anova(res$model, type = "sequential")
# }