Zeta.msgdm: Multi-site generalised dissimilarity modelling for a set of environmental variables and distances

Description

Computes a regression model of zeta diversity for a given order (number of assemblages or sites) against a set of environmental variables and distances between sites. The different regression models available are generalised linear models, generalised linear models with negative constraints, generalised additive models, shape constrained additive models, and I-splines.

Usage

Zeta.msgdm(
  data.spec,
  data.env,
  xy = NULL,
  data.spec.pred = NULL,
  order = 1,
  sam = 1000,
  reg.type = "glm",
  family = stats::gaussian(),
  method.glm = "glm.fit.cons",
  cons = -1,
  cons.inter = 1,
  confint.level = 0.95,
  bs = "mpd",
  kn = -1,
  order.ispline = 2,
  kn.ispline = 1,
  distance.type = "Euclidean",
  dist.custom = NULL,
  rescale = FALSE,
  rescale.pred = TRUE,
  method = "mean",
  normalize = FALSE,
  silent = FALSE,
  empty.row = 0,
  control = list(),
  glm.init = FALSE
)

Arguments

data.spec

Site-by-species presence-absence data frame, with sites as rows and species as columns.

data.env

Site-by-variable data frame, with sites as rows and environmental variables as columns.

xy

Site coordinates, to account for distances between sites.

data.spec.pred

Site-by-species presence-absence data frame or list of data frames, with sites as rows and species as columns, for which zeta diversity will be computed and used as a predictor of the zeta diversity of data.spec.

order

Specific number of assemblages or sites at which zeta diversity is computed.

sam

Number of samples for which the zeta diversity is computed.

reg.type

Type of regression used in the multi-site generalised dissimilarity modelling. Options are "glm" for generalised linear models, "ngls" for negative linear models, "gam" for generalised additive models, "scam" for shape constrained additive models (monotonically decreasing by default), and "ispline" for I-spline models (forcing monotonic decline), as recommended in generalised dissimilarity modelling by Ferrier et al. (2007).

family

A description of the error distribution and link function to be used in the glm, gam and scam models (see family for details of family functions).

method.glm

Method used in fitting the generalised linear model. The default method "glm.fit.cons" is an adaptation of method glm.fit2 from package glm2 using a constrained least squares regression (default is negative coefficients) in the reweighted least squares. Another option is "glm.fit2", which calls glm.fit2; see help documentation for glm.fit2 in package glm2.

cons

Type of constraint in the glm if method.glm = "glm.fit.cons". Default is -1 for negative coefficients on the predictors. The other option is 1 for positive coefficients on the predictors.

cons.inter

Type of constraint for the intercept. Default is 1 for positive intercept, suitable for Gaussian family. The other option is -1 for negative intercept, suitable for binomial family.

confint.level

Confidence level, given as a proportion (default 0.95), for the confidence intervals of the coefficients from the generalised linear models.

bs

A character string indicating the (penalized) smoothing basis to use in the scam model. Default is "mpd" for monotonic decreasing splines. See smooth.terms for an overview of what is available.

kn

Number of knots in the GAM and SCAM. Default is -1 for determining kn automatically using generalized cross-validation.

order.ispline

Order of the I-spline.

kn.ispline

Number of knots in the I-spline.

distance.type

Method to compute distance. Default is "Euclidean", for Euclidean distance. The other options are (i) "ortho" for orthodromic distance, if xy correspond to longitudes and latitudes (orthodromic distance is computed using the geodist function from package geodist); and (ii) "custom", in which case the user must provide a distance matrix for dist.custom.

dist.custom

Distance matrix provided by the user when distance.type = "custom".

rescale

Boolean value (TRUE or FALSE) indicating if the zeta values should be divided by the total number of species in the dataset, to get a range of values between 0 and 1. Has no effect if normalize != FALSE.

rescale.pred

Boolean value (TRUE or FALSE) indicating if the spatial distances and differences in environmental variables should be rescaled between 0 and 1.

method

Name of a function (as a string) indicating how to combine the pairwise differences and distances for more than 3 sites. It can be a basic R-function such as "mean" or "max", but also a custom function.

normalize

Indicates if the zeta values for each sample should be divided by the total number of species for this specific sample (normalize = "Jaccard"), by the average number of species per site for this specific sample (normalize = "Sorensen"), or by the minimum number of species in the sites of this specific sample (normalize = "Simpson"). Default value is FALSE, indicating that no normalization is performed.

silent

Boolean value (TRUE or FALSE) indicating if warnings must be printed.

empty.row

Determines how to handle empty rows, i.e. sites with no species. Such sites can cause underestimations of zeta diversity, and computation errors for the normalized version of zeta due to divisions by 0. Options are "empty" to leave the data untreated, "remove" to remove the empty rows, 0 to set the normalized zeta to 0 when zeta is divided by 0 during normalization (sites share no species, so are completely dissimilar), and 1 to set the normalized zeta to 1 when zeta is divided by 0 during normalization (i.e. sites are perfectly similar).

control

As for glm.

glm.init

Boolean value (TRUE or FALSE) indicating if the initial parameters for fitting the glm with constraint on the coefficient signs for reg.type == "ispline" should be initialised based on the correlation coefficients between the zeta values and the environmental difference or distance. glm.init = TRUE helps prevent the error message: error: cannot find valid starting values: please specify some.
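To make the roles of order, sam and normalize more concrete, here is a minimal base R sketch of how a zeta value could be computed for randomly sampled sets of sites. It does not call zetadiv: zeta_sample, toy.spec and n.sites are hypothetical names introduced for illustration, and the package's internal sampling and its handling of empty rows (see empty.row) may differ.

```r
# Hypothetical helper (not part of zetadiv): zeta value for one set of sites.
zeta_sample <- function(data.spec, sites, normalize = FALSE) {
  sub <- data.spec[sites, , drop = FALSE]
  # Zeta = number of species present in every sampled site
  zeta <- sum(colSums(sub) == nrow(sub))
  denom <- switch(as.character(normalize),
    "FALSE"    = 1,
    "Jaccard"  = sum(colSums(sub) > 0),  # total species in this sample
    "Sorensen" = mean(rowSums(sub)),     # average species per site
    "Simpson"  = min(rowSums(sub)),      # minimum species per site
    stop("unknown normalize option")
  )
  # Note: no guard against division by 0 here (empty sites are not handled)
  zeta / denom
}

# Toy presence-absence data: 4 sites (rows) x 5 species (columns)
toy.spec <- data.frame(
  sp1 = c(1, 1, 1, 1),
  sp2 = c(1, 1, 0, 1),
  sp3 = c(0, 1, 1, 0),
  sp4 = c(1, 0, 0, 0),
  sp5 = c(0, 0, 1, 1)
)

# One fixed sample of three sites (order = 3)
zeta_raw <- zeta_sample(toy.spec, c(1, 2, 4))
zeta_jac <- zeta_sample(toy.spec, c(1, 2, 4), normalize = "Jaccard")

# sam random samples of n.sites sites each
set.seed(1)
n.sites <- 3
sam <- 10
zeta_vals <- replicate(sam, zeta_sample(toy.spec, sample(nrow(toy.spec), n.sites)))
```

In this toy data, sites 1, 2 and 4 share two species out of five present in the sample, so the raw value is 2 and the Jaccard-normalized value is 0.4.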

Value

Zeta.msgdm returns a list whose components vary depending on the regression technique. The list can contain the following components:

val

Vector of zeta values used in the MS-GDM.

predictors

Data frame of the predictors used in the MS-GDM.

range.min

Vector containing the minimum values of the numeric variables, used for rescaling the variables between 0 and 1 for I-splines (see Details).

range.max

Vector containing the maximum values of the numeric variables, used for rescaling the variables between 0 and 1 for I-splines (see Details).

rescale.factor

Factor by which the predictors were divided if rescale.pred = TRUE and order>1.

order.ispline

The value of the original parameter, to be used in Plot.ispline.

kn.ispline

The value of the original parameter, to be used in Plot.ispline.

model

An object whose class depends on the type of regression (glm, nnnpls, gam or scam; I-splines return an object of class glm), corresponding to the regression over distance for the number of assemblages or sites specified in order.

confint

The confidence intervals for the coefficients from generalised linear models with no constraint. confint is not generated for the other types of regression.

vif

The variance inflation factors for all the variables for the generalised linear regression. vif is not generated for the other types of regression.

Details

The environmental variables can be numeric or factorial.

If order = 1, the variables are used as such in the regression, and factorial variables must be coded as dummy variables for the output of the regression to be interpretable.

For numeric variables, if order>1 the pairwise difference between sites is computed and combined according to method. For factorial variables, the distance corresponds to the number of unique values over the number of assemblages or sites specified by order.

If xy = NULL, Zeta.msgdm only uses environmental variables in the regression. Otherwise, it also computes and uses Euclidean distance (average or maximum distance between multiple sites, depending on the parameter method) as an explanatory variable.
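The combination of pairwise differences and distances described above can be sketched in base R as follows. This is an illustration under assumptions, not the package's internal code: combine_pairwise, temp and xy.toy are hypothetical names, and only the numeric and Euclidean cases are shown.

```r
# Hypothetical helper (not part of zetadiv): combine all pairwise
# Euclidean distances among the rows (or elements) of x with `method`.
combine_pairwise <- function(x, method = "mean") {
  f <- match.fun(method)          # accepts a function name or a function
  f(as.numeric(stats::dist(x)))   # for a vector, dist() gives |x_i - x_j|
}

# One numeric environmental variable measured at 3 sites (order = 3)
temp <- c(10, 14, 11)
temp_mean <- combine_pairwise(temp, "mean")  # mean of 4, 1, 3
temp_max  <- combine_pairwise(temp, "max")   # largest pairwise difference

# Coordinates of the same 3 sites
xy.toy <- cbind(x = c(0, 3, 0), y = c(0, 0, 4))
dist_mean <- combine_pairwise(xy.toy, "mean")  # mean of 3, 4, 5
dist_max  <- combine_pairwise(xy.toy, "max")

# A custom function can be passed in the same way
temp_median <- combine_pairwise(temp, function(d) stats::median(d))
```

With "mean" the predictor reflects the typical separation among the sampled sites, whereas "max" reflects the most dissimilar pair.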

If rescale.pred = TRUE, zeta is regressed against the differences of values of the environmental variables divided by the maximum difference for each variable, to be rescaled between 0 and 1. If !is.null(xy), distances between sites are also divided by the maximum distance. If order = 1, the variables are transformed by first subtracting their minimum value, and dividing by the difference of their maximum and minimum values.
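For order>1, the division by the maximum difference might look like the base R sketch below. The predictor table pred and the names max.diff and pred.rescaled are hypothetical; max.diff is assumed to play a role analogous to the rescale.factor component listed under Value.

```r
# Hypothetical table of combined differences and distances,
# one row per sample of sites
pred <- data.frame(
  temp     = c(2, 8, 4),
  rain     = c(50, 10, 100),
  distance = c(1000, 250, 500)
)

# Maximum of each column, used as the divisor
max.diff <- sapply(pred, max)

# Divide every column by its own maximum so that values lie in [0, 1]
pred.rescaled <- as.data.frame(lapply(pred, function(v) v / max(v)))
```

After this step each predictor has a maximum of 1, so regression coefficients become comparable across variables measured in different units.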

If reg.type = "ispline", the variables are rescaled between 0 and 1 prior to computing the I-splines by subtracting their minimum value, and dividing by the difference of their maximum and minimum values.
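The min-max rescaling described here can be sketched in base R as below. The helper rescale01 and the data frame env.toy are hypothetical; range.min and range.max are named after the components listed under Value, on the assumption that they store the same quantities.

```r
# Hypothetical helper: rescale a numeric vector to the interval [0, 1].
# Assumes the vector is not constant (otherwise this divides by 0).
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Toy environmental data: 4 sites x 2 numeric variables
env.toy <- data.frame(
  temp = c(10, 14, 11, 18),
  rain = c(200, 650, 400, 500)
)

# Keep the original ranges so values can be mapped back later
range.min <- sapply(env.toy, min)
range.max <- sapply(env.toy, max)

env01 <- as.data.frame(lapply(env.toy, rescale01))

# Back-transform the rescaled temperature to its original units
temp.back <- env01$temp * (range.max["temp"] - range.min["temp"]) + range.min["temp"]
```

Storing the minimum and maximum alongside the model is what allows fitted I-splines to be plotted against the original units of each variable.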

References

Hui C. & McGeoch M.A. (2014). Zeta diversity as a concept and metric that unifies incidence-based biodiversity patterns. The American Naturalist, 184, 684-694.

Ferrier, S., Manion, G., Elith, J., & Richardson, K. (2007). Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Diversity and Distributions, 13(3), 252-264.

Examples

# NOT RUN {
utils::data(bird.spec.coarse)
xy.bird <- bird.spec.coarse[1:2]
data.spec.bird <- bird.spec.coarse[3:193]
utils::data(bird.env.coarse)
data.env.bird <- bird.env.coarse[,3:9]

zeta.glm <- Zeta.msgdm(data.spec.bird, data.env.bird, sam = 100, order = 3)
zeta.glm
dev.new()
graphics::plot(zeta.glm$model)

zeta.ngls <- Zeta.msgdm(data.spec.bird, data.env.bird, xy.bird, sam = 100, order = 3,
    reg.type = "ngls", rescale = TRUE)
zeta.ngls

##########

utils::data(Marion.species)
xy.marion <- Marion.species[1:2]
data.spec.marion <- Marion.species[3:33]
utils::data(Marion.env)
data.env.marion <- Marion.env[3]

zeta.gam <- Zeta.msgdm(data.spec.marion, data.env.marion, sam = 100, order = 3,
    reg.type = "gam")
zeta.gam
dev.new()
graphics::plot(zeta.gam$model)

zeta.ispline <- Zeta.msgdm(data.spec.marion, data.env.marion, xy.marion, sam = 100,
    order = 3, normalize = "Jaccard", reg.type = "ispline")
zeta.ispline

zeta.ispline.r <- Return.ispline(zeta.ispline, data.env.marion, distance = TRUE)
zeta.ispline.r

dev.new()
Plot.ispline(isplines = zeta.ispline.r, distance = TRUE)

dev.new()
Plot.ispline(msgdm = zeta.ispline, data.env = data.env.marion, distance = TRUE)

# }
