multicut: Multi-level Response Model

Description

The functions fits the multi-level response model for each species, possibly controlling for modifying/confounding variables.

Usage

multicut1(Y, X, Z, dist = "gaussian", sset=NULL, ...)
multicut(...)
# S3 method for default
multicut(Y, X, strata, dist = "gaussian",
    sset=NULL, cl = NULL, ...)
# S3 method for formula
multicut(formula, data, strata, dist = "gaussian",
    sset=NULL, cl = NULL, ...)
# S3 method for multicut
bestmodel(object, which = NULL, ...)
# S3 method for multicut
bestpart(object, ...)
# S3 method for multicut
strata(object, ...)
# S3 method for multicut
getMLE(object, which, vcov=FALSE, ...)
# S3 method for multicut
subset(x, subset=NULL, ...)
# S3 method for multicut
fitted(object, ...)
# S3 method for multicut
predict(object, gnew=NULL, xnew=NULL, ...)
# S3 method for multicut
plot(x, which = NULL, cut, sort,
    las, ylab = "Relative abundance", xlab = "Strata",
    show_I = TRUE, show_S = TRUE, hr = TRUE, tick = TRUE,
    theme, mar = c(5, 4, 4, 4) + 0.1, bty = "o",
    lower = 0, upper = 1, pos = 0, horizontal=TRUE, ...)
# S3 method for multicut1
plot(x,
    ylab = "Relative abundance", xlab = "Strata", ...)
lcplot(x, ...)
# S3 method for multicut1
lcplot(x,
    ylab="Cumulative abundance", xlab="Strata",
    bty = "o", theme, ...)
# S3 method for multicut1
print(x, digits, ...)
# S3 method for multicut
print(x, digits, ...)
# S3 method for summary.multicut
print(x, cut, sort, digits, ...)
# S3 method for multicut
summary(object, ...)
# S3 method for multicut
as.data.frame(x,
    row.names = NULL, optional = FALSE, cut, sort, ...)
# S3 method for summary.multicut
as.data.frame(x,
    row.names = NULL, optional = FALSE, cut, sort, ...)

Value

multicut1 returns an object of class 'multicut1'.

multicut returns an object of class 'multicut', that is a list with the following components:

"call": the function call.
"species": a list of species specific multicut1 objects.
"X": modifying variables as model matrix.
"Y": response, single species vector or matrix.
"strata": defines the stratification.
"nobs": sample size.
"sset": subset, if specified.
"dist": distribution.
"failed": IDs for failed species models dropped from results list.

The strata method extracts the strata argument as factor.

The print and summary methods are called for their side effects showing expected values, and log likelihood ratio (logLR). Optimal binary partitions are determined as part of the summary based on Lorenz-tangent based thresholding, which requires nonnegative expected values. Indicator potential (I) is based on largest the contrast (difference) between the minimum and maximum estimates on the linear predictor (link) scale.

The subset method subsets the species in the multicut object.

The plot method presents the estimates by species and strata. The lcplot method plots the Lorenz curve for a single species 'multicut1' object.

bestpart returns a matrix with the best supported partitions for each species (samples and rows, species as columns). Binary partitions are based on Lorenz-tangent based optimal threshold (see lorenz). lorenz requires nonnegative fitted values which is not guaranteed under dist = "gaussian" with identity link, see fix_fitted

ocoptions setting for how to resolve this (choosing a different link function, distribution, or centering modified variables is advised).

bestmodel returns the best supported model for further manipulation (e.g. prediction). Note: custom distribution functions are designed to return only point estimates, thus the best model cannot be returned. In this case, use the best partition returned by bestpart to refit the model. getMLE returns a named list corresponding to the best supported model. The list has the following elements: coef is the Maximum Likelihood Estimate (MLE), vcov is the variance-covariance matrix for the MLE or NULL, dist is the distribution inherited from input object.

fitted returns expected values on the predictor scale for the observations as a matrix (number of observations by number of species). predict returns fitted values when both gnew

and xnew are NULL, or corresponding point predictions (expected values) on the predictor scale.

The coercion methods as.data.frame return a data frame.

Arguments

formula: two sided model formula, response species data (matrix, or possible a vector for single species case) in the left-hand side, model terms for modifying effects in the right-hand side (its structure depending on the underlying functions). For example, in the most basic Gaussian case it can be y ~ 1 (no modifying variables) or y ~ x (with modifying variables). Centering the modifying terms (or choosing the origin wisely) is generally recommended (especially for Gaussian distribution where linear predictors are additive on the response scale) because the relative abundance contrasts are estimated at the origin (0).
data: an optional data frame, list or environment containing the variables in the model. If not found in data, the variables are taken from parent.frame(), typically the environment from which multicut is called.
strata, Z: a factor, unique values define strata (must have at least 2 unique levels, empty levels are dropped).
dist: character or function, a distribution to fit. If character, it can follow one of these patterns: "family", or "family:link" when appropriate (there is a link argument in the underlying function, or the link can be specified via the family argument). See Details on opticut page and Examples.
sset: an optional vector specifying a subset of observations (rows) to be used in the fitting process. NULL means no subset taken.
cl: a cluster object, or an integer for multiple cores in parallel computations (integer value for forking is ignored on Windows).
Y: numeric vector of observations for multicut1, vector or community matrix for multicut.default.
X: numeric, design matrix for possible confounding/modifier variables. Can be missing, in which case an intercept-only model is assumed.
x, object: object to plot, print, summarize.
cut: log likelihood ratio value to be used as a cut-off for showing species whose log likelihood ratio is not less than the cut-off.
sort: logical value indicating if species/partitions should be meaningfully sorted, the default is TRUE. It can take numeric value when only species (1) or partitions (2) are to be sorted (1:2 is equivalent to TRUE).
show_I: logical, if indicator potential (I) should be shown.
show_S: logical, if number of indicator species should be shown.
hr, tick: logical, if horizontal rules (hr) and ticks to the axis legends (tick) should be added. Default is TRUE for both.
theme: color theme as defined by occolors.
mar: numeric, graphical parameters for plot margin par.
ylab, xlab, las: graphical arguments, see plot. By default, las is 1 when horizontal = TRUE and 2 when horizontal = FALSE.
bty: Character, determines the type of box which is drawn around plots, see par.
lower, upper: numeric (between 0 and 1), lower is the minimum and upper is the maximum height for rectangles drawn in the plot. Both need to be in [0, 1] and higher cannot be smaller than lower.
pos: numeric, position of rectangles in the plot relative to the baseline. Value must be in the [-1, 1] range (below vs. above baseline).
horizontal: logical, plot orientation: species as rows (TRUE) or as columns (FALSE).
digits: numeric, number of significant digits in output.
which: numeric or character (can be a vector) defining a subset of species from the fitted object, or NULL (all species, default).
row.names: NULL or a character vector giving the row names for the data frame. Missing values are not allowed. See as.data.frame.
optional: logical. If TRUE, setting row names and converting column names (to syntactic names: see make.names) is optional. See as.data.frame.
subset: logical, numeric, or character index indicating species to keep, missing values are not accepted. The default NULL returns the original object without subsetting.
vcov: logical, if variance-covariance matrix is to be returned.
gnew, xnew: new values for strata and modifiers (right-hand-side of formula) to predict for, or NULL.
...: other arguments passed to the underlying functions.

Warning

The use of the multicut1 function is generally discouraged: some of the internal checks are not guaranteed to flag issues when the formula-to-model-matrix translation is side-stepped (this is what is happening when the modifier variables are supplied as X argument in multicut1). Use the multicut function with a single species instead.

Author

Peter Solymos <psolymos@gmail.com>

Examples

Run this code

## --- Gaussian
## simple example from Legendre 2013
## Indicator Species: Computation, in
## Encyclopedia of Biodiversity, Volume 4
## https://dx.doi.org/10.1016/B978-0-12-384719-5.00430-5
gr <- as.factor(paste0("X", rep(1:5, each=5)))
spp <- cbind(Species1=rep(c(4,6,5,3,2), each=5),
    Species2=c(rep(c(8,4,6), each=5), 4,4,2, rep(0,7)),
    Species3=rep(c(18,2,0,0,0), each=5))
rownames(spp) <- gr
## must add some noise to avoid perfect fit
spp[6, "Species1"] <- 7
spp[1, "Species3"] <- 17
spp

## negative expected values are not good
oco <- ocoptions(fix_fitted=TRUE)
summary(ocall <- multicut(spp ~ 1, strata=gr, dist="gaussian"))
summary(multicut(spp, strata=gr, dist="gaussian")) # alternative
ocoptions(oco) # reset options

## --- Binomial
## simulated binary data
set.seed(1234)
n <- 200
x0 <- sample(1:4, n, TRUE)
x1 <- ifelse(x0 <= 2, 1, 0)
x2 <- rnorm(n, 0.5, 1)
p1 <- plogis(-0.5 + 2*x1 + -0.8*x2)
Y1 <- rbinom(n, 1, p1)
p2 <- plogis(-0.1 + 2*ifelse(x0==4,1,0) + -0.8*x2)
Y2 <- rbinom(n, 1, p2)
p3 <- plogis(-0.1 + -0.8*x2)
Y3 <- rbinom(n, 1, p3)
Y <- cbind(SPP1=Y1, SPP2=Y2, SPP3=Y3)
X <- model.matrix(~x2)

(m0 <- multicut1(Y1, X, as.factor(x0), dist="binomial"))
lcplot(m0)

summary(m1 <- multicut(Y ~ x2, strata=x0, dist="poisson"))
plot(m1)

## subset results
summary(subset(m1, 1:2))

## best partition
head(bestpart(m1))

## best model
mods <- bestmodel(m1)
mods
## explore further
confint(mods[[1]])

## MLE and variance-covariance matrix (species 1)
getMLE(m1, which = 1, vcov=TRUE)

## fitted values
head(fitted(m1))
## prediction for new data
head(predict(m1, gnew=x0, xnew=data.frame(x2=x2)))

if (FALSE) {
## --- Zero-inflated Negative Binomial
## dolina example
data(dolina)
## stratum as ordinal
dolina$samp$stratum <- as.integer(dolina$samp$stratum)
## filter species to speed up things a bit
Y <- dolina$xtab[,colSums(dolina$xtab > 0) >= 20]
## opticut results, note the cloglog link function
dol <- multicut(Y ~ stratum + lmoist + method, data=dolina$samp,
    strata=dolina$samp$mhab, dist="zinb:cloglog")
summary(dol)
## vertical plot orientation
plot(dol, horizontal=FALSE, pos=1, upper=0.8)

## parallel
library(parallel)
cl <- makeCluster(2)
multicut(Y ~ stratum + lmoist + method, data=dolina$samp,
    strata=dolina$samp$mhab, dist="zip",cl=cl)
stopCluster(cl)

## --- Customizing distributions
## we may want to expand the Zero-inflation component in a ZIP model
## see how the return value needs to be structured
fun <- function(Y, X, linkinv, zi_term, ...) {
    X <- as.matrix(X)
    mod <- pscl::zeroinfl(Y ~ X-1 | zi_term, dist = "poisson", ...)
    list(coef=coef(mod),
        logLik=logLik(mod),
        linkinv=mod$linkinv)
}
Xdol <- model.matrix(~ stratum + lmoist + method, data=dolina$samp)
## this fits the null model (i.e. no partitions added)
fun(Y[,"amin"], Xdol, zi_term=dolina$samp$method)
## now we can use dist=fun
multicut1(Y[,"amin"], Xdol, Z=dolina$samp$mhab,
    dist=fun, zi_term=dolina$samp$method)
dol2 <- multicut(Y ~ stratum + lmoist + method, data=dolina$samp,
    strata=dolina$samp$mhab, dist=fun, zi_term=dolina$samp$method)
summary(dol2)
}

Run the code above in your browser using DataLab