gjam: Gibbs sampler for gjam data

Description

Analyzes joint attribute data (e.g., species abundance) with Gibbs sampling. Input can be output from gjamSimData. Returns a list of objects from Gibbs sampling that can be plotted by gjamPlot.

Usage

gjam(formula, xdata, ydata, modelList)
  
  # S3 method for gjam
print(x, ...)
  
  # S3 method for gjam
summary(object, ...)

Arguments

formula

R formula for model, e.g., ~ x1 + x2.

xdata

data.frame containing predictors in formula. If not found in xdata variables, they must be available from the user's workspace.

ydata

n by S response matrix or data.frame. Column names are unique labels, e.g., species names. All columns will be included in the analysis.

modelList

list specifying inputs, including ng (number of Gibbs steps), burnin, and typeNames. Can include the number of holdouts for out-of-sample prediction, holdoutN. See Details.

object of class gjam.

object

currently, also an object of class gjam.

...

further arguments not used here.

Value

Returns an object of class "gjam", which is a list containing the following components:

call

function call

chains

list of MCMC matrices, each with ng rows; includes coefficients bgibbs(Q*S columns), bgibbsUn (unstandardized for x), sensitivity fgibbs (Q1 columns), and fbgibbs (Q1 columns, where Q1 = Q - 1, unless there are multilevel factors); covariance sgibbs has S*(S + 1)/2 columns (REDUCT == F) or N*r columns (REDUCT == T).

fit

list of diagnostics (DIC, rmspeAll, rmspeBySpec, xscore, yscore).

inputs

list of input summaries, including breakMat (partition matrix), classBySpec (interval assignment), designTable (summary of design matrix), [factorBeta, interBeta, intMat, linFactor] (factor and interaction information), other, notOther (response columns to exclude and not), [standMat, standRows, standX] means and variances to standardize x, [x, xdata, y] cleaned versions of data.

missing

list of missing objects, including locations for predictors xmiss and responses ymiss in xdata and ydata, respectively, predictor means xmissMu and standard errors xmissSe, response means ymissMu and standard errors ymissSe .

modelList

list of model specifications from input modelList.

parameters

list of parameter estimates, including coefficient matrices on standardized (betaMu, betaSe), unstandardized (betaMuUn, betaSeUn), and dimensionless (fBetaMu, fBetaSd) scales; correlation (corMu, corSe) and covariance (sigMu, sigSe) matrices; sensitivities to predictors (fmatrix, fMu, fSe); environmental response matrix (ematrix), with locations of zero elements, conditionally (whConZero) and marginally (whichZero), set at probability level modelList$ematAlpha); and latent variables (wMu, wSd).

prediction

list of predicted values, including species richness (responses predicted > 0); inverse predicted x (xpredMu, xpredSd) and predicted y (ypredMu, ypredSd) matrices.

If traits are modeled, then parameters will additionally include betaTraitMu, betaTraitSe (coefficients), sigmaTraitMu, sigmaTraitSe (covariance). prediction will additionally include tMuOrd (ordinal trait means), tMu, tSe (trait predictions).

Details

Note that formula begins with ~, not y ~. The response matrix is passed in the form of a n by S matrix or data.frame ydata.

Both predictors in xdata and responses in ydata can include missing values as NA. Factors in xdata should be declared using factor. For computational stability variables that are not factors are standardized by mean and variance, then transformed back to original scales in output. To retain a variable in its original scale during computation include it in the character string notStandard as part of the list modelList. (example shown in the vignette on traits).

modelList has these defaults and provides these options:

ng = 2000, number of Gibbs steps.

burnin = 500, no. initial steps, must be less than ng.

typeNames can be 'PA' (presenceAbsence), 'CON' (continuous on (-Inf, Inf)), 'CA' (continuous abundance, zero censoring), 'DA' (discrete abundance), 'FC' (fractional composition), 'CC' (count composition), 'OC' (ordinal counts), 'CAT' (categorical classes). typeNames can be a single value that applies to all columns in ydata, or there can be one value for each column.

holdoutN = 0, number of observations to hold out for out-of-sample prediction.

holdoutIndex = numeric(0), numeric vector of observations (row numbers) to holdout for out-of-sample prediction.

censor = NULL, list specifying columns, values, and intervals for censoring, see gjamCensorY.

effort = NULL, list containing 'columns', a vector of length <= S giving the names of columns in in y, and 'values', a length-n vector of effort or a n by S matrix (see Examples). effort can be plot area, search time, etc. for discrete count data 'DA'.

FULL = F in modelList will save full prediction chains in $chains$ygibbs.

notStandard = NULL, character vector of column names in xdata that should not be standardized.

reductList = list(N = 20, r = 3), list of dimension reduction parameters, invoked when reductList is included in modelList or automatically when ydata has too many columns. See vignette on Dimension Reduction.

random, character string giving the name of a column in xdata that will be used to specify random effects. The random group column should be declared as a factor. There should be replication, i.e., each group level occurs multiple times.

REDUCT = F in modelList overrides automatic dimension reduction.

FCgroups, CCgroups, are length-S vectors assigned to columns in ydata indicating composition 'FC' or 'CC' group membership. For example, if there are two 'CA' columns in ydata followed by two groups of fractional composition data, each having three columns, then typeNames = c('CA','CA','FC','FC','FC','FC','FC','FC') and FCgroups = c(0,0,1,1,1,2,2,2). note: gjamSimData is not currently set up to simulate multiple composition groups, but gjam will model it.

PREDICTX = T executes inverse prediction of x. Speed-up by setting PREDICTX = F.

ematAlpha = .5 is the probability assigned for conditional and marginal independence in the ematrix.

traitList = list(plotByTrait, traitTypes, specByTrait), list of trait objects. See vignette on Trait analysis.

More detailed vignettes can be obtained with:

browseVignettes('gjam')

References

Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2017. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs, 87, 34-56.

Examples

Run this code

# NOT RUN {
## combinations of scales
types <- c('DA','DA','OC','OC','OC','OC','CON','CON','CON','CON','CON','CA','CA','PA','PA')         
f    <- gjamSimData(S = length(types), typeNames = types)
ml   <- list(ng = 500, burnin = 50, typeNames = f$typeNames)
out  <- gjam(f$formula, f$xdata, f$ydata, modelList = ml)
summary(out)

# repeat with ng = 5000, burnin = 500, then plot data:
pl  <- list(trueValues = f$trueValues)
gjamPlot(out, plotPars = pl)

## discrete abundance with heterogeneous effort 
S   <- 5                             
n   <- 1000
eff <- list( columns = 1:S, values = round(runif(n,.5,5),1) )
f   <- gjamSimData(n, S, typeNames='DA', effort=eff)
ml  <- list(ng = 2000, burnin = 500, typeNames = f$typeNames, effort = eff)
out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml)
summary(out)

# repeat with ng = 2000, burnin = 500, then plot data:
pl  <- list(trueValues = f$trueValues)
gjamPlot(out, plotPars = pl)
# }

Run the code above in your browser using DataLab