Analyzes joint attribute data (e.g., species abundance) with Gibbs sampling.
Input can be output from gjamSimData
. Returns a list of objects from Gibbs sampling that can be plotted by gjamPlot
.
gjam(formula, xdata, ydata, modelList)
# S3 method for gjam
print(x, ...)
# S3 method for gjam
summary(object, ...)
R formula for model, e.g., ~ x1 + x2
.
data.frame
containing predictors in formula
. If not found in xdata
variables, they must be available from the user's workspace.
n
by S
response matrix
or data.frame
. Column names are unique labels, e.g., species names. All columns will be included in the analysis.
list
specifying inputs, including ng
(number of Gibbs steps), burnin
, and typeNames
. Can include the number of holdouts for out-of-sample prediction, holdoutN
. See Details.
object of class gjam
.
currently, also an object of class gjam
.
further arguments not used here.
Returns an object of class "gjam"
, which is a list containing the following components:
function call
list
of MCMC matrices, each with ng
rows; includes coefficients bgibbs
(Q*S
columns), bgibbsUn
(unstandardized for x
), sensitivity fgibbs
(Q1
columns), and fbgibbs
(Q1
columns, where Q1 = Q - 1
, unless there are multilevel factors); covariance sgibbs
has S*(S + 1)/2
columns (REDUCT == F
) or N*r
columns (REDUCT == T
).
list
of diagnostics (DIC, rmspeAll, rmspeBySpec, xscore, yscore
).
list
of input summaries, including breakMat
(partition matrix), classBySpec
(interval assignment), designTable
(summary of design matrix), [factorBeta, interBeta, intMat, linFactor
] (factor and interaction information), other, notOther
(response columns to exclude and not), [standMat, standRows, standX
] means and variances to standardize x
, [x, xdata, y
] cleaned versions of data.
list
of missing objects, including locations for predictors xmiss
and responses ymiss
in xdata
and ydata
, respectively, predictor means xmissMu
and standard errors xmissSe
, response means ymissMu
and standard errors ymissSe
.
list
of model specifications from input modelList
.
list
of parameter estimates, including coefficient matrices on standardized (betaMu, betaSe
), unstandardized (betaMuUn, betaSeUn
), and dimensionless (fBetaMu, fBetaSd
) scales; correlation (corMu, corSe
) and covariance (sigMu, sigSe
) matrices; sensitivities to predictors (fmatrix, fMu, fSe
); environmental response matrix (ematrix
), with locations of zero elements, conditionally (whConZero
) and marginally (whichZero
), set at probability level modelList$ematAlpha
); and latent variables (wMu, wSd
).
list
of predicted values, including species richness
(responses predicted > 0); inverse predicted x
(xpredMu, xpredSd
) and predicted y
(ypredMu, ypredSd
) matrices.
If traits are modeled, then parameters will additionally include betaTraitMu, betaTraitSe (coefficients), sigmaTraitMu, sigmaTraitSe (covariance). prediction will additionally include tMuOrd (ordinal trait means), tMu, tSe (trait predictions).
Note that formula
begins with ~
, not y ~
. The response matrix
is passed in the form of a n
by S
matrix or data.frame
ydata
.
Both predictors in xdata
and responses in ydata
can include missing values as NA
. Factors in xdata
should be declared using factor
. For computational stability variables that are not factors are standardized by mean and variance, then transformed back to original scales in output
. To retain a variable in its original scale during computation include it in the character string notStandard
as part of the list modelList
. (example shown in the vignette
on traits).
modelList
has these defaults and provides these options:
ng = 2000
, number of Gibbs steps.
burnin = 500
, no. initial steps, must be less than ng
.
typeNames
can be 'PA'
(presenceAbsence), 'CON'
(continuous on (-Inf, Inf)
), 'CA'
(continuous abundance, zero censoring), 'DA'
(discrete abundance), 'FC'
(fractional composition),
'CC'
(count composition), 'OC'
(ordinal counts), 'CAT'
(categorical classes). typeNames
can be a single value that applies to all columns in ydata
, or there can be one value for each column.
holdoutN = 0
, number of observations to hold out for out-of-sample
prediction.
holdoutIndex = numeric(0)
, numeric vector
of observations (row numbers) to holdout for out-of-sample prediction.
censor = NULL
, list
specifying columns, values, and intervals for
censoring, see gjamCensorY
.
effort = NULL
, list
containing 'columns'
, a vector of length <= S
giving the names of columns in in y
, and 'values'
, a length-n
vector of effort or a n
by S
matrix (see Examples). effort
can be plot area, search time, etc. for discrete count data 'DA'
.
FULL = F
in modelList
will save full prediction chains in $chains$ygibbs
.
notStandard = NULL
, character vector
of column names in xdata
that should not be standardized.
reductList = list(N = 20, r = 3)
, list
of dimension reduction parameters, invoked when reductList
is included in modelList
or automatically when ydata
has too many columns. See vignette
on Dimension Reduction.
random
, character
string giving the name of a column in xdata
that will be used to specify random effects. The random group column should be declared as a factor
. There should be replication, i.e., each group level occurs multiple times.
REDUCT = F
in modelList
overrides automatic dimension reduction.
FCgroups, CCgroups
, are length-S vectors
assigned to columns in ydata
indicating composition 'FC'
or 'CC'
group membership. For example, if there are two 'CA' columns in ydata
followed by two groups of fractional composition data, each having three columns, then typeNames = c('CA','CA','FC','FC','FC','FC','FC','FC')
and FCgroups = c(0,0,1,1,1,2,2,2)
. note: gjamSimData
is not currently set up to simulate multiple composition groups, but gjam
will model it.
PREDICTX = T
executes inverse prediction of x
. Speed-up by setting PREDICTX = F
.
ematAlpha = .5
is the probability assigned for conditional and marginal independence in the ematrix
.
traitList = list(plotByTrait, traitTypes, specByTrait)
, list of trait objects. See vignette on Trait analysis.
More detailed vignettes can be obtained with:
browseVignettes('gjam')
Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2017. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs, 87, 34-56.
gjamSimData
simulates data
A more detailed vignette is can be obtained with:
browseVignettes('gjam')
website 'http://sites.nicholas.duke.edu/clarklab/code/'.
# NOT RUN {
## combinations of scales
types <- c('DA','DA','OC','OC','OC','OC','CON','CON','CON','CON','CON','CA','CA','PA','PA')
f <- gjamSimData(S = length(types), typeNames = types)
ml <- list(ng = 500, burnin = 50, typeNames = f$typeNames)
out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml)
summary(out)
# repeat with ng = 5000, burnin = 500, then plot data:
pl <- list(trueValues = f$trueValues)
gjamPlot(out, plotPars = pl)
## discrete abundance with heterogeneous effort
S <- 5
n <- 1000
eff <- list( columns = 1:S, values = round(runif(n,.5,5),1) )
f <- gjamSimData(n, S, typeNames='DA', effort=eff)
ml <- list(ng = 2000, burnin = 500, typeNames = f$typeNames, effort = eff)
out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml)
summary(out)
# repeat with ng = 2000, burnin = 500, then plot data:
pl <- list(trueValues = f$trueValues)
gjamPlot(out, plotPars = pl)
# }
Run the code above in your browser using DataLab