Learn R Programming

BaSTA (version 1.3)

basta: Parametric Bayesian estimation of age-specific survival for left-truncated and right-censored capture-recapture/recovery data.

Description

This function performs multiple Markov Chain Monte Carlo (MCMC) algorithms for the Bayesian estimation of age-specific mortality and survival trends when a large proportion of (or all) records have unknown times of birth and/or death. Survival parameters and unknown (i.e. latent) birth and death times are estimated, allowing the user to test a range of mortality patterns, and to test the effect of continuous and/or discrete covariates following Colchero and Clark's (2012) approach.

Usage

basta(object, ...)

## S3 method for class 'If requested, a matrix with convergence coefficients based on potential scale reduction as described by Gelman \emph{et al.': \item{Convergence (2004)nin = 5001, thinning = 50, recaptTrans = studyStart, thetaStart = NULL, thetaJumps = NULL, thetaPriors = NULL, gammaStart = NULL, gammaJumps = NULL, gammaPriors = NULL, nsim = 1, parallel = FALSE, ncpus = 2, lifeTable = TRUE, progrPlots = FALSE, updateJumps = TRUE, ...)

Arguments

object
A data.frame to be used as an input data file for BaSTA. The first column is a vector of individual unique IDs, the second and third columns are birth and death years respectively. Columns 4-(nt-1) represent the observation window of nt years
studyStart
The first year of the study.
studyEnd
The last year of the study.
minAge
Age at which the analysis should start (see details)
model
The underlying mortality model to be used. "EX" = exponential, "GO"= Gompertz, "WE" = Weibull and "LO" = logistic (see details).
shape
The overall shape of the model. Values are: simple = no extra parameters added; Makeham = a constant parameter is added to the mortality rate; and bathtub = a Gompertz declining mortality for early ages and a constan
covarsStruct
Character string that indicates how covariates should be evaluated. The options are: fused, which defines all categorical variables as covariates for each mortality parameter and all continuous covariates under a proportional
niter
The total number of MCMC steps.
burnin
The number of iterations for the burn in (see details).
thinning
The number of skipped MCMC steps to minimize serial autocorrelation (see details).
recaptTrans
A vector (of length npi) defining the recapture probability transition times (RPTP). These are points (years) where the recapture probability is thought to change. The default setting is for the recapture probability to be constant throughout the study,
thetaStart
A vector defining the initial values for each parameter in the survival model. The number of parameters should be: 2 for Gompertz, 3 for Gompertz-Makeham, and 5 for Siler (see details). The default is set to NULL and thus a set o
thetaJumps
A vector defining the size of jumps for each survival model parameter (i.e. the standard error in the Metropolis step). The number of parameters will depend on the model selected, as with thetaStart. As with thetaStart, the defau
thetaPriors
A vector defining the priors for each survival parameter. The vector takes the same format as described for thetaStart. As with thetaStart, the default is set to NULL and thus default values are assigned (i.e. all eq
gammaStart
A vector of initial parameters for the proportional hazards section of the model. These can be specified when Prop.Hazards is TRUE or when continuous covariates are evaluated under a fused covariate
gammaJumps
A vector of jump standard errors for the proportional hazards parameters. The number of jumps will be equal to the number of covariates when the argument covarsStruct is set to prop.haz, or to the number of conti
gammaPriors
A vector of mean priors for the proportional hazards parameters. The specification follows the same rules as gammaJumps.
nsim
A numerical value for the number of simulations to be run.
parallel
A logical argument indicating whether the multiple simulations should be run in parallel or not. If TRUE, package snowfall is called and multiple simulations are run in parallel. If snowfall is not installed, the model i
ncpus
a numerical value that indicates the number of cpus to be used if parallel is TRUE and package snowfall is installed. The default is 2 cpus. If package pkg{snowfall} is not installed, the simulations are run in series.
lifeTable
A logical argument indicating whether or not to produce life tables. If TRUE, a cohort life table is calculated using function MakeLifeTable.
progrPlots
A logical argument indicating wheter to draw progress plots or not. If TRUE, small plots are displayed showing the percent progress achieved for each MCMC simulation.
updateJumps
A logical argument indicating wheter to update jumps until an update rate of 0.2 is achieved.
...
Ignored.

Value

  • coefficientsA matrix with estimated coefficients (i.e. mean values per parameter on the thinned sequences after burnin), which includes standard errors, upper and lower 95% credible intervals, update rates per parameter (commonly the same for all survival and proportional hazards parameters), serial autocorrelation on the thinned sequences and the potential scale reduction factor for convergence (see Convergence value below).
  • DICBasic deviance information criterion (DIC) calculations to be used for model selection (Spiegelhalter et al. 2002). These values should only be used a reference (see comments in Spiegelhalter et al. 2002). A future version of the package will include routines for reversible jump Markov Chain Monte Carlo (RJMCMC, see King & Brooks 2002). If all or some of the simulations failed, then the returned value is NULL.
  • ConvergenceIf requested, a matrix with convergence coefficients based on potential scale reduction as described by Gelman et al. (2004). If only one simulation was ran, then the returned value is NULL.
  • KullbackLeiblerIf called by summary, list with Kullback-Leibler discrepancy matrices between pair of parameters for categorical covariates (McCulloch 1989, Burnham and Anderson 2001) and McCulloch's (1989) calibration measure. If only one simulation was ran or if no convergence is reached, then the returned value is NULL.
  • settingsIf called by summary, this is a vector indicating the number of iterations for each MCMC, the burn in sequence, the thinning interval, and the number of simulations that were run.
  • ModelSpecsModel specifications inidicating the model, the shape and the covariate structure that were specified by the user.
  • JumpPriorsIf requested or called by functions summary or summary.basta, a matrix with the jumps and priors used in the model.
  • ParamsIf requested, the full matrix of parameter estimates. The row names indicate the simulation to which they correspond.
  • BisIf requested, the full matrix of estimates of times of birth. The row names indicate the simulation to which they correspond.
  • DisIf requested, the full matrix of estimates of times of death. The row names indicate the simulation to which they correspond.
  • BqIf requested, summary matrix of estimated times of birth including median and upper and lower 95% predictive intervals.
  • XqIf requested, summary matrix of estimated ages at death including median and upper and lower 95% predictive intervals.
  • SxIf requested or called by functions plot or plot.basta median and 95% predictive intervals for the estimated survival probability.
  • mxIf requested or called by functions plot or plot.basta median and 95% predictive intervals for the estimated mortality rates.
  • xvIf requested, list with vectors of ages at death for each categorical covariate.
  • bdIf requested, the matrix with individual times of birth and death.
  • YIf requested, the capture-recapture matrix.
  • ZaIf requested, the matrix of categorical covariates. When no categorical covariates are included the model returns a one column matrix of ones.
  • ZcIf requested, the matrix of continuous covariates. When no continuous covariates are included the model returns a one column matrix of ones.
  • ststartIf requested, the first year of the study.
  • stendIf requested, the last year of the study.
  • finishedVector of binary values indicating which simulations ran to the end.
  • lifeTableIf requested and specified in the argument lifeTable, a cohort life table calculated from the estimated ages at death.
  • lambdaIf argument minAge is larger than 0 and if requested, matrix with lambda parameter estimates for early changes in distribution of ages at death.

Details

To construct the input data object the function CensusToCaptHist can be used to build the capture-recapture matrix, while the covariate (design) matrix can be constructed with the MakeCovMat function.

basta uses parametric mortality functions to estimate age-specific mortality (survival) from capture-recapture/recovery data. The mortality rate function describes how the risk of mortality changes with age, and is defined as $\mu(x | \theta)$, where $x$ corresponds to age and $\theta$ is a vector of parameters to be estimated.

The model argument allows the user to choose between four basic mortality rate functions:

(a) Exponential (EX; Cox and Oakes 1974), with constant mortality rate:

$$\mu_b(x | b) = b$$ with $b > 0$.

(b) Gompertz (GO; Gompertz 1925, Pletcher 1999):

$$\mu_b(x | b) = exp(b_0 + b_1 x)$$ with $-\infty < b_0, b_1 < \infty$.

(c) Weibull (WE; Pinder III et al. 1978):

$$\mu_b(x | b) = b_0 b_1^(b_0) x^(b_0 -1)$$ with $b_0, b_1 > 0$.

(d) logistic (LO; Pletcher 1999):

$$\mu_b(x | b) = exp(b_0 + b_1 x) / (1 + b_2 exp(b_0)/b_1 (exp(b_1 x)-1))$$ with $b_0, b_1, b_2 > 0$.

The shape argument allows the user to extend these models in order to explore more complex mortality shapes. The default value is simple which leaves the model as defined above. With value Makeham, a constant is added to the mortality rate, making the model equal to $\mu_0(x | \theta)= \mu_b(x | b) + c$, where $\theta = [c, b]$. With value bathtub, concave shapes in mortality can be explored. This is achieved by adding a declining Gompertz term and a constant parameter to the basic mortality model, namely:

$$\mu_0(x | \theta) = exp(a_0 - a_1 x) + c + \mu_b(x | b)$$. with $-\infty < a_0 < \infty$, $a_1 > 0$ and $c > - (exp(a_0 - a_1 x_m) + \mu_b(x_m | b))$, where $x_m$ is the age at which $\mu_0(x | \theta)$ reaches the mininum vale.

To incorporate covariates into the inference process, the mortality model is further extended by including a proportional hazards structure, of the form:

$$\mu(x | \theta, \Gamma, Z_a, Z_c) = \mu_0(x | \theta, Z_a) exp(\Gamma Z_c)$$

where $\mu_0(x | \theta, Z_a)$ represents the mortality section as defined above, while the second term $exp(\Gamma Z_c)$ corresponds to the proportional hazards function. $Z_a$ and $Z_c$ are covariate (design) matrices for categorical and continuous covariates, respectively.

When covariates are included in the dataset, the basta function provides three different ways in which these can be evaluated by using argument covarsStruct:

1. fused will make the mortality parameters linear functions of all categorical covariates (analogous to a generalised linear model (GLM) structure) and will put all continuous covariates under a proportional hazards structure. Thus, for a simple exponential mortality rate of the form $\mu_0(x | b) = b$, the parameter is equal to $b = b_0 + b_1 z_1 + \dots$, where $[b_0, \dots, b_k]$ are paramters that link the mortality parameter $b$ with the categorical covariates $[z_1,\dots,z_k]$.

2. prop.haz will put all covariates under a proportional hazards structure irrespective of the type of variable.

3. all.in.mort will put all covariates as linear functions of the survival parameters as explained above. Since most models require the lower bounds for the mortality parameters to be equal to 0, the only model that can be used for this test is Gompertz with shape set to simple. In case these arguments are specified deferently, a warning message is printed noting that model will be forced to be GO and shape will be set to simple.

The burnin argument represents the number of steps at the begining of the MCMC run that is be discarded. This sequence commonly corresponds to the non-converged section of the MCMC sequence. Convergence and model selection measures are calculated from the remaining thinned parameter chains if multiple simulations are run, and all if all of them run to completion.

The thinning argument specifies the number of steps to be skipped in order to reduce serial autocorrelation. The thinned sequence, which only includes steps after burn in, is then used to calculate convergence statistics and model for selection.

The number of parameters in thetaStart is a vector or matrix that defines the initial values for each $\theta$ parameter of the mortality function. The number of parameters will depend on the model chosen with model (see above). If the number of parameters specified does not match the number of parameters inherent to the model and shape selected, the function returns an error. If no thetaStart argument is specified (i.e. default is NULL), the model randomly generates a set of initial parameters.

As described above, the number of parameters for the gammaStart argument (i.e. section b), namely the proportional hazards section, will be a function of the number of continuous covariates if argument covarsStruct is fused, or to the total number of covariates when covarsStruct is prop.haz.

References

Burnham, K.P. and Anderson, D.R. (2001) Kullback-Leibler information as a basis for strong inference in ecological studies. Widlife Research, 28, 111-119.

Colchero, F. and J.S. Clark (2012) Bayesian inference on age-specific survival from capture-recapture data for censored and truncated data. Journal of Animal Ecology. 81, 139-149.

Colchero, F., O.R. Jones and M. Rebke. (2012) BaSTA: an R package for Bayesian estimation of age-specific survival from incomplete mark-recapture/recovery data with covariates. Method in Ecology and Evolution. 3, 466-470.

Cox, D. R., and Oakes D. (1984) Analysis of Survival Data. Chapman and Hall, London.

Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2004) Bayesian data analysis. 2nd edn. Chapman & Hall/CRC, Boca Raton, Florida, USA.

Gompertz, B. (1825) On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philosophical Transactions of the Royal Society of London, 115, 513-583.

King, R. and Brooks, S.P. (2002) Bayesian model discrimination for multiple strata capture-recapture data. Biometrika, 89, 785-806.

McCulloch, R.E. (1989) Local model influence. Journal of the American Statistical Association, 84, 473-478.

Pinder III, J.E., Wiener, J.G. and Smith, M.H. (1978) The Weibull distribution: a new method of summarizing survivorship data. Ecology, 59, 175-179.

Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and van der Linde, A. (2002) Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B, 64, 583-639.

See Also

summary.basta, print.basta, plot.basta to visualise summary outputs for objects of class basta. CensusToCaptHist and MakeCovMat for raw data formatting.

Examples

Run this code
## Load data:
data("sim1", package = "BaSTA")

## Check data consistency:
new.dat  <- DataCheck(sim1, studyStart = 51, 
            studyEnd = 70, autofix = rep(1,7))

## Run short version of BaSTA on the data:
out      <- basta(sim1, studyStart = 51, studyEnd = 70, 
            niter = 200, burnin = 11, thinning = 10,  
            progrPlots = TRUE)

## Print results:
summary(out, digits = 3)

## Plot traces for survival parameters:
plot(out)

## Plot traces for proportional hazards parameter:
plot(out, trace.name = "gamma")

## Plot survival and mortality curves:
plot(out, plot.trace = FALSE)

Run the code above in your browser using DataLab