Learn R Programming

EpiBayes (version 0.1.2)

EpiBayes_s: Disease Model with Storage

Description

This function is used to model disease (freedom and prevalence estimation) using a Bayesian hierarchical model one, two, or three levels of sampling. The most general, the three level, would be represented by the hierarchy: (region -> subzone -> cluster -> subject) and the two- and one- level sampling would simply be the same with either subzone removed or subzone and cluster removed, respectively. This function is the 'storage' model and hence stores all, even extraneous, output.

Usage

EpiBayes_s(H, k, n, seasons, reps, MCMCreps, poi = "tau", y = NULL, mumodes = matrix(c(0.5, 0.7, 0.5, 0.7, 0.02, 0.5, 0.02, 0.5), 4, 2, byrow = TRUE), pi.thresh = 0.02, tau.thresh = 0.05, gam.thresh = 0.1, tau.T = 0, poi.lb = 0, poi.ub = 1, p1 = 0.95, psi = 4, omegaparm = c(100, 1), gamparm = c(100, 1), tauparm = c(1, 1), etaparm = c(100, 6), thetaparm = c(100, 6), burnin = 1000)

Arguments

H
Number of subzones/states. Should be a 1 if there are only one or two levels of sampling. Integer scalar.
k
Number of clusters / farms / ponds / herds. Should be a 1 if there is only one level of sampling. Integer vector (H x 1).
n
Number of subjects / animals / mussels / pigs per cluster (can differ among clusters). Integer vector (sum(k) x 1).
seasons
Numeric season for each cluster in the order: Summer (1), Fall (2), Winter (3), Spring (4). Integer vector (sum(k) x 1).
reps
Number of (simulated) replicated data sets. Integer scalar.
MCMCreps
Number of iterations in the MCMC chain per replicated data set. Integer scalar.
poi
The p(arameter) o(f) i(nterest) specifies one of the subzone-level prevalence (gam) or the cluster-level prevalence (tau), indicating which variable with which to compute the simulation output p2.tilde, p4.tilde, and p6.tilde. Character scalar.
y
An optional input of sums of positive diagnostic testing results if one has a specific set of diagnostic testing outcomes for every subject (will simulate these if this is left as NULL). Integer matrix (reps x sum(k)).
mumodes
Modes and (a) 95th percentiles for mode $\leq$ 0.50 or (b) 5th percentiles for mode $>$ 0.5 for season-specific mean prevalences for diseased clusters in the order: Summer, Fall, Winter, Spring. Real matrix (4 x 2).
pi.thresh
Threshold that we must show subject-level prevalence is below to declare disease freedom. Real scalar.
tau.thresh
Threshold that we must show cluster-level prevalence is below to declare disease freedom. Real scalar.
gam.thresh
Threshold that we must show subzone-level prevalence is below to declare disease freedom. Real scalar.
tau.T
Assumed true cluster-level prevalence (used to simulate data to feed into the Bayesian model). Real scalar.
poi.lb,poi.ub
Lower and upper bounds for posterior poi prevalences to show ability to capture poi with certain probability. Real scalars.
p1
Probability we must show poi prevalence is below / above the thresholds pi.thresh or tau.thresh or within specified bounds. Real scalar.
psi
(Inversely related to) the variability of the subject-level prevalences in diseased clusters. Real scalar.
omegaparm
Prior parameters for the beta-distributed input omega, which is the probability that the disease is in the region. Real vector (2 x 1).
gamparm
Prior parameters for the beta-distributed input gam, which is the subzone-level prevalence (or the prevalence among subzones). Real vector (2 x 1).
tauparm
Prior parameters for the beta-distributed input tau, which is the cluster-level prevalence (or the prevalence among clusters). Real vector (2 x 1).
etaparm
Prior parameters for the beta-distributed input eta, which is the sensitivity of the diagnostic test. Real vector (2 x 1).
thetaparm
Prior parameters for the beta-distributed input theta, which is the specificity of the diagnostic test. Real vector (2 x 1).
burnin
Number of MCMC iterations to discard from the beginning of the chain. Integer scalar.

Value

This function is the 'storage' model meaning it returns the posterior distributions of all of the model parameters. It returns enough to perform inference and to perform diagnostic testing for the model fit nor MCMC convergence. If this is the first time running the model, we recommend that the user utilizes this function, EpiBayes_s, and diagnose issues before continuing with the 'no storage' model, EpiBayes_ns, for faster results, especially if it is being used as a simulation-type model and not a poserior inference-type model. Nevertheless, below are the outputs of this model.
Output Attributes
Description p2.tilde
Real scalar Proportion of simulated data sets that result in the probability of poi prevalence below poi.thresh with probability p1
p4.tilde Real scalar
Proportion of simulated data sets that result in the probability of poi prevalence above poi.thresh with probability p1 p6.tilde
Real scalar Proportion of simulated data sets that result in the probability of poi prevalence between poi.lb and poi.ub with probability p1
taumat Real array (reps x H x MCMCreps)
Posterior distributions of the cluster-level prevalence for all simulated data sets (i.e., reps) gammat
Real matrix (reps x MCMCreps) Posterior distribution of the subzone-level prevalence
omegamat Real matrix (reps x MCMCreps)
Posterior distribution of the probability of the disease being in the region z.gammat
Real matrix (reps x MCMCreps) Posterior distribution for the latent indicator denoting whether the disease is present among the subzones
z.taumat Real array (reps x H x MCMCreps)
Posterior distribution for the latent indicator denoting whether the disease is present among the clusters pimat
Real array (reps x sum(k) x MCMCreps) Posterior distribution for the subject-level (or within-cluster) prevalence
z.pimat Real array (reps x sum(k) x MCMCreps)
Posterior distribution for the latent indicators denoting whether the disease is present within any given cluster mumat
Real matrix (reps x 4 x MCMCreps) Posterior distribution for the mean prevalence within diseased clusters
psi Real scalar
User-input psi value etamat
Real matrix (reps x MCMCreps) Posterior distribution for the sensitivity of the diagnostic test
thetamat Real matrix (reps x MCMCreps)
Posterior distribution for the specificity of the diagnostic test c1mat
Real array (reps x sum(k) x MCMCreps) Posterior disribution for the latent indicators denoting true positive results from the diagnostic test
c2mat Real array (reps x sum(k) x MCMCreps)
Posterior disribution for the latent indicators denoting true negative results from the diagnostic test mumh.tracker
Real matrix (reps x 4) Vector of proportions of accepted Metropolis-Hastings proposals for the simulation from the posterior of input mu
y Integer matrix (reps x sum(k))
Matrix of observed counts of diseased subjects per cluster per simulated data set (or the actual observed counts input by the user) ForOthers
Various other data not intended to be used by the user, but used to pass information on to the plot, summary, and print methods
The posterior distribution output, say taumat, can be manipulated after it is returned with the coda package after it is converted to an mcmc object.

Parameter Meanings

Below we describe the meanings of the parameters in epidemiological language for ease of implementation and elicitation of inputs to this function.
Parameter (3-level) Description
(2-level) Description omega
Probability of disease being in the region. Not used.
gam Subzone-level (between-subzone) prevalence.
Probability of disease being in the region. z.gam
Subzone-level (between-subzone) prevalence latent indicator variable. Not used.
tau Cluster-level (between-cluster) prevalence.
Same as (3-level). z.tau
Cluster-level (between-cluster) prevalence latent indicator variable. Same as (3-level).
pi Subject-level (within-cluster) prevalence.
Same as (3-level). z.pi
Subject-level (within-cluster) prevalence latent indicator variable. Same as (3-level).
mu Mean prevalence among infected clusters.
Same as (3-level). psi
(Related to) variability of prevalence among infected clusters (inversely related so higher psi -> lower variance of prevalences among diseased clusters). Same as (3-level).
eta Diagnostic test sensitivity.
Same as (3-level). theta
Diagnostic test specificity. Same as (3-level).
c1 Latent count of true positive diagnostic test results.
Same as (3-level). c2
Latent count of true negative diagnostic test results. Same as (3-level).
Note that in the code, each of these variables are denoted, for example, taumat instead of plain tau to denote that that variable is a matrix (or, more generally an array) of values.

Details

Note that this function performs in the same manner as EpiBayes_ns except it stores the posterior distributions of all of the parameters in the model and hence takes a bit longer to run. This model is a Bayesian hierarchical model that serves two main purposes:
  • Simulation model: can simulate data under user-specified conditions and run replicated data sets under the Bayesian model to observed the behavior of the system under random realizations of simulated data.
  • Posterior inference model: can use actual observed data from the field, run it through the Bayesian model, and make inference on parameter(s) of interest using the posterior distribution(s).

The posterior distributions are avaialable for a particular parameter, say tau, by typing name_of_your_model$taumat. Note: be careful about the size of the taumat matrix you are calling. The last index of any of the variables from above is the MCMC replications and so we would typically always omit the last index when looking at any particular variable.

  • If we want to look at the posterior distribution of the cluster level prevalence (tau) for the first replication, we will note that taumat is a matrix with rows indexed by replication and columns by MCMC replications. Then, we will type something like name_of_your_model$taumat[1, ] to visually inspect the posterior distribution in the form of a vector. For the second replication, we can type name_of_your_model$taumat[2, ], and so forth. Then, we can make histograms of these distributions if we so desire by the following code: hist(name_of_your_model$taumat[1, ], col = "cyan");box("plot"). To observe a trace plot, we can type: plot(name_of_your_model$taumat[1, ], type = "l") for all of the MCMC replications and we can look at the trace plot after a burnin of 1000 iterations by typing: plot(name_of_your_model$taumat[1, -c(1:1000)], type ="l").

  • If we want to look at the posterior distribution for the subject-level prevalence (pi) for the tenth replication in the third cluster, we would type name_of_your_model$pimat[10, 1, ] since the matrix containing the posterior distributions for the subject-level prevalences are indexed by replications in the first dimension, clusters in the second, and MCMC replications in the third. We can make histograms and trace plots using the same code as from above.
  • References

    Branscum, A., Johnson, W., and Gardner, I. (2006) Sample size calculations for disease freedom and prevalence estimation surveys. Statistics in Medicine 25, 2658-2674.

    See Also

    The function EpiBayes_ns stores less output so the user may run the model more quickly, while losing the ability to diagnose any model fit or convergence issues.

    Examples

    Run this code
    testrun_storage = EpiBayes_s(
    		H = 2,
    		k = rep(30, 2),
    		n = rep(rep(150, 30), 2),
    		seasons = rep(c(1, 2, 3, 4), each = 15),
    		reps = 10,
    		MCMCreps = 10,
    		poi = "tau",
    		y = NULL,
    		mumodes = matrix(c(
    			0.50, 0.70,
    			0.50, 0.70,
    			0.02, 0.50,
    			0.02, 0.50
    			), 4, 2, byrow = TRUE
    		),
    		pi.thresh = 0.05,
    	    tau.thresh = 0.02,
         gam.thresh = 0.10,
    		tau.T = 0,
    		poi.lb = 0,
    		poi.ub = 1,
    		p1 = 0.95,
    		psi = 4,
    		omegaparm = c(100, 1),
    		gamparm = c(100, 1),
    		tauparm = c(1, 1),
    		etaparm = c(100, 6),
    		thetaparm = c(100, 6),
    		burnin = 1
    		)
    
    testrun_storage
    print(testrun_storage)
    testrun_storagesummary = summary(testrun_storage, prob = 0.90, n.output = 5)
    testrun_storagesummary
    plot(testrun_storage$taumat[1, 1, ], type = "l")
    plot(testrun_storage$gammat[1, ], type = "l")
    
    ## Can look at all posterior distributions
        plot(testrun_storage$pimat[1, 1, ], type = "l")
        plot(testrun_storage$omegamat[1, ], type = "l")
    

    Run the code above in your browser using DataLab