RDS.bootstrap.intervals: RDS Bootstrap Interval Estimates

Description

This function computes an interval estimate for one or more categorical variables. It optionally uses attributes of the RDS data set to determine the type of estimator and type of uncertainty estimate to use.

Usage

RDS.bootstrap.intervals(
  rds.data,
  outcome.variable,
  weight.type = NULL,
  uncertainty = NULL,
  N = NULL,
  subset = NULL,
  confidence.level = 0.95,
  number.of.bootstrap.samples = NULL,
  fast = TRUE,
  useC = TRUE,
  ci.type = "t",
  control = control.rds.estimates(),
  to.factor = FALSE,
  cont.breaks = 3,
  ...
)

Value

An object of class rds.interval.estimate summarizing the inference. The confidence interval and standard error are based on the bootstrap procedure. In addition, the object has an attribute bsresult which provides details of the bootstrap procedure. The contents of the bsresult attribute depend on the uncertainty used. If uncertainty=="Salganik" then bsresult is a vector of standard deviations of the bootstrap samples. If uncertainty=="Gile's SS" then bsresult is a list with components for the bootstrap point estimate, the bootstrap samples themselves, and the standard deviations of the bootstrap samples. If uncertainty=="SRS" then bsresult is NULL.
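The three shapes of bsresult described above can be told apart with base R alone. The following is a minimal sketch, not part of the package: describe_bsresult is a hypothetical helper, and the labels it returns are illustrative names chosen here. It only inspects the attribute and assumes the structure stated in this section.

```r
# Hypothetical helper (not part of the RDS package): classify the
# bsresult attribute by the shapes described in the Value section.
describe_bsresult <- function(est) {
  bs <- attr(est, "bsresult", exact = TRUE)
  if (is.null(bs)) {
    "none"        # documented case: uncertainty == "SRS"
  } else if (is.list(bs)) {
    "list"        # documented case: Gile bootstrap (estimate, samples, SDs)
  } else if (is.numeric(bs)) {
    "sd_vector"   # documented case: Salganik bootstrap (vector of SDs)
  } else {
    "unknown"
  }
}

# Not run: requires the RDS package and its fauxmadrona data set.
if (FALSE) {
  library(RDS)
  data(fauxmadrona)
  est <- RDS.bootstrap.intervals(rds.data = fauxmadrona, weight.type = "RDS-II",
                                 uncertainty = "Salganik", outcome.variable = "disease",
                                 N = 1000, number.of.bootstrap.samples = 50)
  describe_bsresult(est)
}
```

Using exact = TRUE avoids partial matching of attribute names, so only an attribute named exactly bsresult is inspected.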

Arguments

rds.data: An rds.data.frame that indicates recruitment patterns by a pair of attributes named "id" and "recruiter.id".
outcome.variable: A string giving the name of the variable in the rds.data that contains a categorical or numeric variable to be analyzed.
weight.type: A string giving the type of estimator to use. The options are "Gile's SS", "RDS-I", "RDS-II", "RDS-I (DS)", and "Arithmetic Mean". If NULL it defaults to "Gile's SS".
uncertainty: A string giving the type of uncertainty estimator to use. The options are "SRS", "Gile" and "Salganik". This is usually determined by weight.type to be consistent with the estimator's origins. The estimators RDS-I, RDS-I (DS), and RDS-II default to "Salganik", "Arithmetic Mean" defaults to "SRS" and "Gile's SS" defaults to the "Gile" bootstrap.
N: An estimate of the number of members of the population being sampled. If NULL it is read as the population.size.mid attribute of the rds.data frame. If that is missing it defaults to 1000.
subset: An optional criterion to subset rds.data by. It is a character string giving an R expression which, when evaluated, subsets the data. In plain English, it can be something like "seed > 0" to exclude seeds. It can also be the name of a logical vector of the same length as the outcome variable, where TRUE means include the observation in the analysis. If NULL then no subsetting is done.
confidence.level: The confidence level for the confidence intervals. The default is 0.95 for 95%.
number.of.bootstrap.samples: The number of bootstrap samples to take in estimating the uncertainty of the estimator. If NULL it defaults to the number necessary to compute the standard error to accuracy 0.001.
fast: Use a fast bootstrap where the weights are reused from the estimator rather than being recomputed for each bootstrap sample.
useC: Use a C-level implementation of Gile's bootstrap (rather than the R-level one). The two implementations should be computationally equivalent (except for speed).
ci.type: Type of confidence interval to use, if possible. If "t", use lower and upper confidence interval values based on the standard deviation of the bootstrapped values and a t multiplier. If "pivotal", use lower and upper confidence interval values based on the basic bootstrap (also called the pivotal confidence interval). If "quantile", use lower and upper confidence interval values based on the quantiles of the bootstrap sample. If "proportion", use the "t" interval unless the estimated proportion is less than 0.15 or the bounds are outside [0,1]. In that case, try the "quantile" interval and constrain the bounds to be compatible with [0,1].
control: A list of control parameters for algorithm tuning. Constructed using control.rds.estimates.
to.factor: Force the outcome variable to be a factor.
cont.breaks: For continuous variates, some bootstrap procedures require categorical data. In these cases, in order to construct each bootstrap replicate, the outcome variable is split into cont.breaks categories.
...: Additional arguments for RDS.*.estimates.
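
The interval constructions named under ci.type can be illustrated on a plain vector of bootstrap replicates. This is a sketch of the textbook formulas in base R, not the package's own code: bootstrap_ci is a hypothetical helper, the degrees of freedom for the t multiplier (here the number of replicates minus one) are an assumption since this page does not state them, and the quantiles use R's default quantile definition.

```r
# Hypothetical helper (not part of the RDS package): textbook versions of
# the "t", "pivotal", "quantile" and "proportion" intervals.
# ASSUMPTION: df for the t multiplier defaults to length(reps) - 1.
bootstrap_ci <- function(estimate, reps, level = 0.95, type = "t",
                         df = length(reps) - 1) {
  alpha <- 1 - level
  # Lower and upper quantiles of the bootstrap replicates
  q <- unname(quantile(reps, c(alpha / 2, 1 - alpha / 2)))
  # Estimate plus/minus t multiplier times bootstrap standard deviation
  t_ci <- estimate + c(-1, 1) * qt(1 - alpha / 2, df) * sd(reps)
  switch(type,
    t = t_ci,
    # Basic (pivotal) bootstrap: reflect the quantiles around the estimate
    pivotal = c(2 * estimate - q[2], 2 * estimate - q[1]),
    quantile = q,
    # Fall back to clamped quantiles for small proportions or bad bounds
    proportion = if (estimate < 0.15 || t_ci[1] < 0 || t_ci[2] > 1) {
      pmin(pmax(q, 0), 1)
    } else {
      t_ci
    },
    stop("unknown ci type: ", type)
  )
}

reps <- c(0.1, 0.2, 0.3, 0.4, 0.5)  # toy replicates, for illustration only
bootstrap_ci(0.3, reps, type = "quantile")
bootstrap_ci(0.3, reps, type = "pivotal")
```

With so few toy replicates the "t" interval extends below zero, which is exactly the situation in which the "proportion" rule switches to the clamped quantile interval.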

References

Gile, Krista J., 2011. Improved Inference for Respondent-Driven Sampling Data with Application to HIV Prevalence Estimation, Journal of the American Statistical Association, 106, 135-146.

Gile, Krista J., Handcock, Mark S., 2010. Respondent-driven Sampling: An Assessment of Current Methodology, Sociological Methodology, 40, 285-327. <doi:10.1111/j.1467-9531.2010.01223.x>

Gile, Krista J., Beaudry, Isabelle S., Handcock, Mark S., 2018. Methods for Inference from Respondent-Driven Sampling Data, Annual Review of Statistics and Its Application. <doi:10.1146/annurev-statistics-031017-100704>

Examples

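if (FALSE) {
library(RDS)

# RDS-II estimate of "disease" with Salganik bootstrap intervals
data(fauxmadrona)
RDS.bootstrap.intervals(rds.data = fauxmadrona, weight.type = "RDS-II",
                        uncertainty = "Salganik", outcome.variable = "disease",
                        N = 1000, number.of.bootstrap.samples = 50)

# Same call pattern on fauxtime, with weight.type and uncertainty set to "HCG"
data(fauxtime)
RDS.bootstrap.intervals(rds.data = fauxtime, weight.type = "HCG",
                        uncertainty = "HCG", outcome.variable = "var1",
                        N = 1000, number.of.bootstrap.samples = 10)
}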
