This function computes an interval estimate for one or more categorical variables. It optionally uses attributes of the RDS data set to determine the type of estimator and type of uncertainty estimate to use.
RDS.bootstrap.intervals(
rds.data,
outcome.variable,
weight.type = NULL,
uncertainty = NULL,
N = NULL,
subset = NULL,
confidence.level = 0.95,
number.of.bootstrap.samples = NULL,
fast = TRUE,
useC = TRUE,
ci.type = "t",
control = control.rds.estimates(),
to.factor = FALSE,
cont.breaks = 3,
...
)
An object of class rds.interval.estimate summarizing the inference. The confidence interval and standard error are based on the bootstrap procedure.
In addition, the object has an attribute bsresult, which provides details of the bootstrap procedure. The contents of the bsresult attribute depend on the uncertainty used. If uncertainty=="Salganik" then bsresult is a vector of standard deviations of the bootstrap samples. If uncertainty=="Gile's SS" then bsresult is a list with components for the bootstrap point estimate, the bootstrap samples themselves and the standard deviations of the bootstrap samples. If uncertainty=="SRS" then bsresult is NULL.
An rds.data.frame that indicates recruitment patterns by a pair of attributes named "id" and "recruiter.id".
A string giving the name of the categorical or numeric variable in rds.data to be analyzed.
A string giving the type of estimator to use. The options are "Gile's SS", "RDS-I", "RDS-II", "RDS-I (DS)", and "Arithmetic Mean". If NULL it defaults to "Gile's SS".
A string giving the type of uncertainty estimator to use. The options are "SRS", "Gile" and "Salganik". This is usually determined by weight.type to be consistent with the estimator's origins. The estimators RDS-I, RDS-I (DS), and RDS-II default to "Salganik", "Arithmetic Mean" defaults to "SRS" and "Gile's SS" defaults to the "Gile" bootstrap.
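The defaulting rule described for the uncertainty argument can be pictured as a small lookup table. The following is a minimal sketch in base R, not the package's own code: default_uncertainty is a hypothetical helper name, and only the mapping itself is taken from the text above.

```r
# Hypothetical helper (not part of the RDS package): returns the default
# uncertainty type that the documentation associates with each weight.type.
default_uncertainty <- function(weight.type = NULL) {
  # Per the documentation, a NULL weight.type defaults to "Gile's SS".
  if (is.null(weight.type)) weight.type <- "Gile's SS"
  switch(weight.type,
    "RDS-I" = ,
    "RDS-I (DS)" = ,
    "RDS-II" = "Salganik",
    "Arithmetic Mean" = "SRS",
    "Gile's SS" = "Gile",
    stop("no default documented for weight.type = ", weight.type)
  )
}

default_uncertainty("RDS-II")
default_uncertainty()
```

Passing uncertainty explicitly overrides whatever default would otherwise be chosen.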
An estimate of the number of members of the population being sampled. If NULL it is read from the population.size.mid attribute of rds.data. If that is missing it defaults to 1000.
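The fallback order for N (explicit value, then the population.size.mid attribute, then 1000) can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: resolve_population_size is a hypothetical name, and a plain data frame stands in for an rds.data.frame.

```r
# Hypothetical helper (not part of the RDS package) mirroring the documented
# fallback order for the population size N.
resolve_population_size <- function(rds.data, N = NULL, fallback = 1000) {
  if (!is.null(N)) return(N)                      # explicit N wins
  mid <- attr(rds.data, "population.size.mid")    # attribute named in the docs
  if (is.null(mid) || is.na(mid)) fallback else mid
}

toy <- data.frame(id = 1:3)                # stand-in for an rds.data.frame
resolve_population_size(toy)               # no attribute: 1000
attr(toy, "population.size.mid") <- 5000
resolve_population_size(toy)               # attribute present: 5000
resolve_population_size(toy, N = 800)      # explicit argument: 800
```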
An optional criterion to subset rds.data by. It is a character string giving an R expression which, when evaluated, subsets the data. In plain English, it can be something like "seed > 0" to exclude seeds. It can also be the name of a logical vector of the same length as the outcome variable, where TRUE means include that observation in the analysis. If NULL then no subsetting is done.
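To make the idea of a character-string subset expression concrete, the sketch below evaluates such a string against a toy data frame in base R. It is only an illustration: the data, the seed column and its coding are invented here, and the package may evaluate the expression differently.

```r
# Toy data (invented for illustration); "seed" is a hypothetical column.
toy <- data.frame(
  id = 1:5,
  seed = c(0, 0, 1, 2, 1),
  disease = c("yes", "no", "no", "yes", "no")
)

# A subset criterion supplied as a character string, as described above.
subset_expr <- "seed > 0"

# Parse the string and evaluate it with the data frame's columns in scope.
keep <- eval(parse(text = subset_expr), envir = toy)

toy[keep, ]                 # rows retained for the analysis
table(toy$disease[keep])    # outcome frequencies within the subset
```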
The confidence level for the confidence intervals. The default is 0.95 for 95%.
The number of bootstrap samples to take in estimating the uncertainty of the estimator. If NULL it defaults to the number necessary to compute the standard error to accuracy 0.001.
outcome.variable. Otherwise it will compute the population frequencies of each value of the outcome.variable.
Use a fast bootstrap where the weights are reused from the estimator rather than being recomputed for each bootstrap sample.
Use a C-level implementation of Gile's bootstrap (rather than the R-level one). The two implementations should be computationally equivalent, differing only in speed.
Type of confidence interval to use, if possible. If "t", use lower and upper confidence interval values based on the standard deviation of the bootstrapped values and a t multiplier. If "pivotal", use lower and upper confidence interval values based on the basic bootstrap (also called the pivotal confidence interval). If "quantile", use lower and upper confidence interval values based on the quantiles of the bootstrap sample. If "proportion", use the "t" unless the estimated proportion is less than 0.15 or the bounds are outside [0,1]. In that case, try the "quantile" and constrain the bounds to be compatible with [0,1].
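The four ci.type options can be compared on a vector of bootstrap replicates. The code below is a minimal base R sketch, not the package's implementation. boot_ci is a hypothetical helper; the degrees of freedom for the t multiplier (length(boot) - 1) and the clipping used to keep the "proportion" bounds inside [0,1] are assumptions made for illustration.

```r
# Hypothetical helper (not part of the RDS package).
# estimate: point estimate; boot: vector of bootstrap replicates.
boot_ci <- function(estimate, boot, level = 0.95,
                    type = c("t", "pivotal", "quantile", "proportion"),
                    df = length(boot) - 1) {   # df choice is an assumption
  type <- match.arg(type)
  alpha <- 1 - level
  # Lower and upper quantiles of the bootstrap distribution.
  q <- unname(quantile(boot, c(alpha / 2, 1 - alpha / 2)))
  # Estimate plus/minus a t multiplier times the bootstrap standard deviation.
  t_ci <- estimate + c(-1, 1) * qt(1 - alpha / 2, df) * sd(boot)
  switch(type,
    t = t_ci,
    # Basic (pivotal) bootstrap: reflect the quantiles around the estimate.
    pivotal = c(2 * estimate - q[2], 2 * estimate - q[1]),
    quantile = q,
    # "t" unless the estimate is small or the bounds leave [0, 1];
    # then fall back to quantiles, clipped to [0, 1] (assumed constraint).
    proportion = if (estimate < 0.15 || t_ci[1] < 0 || t_ci[2] > 1) {
      pmin(pmax(q, 0), 1)
    } else {
      t_ci
    }
  )
}

set.seed(42)
boot <- rnorm(500, mean = 0.30, sd = 0.04)   # simulated replicates
boot_ci(0.30, boot, type = "t")
boot_ci(0.30, boot, type = "pivotal")
boot_ci(0.30, boot, type = "quantile")
boot_ci(0.30, boot, type = "proportion")
```

When the bootstrap distribution is roughly symmetric around the estimate, the intervals are close to one another; they diverge when the distribution is skewed, which is the situation the "proportion" rule is meant to handle near 0 or 1.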
A list of control parameters for algorithm tuning. Constructed using control.rds.estimates.
Force the variable to be a factor.
For continuous variates, some bootstrap procedures require categorical data. In these cases, in order to construct each bootstrap replicate, the outcome variable is split into cont.breaks categories.
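As a rough picture of what splitting a continuous outcome into cont.breaks categories looks like, the base R sketch below bins a simulated variable with cut(). This is an assumption-laden illustration: the data are simulated, and the package may place its cut points differently (for example at quantiles rather than at equal widths).

```r
set.seed(1)
age <- runif(100, min = 18, max = 65)   # simulated continuous outcome

cont.breaks <- 3                        # the documented default

# cut() with a single number splits the observed range into that many
# equal-width intervals and returns a factor.
age_cat <- cut(age, breaks = cont.breaks)

nlevels(age_cat)   # number of categories
table(age_cat)     # counts per category
```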
Additional arguments for RDS.*.estimates.
Gile, Krista J., 2011. Improved Inference for Respondent-Driven Sampling Data with Application to HIV Prevalence Estimation, Journal of the American Statistical Association, 106, 135-146.
Gile, Krista J., Handcock, Mark S., 2010. Respondent-driven Sampling: An Assessment of Current Methodology, Sociological Methodology, 40, 285-327. <doi:10.1111/j.1467-9531.2010.01223.x>
Gile, Krista J., Beaudry, Isabelle S. and Handcock, Mark S., 2018. Methods for Inference from Respondent-Driven Sampling Data, Annual Review of Statistics and Its Application. <doi:10.1146/annurev-statistics-031017-100704>
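library(RDS)

# Wrapped in if (FALSE) so the examples are not run automatically.
if (FALSE) {
  # RDS-II estimate of "disease" with Salganik bootstrap intervals
  data(fauxmadrona)
  RDS.bootstrap.intervals(rds.data = fauxmadrona, weight.type = "RDS-II",
                          uncertainty = "Salganik",
                          outcome.variable = "disease", N = 1000,
                          number.of.bootstrap.samples = 50)

  # HCG estimate of "var1" with the HCG bootstrap
  data(fauxtime)
  RDS.bootstrap.intervals(rds.data = fauxtime, weight.type = "HCG",
                          uncertainty = "HCG",
                          outcome.variable = "var1", N = 1000,
                          number.of.bootstrap.samples = 10)
}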