impute.visibility: Estimates each person's personal visibility based on their self-reported degree and the number of their (direct) recruits. It uses the time the person was recruited as a factor in determining the number of recruits they produce.

Description

Estimates each person's personal visibility based on their self-reported degree and the number of their (direct) recruits. It uses the time the person was recruited as a factor in determining the number of recruits they produce.

Usage

impute.visibility(
  rds.data,
  max.coupons = NULL,
  type.impute = c("median", "distribution", "mode", "mean"),
  recruit.time = NULL,
  include.tree = FALSE,
  reflect.time = FALSE,
  parallel = 1,
  parallel.type = "PSOCK",
  interval = 10,
  burnin = 5000,
  mem.optimism.prior = NULL,
  df.mem.optimism.prior = 5,
  mem.scale.prior = 2,
  df.mem.scale.prior = 10,
  mem.overdispersion = 15,
  return.posterior.sample.visibilities = FALSE,
  verbose = FALSE
)

Arguments

rds.data: An rds.data.frame
max.coupons: The number of recruitment coupons distributed to each enrolled subject (i.e. the maximum number of recruitees for any subject). By default it is taken from the attribute of the data; otherwise the maximum recorded number of coupons is used.
type.impute: The type of imputation based on the conditional distribution. It can be one of median, distribution, mode, or mean. As listed under Usage, the first, median, is the default. The distribution type is a random draw from the conditional distribution.
recruit.time: vector; An optional value for the date/time that the person was interviewed. It needs to resolve to a numeric vector with as many elements as there are rows of the data with non-missing values of the network variable. If it is the character name of a variable in the data then that variable is used. If it is NULL then the sequence number of the recruit in the data is used. If it is NA then the recruitment time is not used in the model. Otherwise, the recruitment time is used in the model to better predict the visibility of the person.
include.tree: logical; If TRUE, augment the reported network size by the number of recruits and one for the recruiter (if any). This reflects a more accurate value for the visibility, but is not the self-reported degree. In particular, it typically produces a positive visibility (compared to a possibly zero self-reported degree).
reflect.time: logical; If FALSE then the recruit.time is the time before the end of the study (instead of the time since the survey started or chronological time).
parallel: count; the number of parallel processes to run for the Monte Carlo sample. This uses MPI or PSOCK. The default is 1, that is, no parallel processing.
parallel.type: The type of parallel processing to use. The options are "PSOCK" or "MPI". This requires the corresponding type to be installed. The default is "PSOCK".
interval: count; the number of proposals between sampled statistics.
burnin: count; the number of proposals before any MCMC sampling is done. It typically is set to a fairly large number.
mem.optimism.prior: scalar; A hyper parameter being the mean of the distribution of the optimism parameter.
df.mem.optimism.prior: scalar; A hyper parameter being the degrees-of-freedom of the prior for the optimism parameter. This gives the equivalent sample size that would contain the same amount of information inherent in the prior.
mem.scale.prior: scalar; A hyper parameter being the scale of the concentration of the baseline negative binomial measurement error model.
df.mem.scale.prior: scalar; A hyper parameter being the degrees-of-freedom of the prior for the standard deviation of the dispersion parameter in the visibility model. This gives the equivalent sample size that would contain the same amount of information inherent in the prior for the standard deviation.
mem.overdispersion: scalar; A parameter being the overdispersion of the negative binomial distribution that is the baseline for the measurement error model.
return.posterior.sample.visibilities: logical; If TRUE then return a matrix of dimension samplesize by n of posterior draws from the visibility distribution for those in the survey. The sample for the ith person is the ith column. The default is FALSE so that the vector of imputes defined by type.impute is returned.
verbose: logical; if this is TRUE, the program will print out additional information.
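
To make the relationship between type.impute and return.posterior.sample.visibilities concrete, the following base R sketch summarises a matrix of draws (one column per person) in the four ways named above. It is only an illustration under stated assumptions: the matrix is simulated, the helper names sample_mode and impute_from_draws are hypothetical, and none of this is the package's internal code.

```r
# Simulated stand-in for a posterior sample: 200 draws (rows) for 5 people (columns).
# These numbers are made up and do not come from impute.visibility().
set.seed(1)
draws <- matrix(rpois(200 * 5, lambda = rep(c(2, 4, 6, 8, 10), each = 200)),
                nrow = 200, ncol = 5)

# Hypothetical helper: the most frequent value among a vector of integer draws.
sample_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Hypothetical helper: one imputed value per column, chosen by summary type.
impute_from_draws <- function(draws,
                              type = c("median", "distribution", "mode", "mean")) {
  type <- match.arg(type)  # the first choice, "median", is used when type is missing
  switch(type,
    median = apply(draws, 2, median),
    mean = colMeans(draws),
    mode = apply(draws, 2, sample_mode),
    # "distribution": a single random draw per person
    distribution = apply(draws, 2, function(x) x[sample.int(length(x), 1)])
  )
}

imputed_median <- impute_from_draws(draws)
imputed_mean <- impute_from_draws(draws, "mean")
imputed_mode <- impute_from_draws(draws, "mode")
imputed_draw <- impute_from_draws(draws, "distribution")
```

The median, mode, and mean are deterministic summaries of each column, whereas the distribution type changes from call to call unless the random seed is fixed.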

References

McLaughlin, Katherine R.; Johnston, Lisa G.; Jakupi, Xhevat; Gexha-Bunjaku, Dafina; Deva, Edona and Handcock, Mark S. (2023) Modeling the Visibility Distribution for Respondent-Driven Sampling with Application to Population Size Estimation, Annals of Applied Statistics, doi:10.1093/jrsssa/qnad031

Examples

if (FALSE) {
data(fauxmadrona)
# The next line fits the model for the self-reported personal
# network sizes and imputes the personal visibilities.
# It may take up to 60 seconds.
visibility <- impute.visibility(fauxmadrona)
# frequency of estimated personal visibility
table(visibility)
}
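
The recruit.time argument needs to resolve to a numeric vector, so interview dates have to be converted before they are passed in. The base R sketch below shows one way to do that and computes both time scales mentioned under reflect.time. The dates and variable names are hypothetical and are not taken from fauxmadrona.

```r
# Hypothetical interview dates, one per respondent (not from fauxmadrona).
interview_date <- as.Date(c("2023-03-01", "2023-03-04", "2023-03-04", "2023-03-10"))

# Days since the first interview (time since the survey started).
days_since_start <- as.numeric(interview_date - min(interview_date))

# Days remaining until the last interview (time before the end of the study).
days_before_end <- as.numeric(max(interview_date) - interview_date)

days_since_start  # 0 3 3 9
days_before_end   # 9 6 6 0
```

Either vector could then be supplied as recruit.time, provided it has one element for each row of the data with a non-missing value of the network variable.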