newsvyquantile: Quantiles under complex sampling.

Description

Estimates quantiles and confidence intervals for them. This function was completely re-written for version 4.1 of the survey package, and has a wider range of ways to define the quantile. See the vignette for a list of them.

Usage

svyquantile(x, design, quantiles, ...)
# S3 method for survey.design
svyquantile(x, design, quantiles, alpha = 0.05,
                      interval.type = c("mean", "beta","xlogit", "asin","score"),
                      na.rm = FALSE,  ci=TRUE, se = ci,
                      qrule=c("math","school","shahvaish","hf1","hf2","hf3",
                              "hf4","hf5","hf6","hf7","hf8","hf9"),
                      df = NULL, ...)
# S3 method for svyrep.design
svyquantile(x, design, quantiles, alpha = 0.05,
                      interval.type = c("mean", "beta","xlogit", "asin","quantile"),
                      na.rm = FALSE, ci = TRUE, se=ci,
                      qrule=c("math","school","shahvaish","hf1","hf2","hf3",
                              "hf4","hf5","hf6","hf7","hf8","hf9"),
                      df = NULL, return.replicates=FALSE,...)

Value

An object of class "newsvyquantile", except that a replicate-weights design with interval.type="quantile" and return.replicates=TRUE gives an object of class "svrepstat".

Arguments

x: A one-sided formula describing variables to be used
design: Design object
quantiles: Numeric vector specifying which quantiles are requested
alpha: Specifies the confidence interval coverage: intervals have nominal coverage 1-alpha (95% for the default alpha=0.05)
interval.type: See Details below
na.rm: Remove missing values?
ci,se: Return an estimated confidence interval and standard error?
qrule: Rule for defining the quantiles: either a character string specifying one of the built-in rules, or a function
df: Degrees of freedom for confidence interval estimation: NULL specifies degf(design)
return.replicates: Return replicate estimates of the quantile (only for interval.type="quantile")
...: For future expansion

Details

The pth quantile is defined as the value where the estimated cumulative distribution function is equal to p. As with quantiles in unweighted data, this definition only pins down the quantile to an interval between two observations, and a rule is needed to interpolate. The default is the mathematical definition, the lower end of the quantile interval; qrule="school" uses the midpoint of the quantile interval; "hf1" to "hf9" are weighted analogues of type=1 to 9 in quantile. See the vignette "Quantile rules" for details and for how to write your own.
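
The difference between the lower-end and midpoint rules can be illustrated in base R. The helper below, weighted_quantile_sketch, is a hypothetical name introduced only for this illustration; it is a minimal sketch of the idea, not the survey package's implementation, and it assumes distinct data values and strictly positive weights.

```r
# Hypothetical helper for illustration only (not part of the survey package).
# Finds where the weighted empirical CDF first reaches p, then applies either
# the lower-end rule or the midpoint rule when the CDF equals p exactly.
weighted_quantile_sketch <- function(x, w, p, rule = c("lower", "midpoint")) {
  rule <- match.arg(rule)
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cdf <- cumsum(w) / sum(w)            # estimated CDF at each sorted value
  tol <- 1e-12                         # tolerance for floating-point ties
  i <- which(cdf >= p - tol)[1]        # first value whose CDF reaches p
  exact <- abs(cdf[i] - p) < tol && i < length(x)
  if (rule == "midpoint" && exact) {
    (x[i] + x[i + 1]) / 2              # midpoint of the quantile interval
  } else {
    x[i]                               # lower end of the quantile interval
  }
}

x <- c(10, 20, 30, 40)
w <- c(1, 1, 2, 4)                     # weighted CDF: 0.125, 0.25, 0.5, 1
weighted_quantile_sketch(x, w, 0.5, "lower")     # 30
weighted_quantile_sketch(x, w, 0.5, "midpoint")  # 35
weighted_quantile_sketch(x, w, 0.6, "midpoint")  # 40
```

The rules differ only when the estimated CDF equals p exactly at an observation; otherwise both return the first value at which the CDF exceeds p.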

By default, confidence intervals are estimated using Woodruff's (1952) method, which involves computing the quantile, estimating a confidence interval for the proportion of observations below the quantile, and then transforming that interval using the estimated CDF. In that context, the interval.type argument specifies how the confidence interval for the proportion is computed, matching svyciprop. In contrast to oldsvyquantile, NaN is returned if a confidence interval endpoint on the probability scale falls outside [0,1].
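
The three steps of Woodruff's method can be sketched in base R for the simplest possible case. The function woodruff_sketch is a hypothetical name used only here, and it makes strong simplifying assumptions that svyquantile does not: an unweighted simple random sample, a Wald interval for the proportion, and a Normal reference distribution. Under a real complex design, the standard error of the proportion would come from the design object.

```r
# Hypothetical illustration only: Woodruff-style interval assuming an
# unweighted simple random sample and a Normal (Wald) interval for the
# proportion. Not the survey package's implementation.
woodruff_sketch <- function(x, p, alpha = 0.05) {
  n <- length(x)
  xs <- sort(x)
  # Inverse of the empirical CDF; NaN if the probability is outside [0, 1]
  inv_cdf <- function(prob) {
    if (prob < 0 || prob > 1) return(NaN)
    xs[max(1, ceiling(prob * n))]
  }
  se_p <- sqrt(p * (1 - p) / n)        # SE of the proportion below the quantile
  z <- qnorm(1 - alpha / 2)
  p_lo <- p - z * se_p                 # interval on the probability scale
  p_hi <- p + z * se_p
  # Transform the probability-scale interval back through the estimated CDF
  c(quantile = inv_cdf(p), lower = inv_cdf(p_lo), upper = inv_cdf(p_hi))
}

woodruff_sketch(1:100, 0.5)    # quantile 50, lower 41, upper 60
woodruff_sketch(1:10, 0.99)    # upper endpoint is NaN
```

In the second call the upper endpoint on the probability scale exceeds 1, so the sketch returns NaN for it, mirroring the behaviour described above.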

There are two exceptions. For svydesign objects, interval.type="score" asks for the Francisco & Fuller confidence interval based on inverting a score test. According to Dorfman & Valliant, this interval has inferior performance to the "beta" and "xlogit" intervals; it is provided for compatibility.

For replicate-weight designs, interval.type="quantile" asks for an interval based directly on the replicates of the quantile. This interval is not valid for jackknife-type replicates, though it should perform well for bootstrap-type replicates, BRR, and SDR.

The df argument specifies degrees of freedom for a t-distribution approximation to distributions of means. The default is the design degrees of freedom. Specify df=Inf to use a Normal distribution (e.g., for compatibility).

When the standard error is requested, it is estimated by dividing the confidence interval length by the number of standard errors in a t confidence interval with the specified alpha. For example, with alpha=0.05 and df=Inf the standard error is estimated as the confidence interval length divided by 2*1.96.
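
The arithmetic can be written out directly in base R. The helper se_from_ci is a hypothetical name for this illustration and is not a function exported by the survey package; it simply restates the rule in the paragraph above.

```r
# Hypothetical helper: standard error implied by a confidence interval,
# following the rule described above (interval length divided by twice
# the t critical value for the given alpha and df).
se_from_ci <- function(lower, upper, alpha = 0.05, df = Inf) {
  (upper - lower) / (2 * qt(1 - alpha / 2, df = df))
}

se_from_ci(620, 660)            # Normal reference: length / (2 * 1.96)
se_from_ci(620, 660, df = 14)   # t reference with 14 df gives a smaller SE
```

For a fixed interval, a smaller df gives a larger critical value and therefore a smaller implied standard error.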

References

Dorfman A, Valliant R (1993) Quantile variance estimators in complex surveys. Proceedings of the ASA Survey Research Methods Section. 1993: 866-871

Francisco CA, Fuller WA (1986) Estimation of the distribution function with a complex survey. Technical Report, Iowa State University.

Hyndman RJ, Fan Y (1996) Sample quantiles in statistical packages. The American Statistician 50: 361-365.

Shah BV, Vaish AK (2006) Confidence Intervals for Quantile Estimation from Complex Survey Data. Proceedings of the Section on Survey Research Methods.

Woodruff RS (1952) Confidence intervals for medians and other position measures. JASA 47, 635-646.

Examples

library(survey)
data(api)
## population
quantile(apipop$api00,c(.25,.5,.75))

## one-stage cluster sample
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
rclus1<-as.svrepdesign(dclus1)
bclus1<-as.svrepdesign(dclus1,type="boot")


svyquantile(~api00, dclus1, c(.25,.5,.75))
svyquantile(~api00, dclus1, c(.25,.5,.75),interval.type="beta")

svyquantile(~api00, rclus1, c(.25,.5,.75))
svyquantile(~api00, rclus1, c(.25,.5,.75),interval.type="quantile")
svyquantile(~api00, bclus1, c(.25,.5,.75),interval.type="quantile")

svyquantile(~api00+ell, dclus1, c(.25,.5,.75), qrule="math")
svyquantile(~api00+ell, dclus1, c(.25,.5,.75), qrule="school")
svyquantile(~api00+ell, dclus1, c(.25,.5,.75), qrule="hf8")