percentile: EdSurvey Percentiles

Description

Calculates the percentiles of a numeric variable in an edsurvey.data.frame, a light.edsurvey.data.frame, or an edsurvey.data.frame.list.

Usage

percentile(variable, percentiles, data, weightVar = NULL, jrrIMax = 1,
  varMethod = c("jackknife", "Taylor"), alpha = 0.05,
  omittedLevels = TRUE, defaultConditions = TRUE, recode = NULL,
  returnVarEstInputs = FALSE, returnNumberOfPSU = FALSE)

Arguments

variable

the character name of the variable for which percentiles are computed, typically a subject scale or subscale

percentiles

a numeric vector of percentiles in the range 0 to 100 (inclusive)
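Because percentiles are given on a 0 to 100 scale, they correspond to probabilities of percentiles / 100 in base R's stats::quantile. The sketch below is an unweighted, single-variable analogue for intuition only. The toy scores are hypothetical, and the sketch does not reproduce the weighted, plausible-value method that percentile uses. type = 7 is simply quantile's default, one of the nine sample-quantile definitions catalogued by Hyndman and Fan (1996).

```r
# Hypothetical toy scores standing in for one plausible value of a scale
scores <- c(212, 245, 251, 260, 268, 274, 280, 291, 305, 330)
percentiles <- c(0, 25, 50, 75, 100)

# percentile() takes values in [0, 100]; quantile() takes probabilities in [0, 1]
stopifnot(all(percentiles >= 0 & percentiles <= 100))
est <- quantile(scores, probs = percentiles / 100, type = 7, names = FALSE)
est
# values: 212, 253.25, 271, 288.25, 330
```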

data

an edsurvey.data.frame, a light.edsurvey.data.frame, or an edsurvey.data.frame.list

weightVar

a character indicating the weight variable to use. (See Details.)
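To illustrate what a weight does to a percentile, here is a minimal sketch of a weighted percentile taken from the inverse of the weighted empirical distribution function, without interpolation. The helper name weightedPercentile is hypothetical, and this simple estimator is a stand-in chosen for clarity; it is not the method described in the vignette that percentile follows. Like the nUsed attribute, it keeps only rows with valid data and a positive weight.

```r
# Hypothetical helper: weighted percentile via the inverse of the weighted
# empirical CDF (no interpolation). Not EdSurvey's implementation.
weightedPercentile <- function(x, w, p) {
  stopifnot(length(x) == length(w), length(p) == 1, p >= 0, p <= 100)
  # keep rows with valid data and a positive weight
  keep <- !is.na(x) & !is.na(w) & w > 0
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cumShare <- cumsum(w) / sum(w)
  # smallest value whose cumulative weight share reaches p / 100;
  # the small tolerance guards against floating-point rounding in cumsum()
  x[which(cumShare >= p / 100 - 1e-12)[1]]
}

x <- c(10, 20, 30, 40)
weightedPercentile(x, w = c(1, 1, 1, 1), p = 50)  # 20
weightedPercentile(x, w = c(1, 1, 1, 5), p = 50)  # 40: the heavily weighted case pulls the median up
```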

jrrIMax

a numeric value; when using the jackknife variance estimation method, the \(V_{jrr}\) term can be estimated with any positive number of plausible values and is estimated using the smaller of the number of available plausible values and jrrIMax. When jrrIMax is set to Inf, all plausible values are used. Higher values of jrrIMax lead to longer computing times and more accurate variance estimates.
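The effect of jrrIMax can be sketched as follows. This is a hypothetical helper, not EdSurvey code, and it assumes that \(V_{jrr}\) is the mean, over the plausible values used, of the scaled sum of squared deviations of the jackknife replicate estimates from the full-sample estimate. The names vjrrSketch and jkSumMultiplier are illustrative assumptions.

```r
# Hypothetical sketch of how jrrIMax limits the plausible values (PVs) used for V_jrr.
# fullEst: full-sample estimate for each PV (length = number of PVs)
# jkEst:   replicate estimates, one row per jackknife replicate, one column per PV
# ASSUMPTION: V_jrr averages, over the PVs used, the scaled sum of squared
# deviations of replicate estimates from the full-sample estimate.
vjrrSketch <- function(fullEst, jkEst, jrrIMax = 1, jkSumMultiplier = 1) {
  stopifnot(ncol(jkEst) == length(fullEst), jrrIMax >= 1)
  m <- min(length(fullEst), jrrIMax)  # min(5, Inf) is 5, so Inf uses every PV
  perPV <- vapply(seq_len(m), function(i) {
    jkSumMultiplier * sum((jkEst[, i] - fullEst[i])^2)
  }, numeric(1))
  mean(perPV)
}

fullEst <- c(10, 12)
jkEst <- matrix(c(11, 9, 13, 12), nrow = 2)  # columns: PV1 = (11, 9), PV2 = (13, 12)
vjrrSketch(fullEst, jkEst, jrrIMax = 1)    # 2, from PV1 only
vjrrSketch(fullEst, jkEst, jrrIMax = Inf)  # 1.5, the mean of 2 and 1
```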

varMethod

a character set to jackknife or Taylor that indicates the variance estimation method used when constructing the confidence intervals. The jackknife variance estimation method is always used to calculate the standard error.

alpha

a numeric value between 0 and 1 indicating the significance level of the confidence interval; the confidence level is 1 - alpha. The default alpha of 0.05 gives a 95 percent confidence interval.
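alpha and the confidence level are complements, as this small hypothetical helper (not part of EdSurvey) shows. It only converts the value; how the interval bounds themselves are computed for percentiles is described in the vignette cited under Details.

```r
# Hypothetical helper: the confidence level, in percent, implied by alpha
confidenceLevelPercent <- function(alpha) {
  stopifnot(is.numeric(alpha), length(alpha) == 1, alpha > 0, alpha < 1)
  100 * (1 - alpha)
}

confidenceLevelPercent(0.05)  # 95, the default
confidenceLevelPercent(0.10)  # 90
```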

omittedLevels

a logical value. When set to the default value of TRUE, drops those levels of all factor variables that are specified in an edsurvey.data.frame. Use print on an edsurvey.data.frame to see the omitted levels.

defaultConditions

a logical value. When set to the default value of TRUE, uses the default conditions stored in an edsurvey.data.frame to subset the data. Use print on an edsurvey.data.frame to see the default conditions.

recode

a list of lists to recode variables. Defaults to NULL. Can be set as recode=list(var1= list(from= c("a", "b", "c"), to= "d")).
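The structure of the recode argument can be illustrated with a plain data.frame. The helper applyRecode below is hypothetical and is not how EdSurvey applies recodes internally; it only shows the shape of the list: each element is named after a variable and holds from (the values to replace) and to (the single replacement value). Note that this sketch converts the recoded column to character.

```r
# Hypothetical helper illustrating the shape of the recode argument
applyRecode <- function(df, recode) {
  for (var in names(recode)) {
    spec <- recode[[var]]
    stopifnot(var %in% names(df), length(spec$to) == 1)
    vals <- as.character(df[[var]])  # factors become character here
    vals[vals %in% spec$from] <- spec$to
    df[[var]] <- vals
  }
  df
}

df <- data.frame(var1 = c("a", "b", "c", "e"), stringsAsFactors = FALSE)
out <- applyRecode(df, list(var1 = list(from = c("a", "b", "c"), to = "d")))
out$var1  # "d" "d" "d" "e"
```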

returnVarEstInputs

a logical value set to TRUE to return the inputs to the jackknife and imputation variance estimates. This is intended to allow for the computation of covariances between estimates.
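The imputation part of the variance can be sketched for a generic estimate under an assumption: the usual multiple-imputation combination, in which \(V_{imp}\) is (1 + 1/M) times the sample variance of the M plausible-value estimates and the standard error is the square root of \(V_{jrr} + V_{imp}\). The helper names are hypothetical, and the Details section notes that percentile standard errors follow the formulas in the percentile vignette, which may differ.

```r
# Hypothetical sketch; ASSUMPTION: the usual multiple-imputation combination
#   V_imp = (1 + 1/M) * sample variance of the M plausible-value estimates
#   se    = sqrt(V_jrr + V_imp)
impVarSketch <- function(pvEst) {
  M <- length(pvEst)
  stopifnot(M >= 2)
  (1 + 1 / M) * var(pvEst)  # var() divides by M - 1
}

totalSeSketch <- function(vjrr, pvEst) sqrt(vjrr + impVarSketch(pvEst))

pvEst <- c(10, 12, 14)
impVarSketch(pvEst)            # (4/3) * 4 = 16/3
totalSeSketch(vjrr = 2, pvEst) # sqrt(2 + 16/3)
```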

returnNumberOfPSU

a logical value set to TRUE to return the number of primary sampling units (PSU)
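As a rough illustration of what a PSU count involves, the hypothetical helper below counts distinct (stratum, PSU) pairs among rows with a positive weight. It assumes that PSU codes are unique only within a stratum; it is not EdSurvey's implementation.

```r
# Hypothetical helper: count distinct (stratum, PSU) pairs among rows with a
# positive weight. ASSUMPTION: PSU codes are only unique within a stratum.
countPSU <- function(stratum, psu, w) {
  keep <- !is.na(w) & w > 0
  nrow(unique(data.frame(stratum = stratum[keep], psu = psu[keep])))
}

countPSU(stratum = c(1, 1, 2, 2, 2), psu = c(1, 1, 1, 2, 2), w = c(1, 1, 1, 1, 1))  # 3
countPSU(stratum = c(1, 1, 2, 2, 2), psu = c(1, 1, 1, 2, 2), w = c(0, 0, 1, 1, 1))  # 2
```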

Value

The return type depends on whether the class of the data argument is an edsurvey.data.frame or an edsurvey.data.frame.list.

the data argument is an edsurvey.data.frame

When the data argument is an edsurvey.data.frame, percentile returns an S3 object of class percentile. This is a data.frame with typical attributes (names, row.names, and class) and additional attributes as follows:

n0: number of rows in the edsurvey.data.frame before any conditions were applied
nUsed: number of observations with valid data and weights larger than zero
nPSU: number of PSUs used in calculation
call: the call used to generate these results

The columns of the data.frame are as follows:

percentile: the percentile of this row
estimate: the estimated value of the percentile
se: the jackknife standard error of the estimated percentile
df: degrees of freedom
confInt.ci_lower: the lower bound of the confidence interval
confInt.ci_upper: the upper bound of the confidence interval
nsmall: number of units with more extreme results, averaged across plausible values

the data argument is an edsurvey.data.frame.list

When the data argument is an edsurvey.data.frame.list, percentile returns an S3 object of class percentileList. This is a data.frame with a call attribute. The columns in the data.frame are identical to those in the previous section, but there are also columns from the edsurvey.data.frame.list.

covs: A column for each column in the covs value of the edsurvey.data.frame.list. See Examples.

When returnVarEstInputs is TRUE, an attribute varEstInputs also is returned that includes the variance estimate inputs used for calculating covariances with varEstToCov.
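The reason replicate-level inputs allow covariances can be sketched for the single-plausible-value, jackknife-only case: two estimates computed on the same replicate weights have a covariance built from the products of their replicate deviations. The helper jkCovSketch and its jkSumMultiplier argument are hypothetical; this is not the implementation of varEstToCov, which also has to handle the imputation component when plausible values are involved.

```r
# Hypothetical sketch (one plausible value, jackknife component only):
# covariance of two estimates computed on the same replicate weights
jkCovSketch <- function(estA, repA, estB, repB, jkSumMultiplier = 1) {
  stopifnot(length(repA) == length(repB))
  jkSumMultiplier * sum((repA - estA) * (repB - estB))
}

jkCovSketch(10, c(11, 9), 20, c(22, 19))  # (1)(2) + (-1)(-1) = 3
jkCovSketch(10, c(11, 9), 10, c(11, 9))   # covariance with itself is the variance, 2
```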

Details

Percentiles, their standard errors, and confidence intervals are calculated according to the vignette titled Methods Used for Estimating Percentiles. Note that the standard errors and confidence intervals are based on separate formulas and assumptions.

The Taylor series variance estimation procedure is not relevant to percentiles because percentiles are not continuously differentiable.

References

Hyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. The American Statistician, 50, 361-365.

Examples


# load EdSurvey and read in the example data (generated, not real student data)
library(EdSurvey)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package="NAEPprimer"))

# get the median of the composite
percentile("composite", 50, sdf)

# get several percentiles
percentile("composite", c(0,1,25,50,75,99,100), sdf)

# build an edsurvey.data.frame.list
sdfA <- subset(sdf, scrpsu %in% c(5,45,56))
sdfB <- subset(sdf, scrpsu %in% c(75,76,78))
sdfC <- subset(sdf, scrpsu %in% 100:200)
sdfD <- subset(sdf, scrpsu %in% 201:300)

sdfl <- edsurvey.data.frame.list(list(sdfA, sdfB, sdfC, sdfD),
                                 labels=c("A locations",
                                           "B locations",
                                           "C locations",
                                           "D locations"))
# this shows how these datasets will be described
sdfl$covs

percentile("composite", 50, sdfl)
percentile("composite", c(25, 50, 75), sdfl)
