lm.sdf: EdSurvey Linear Models

Description

Fits a linear model that uses weights and variance estimates appropriate for the data.

Usage

lm.sdf(formula, data, weightVar = NULL, relevels = list(),
              varMethod = c("jackknife", "Taylor"), jrrIMax = 1,
              omittedLevels = TRUE, defaultConditions = TRUE, recode = NULL,
              returnVarEstInputs = FALSE, returnNumberOfPSU = FALSE,
              standardizeWithSamplingVar = FALSE)

Arguments

formula

a formula for the linear model. See lm. If y is left blank, the default subject scale or subscale variable will be used. (You can find the default using showPlausibleValues.) If y is a variable for a subject scale or subscale (one of the names shown by showPlausibleValues), then that subject scale or subscale is used.

data

an edsurvey.data.frame, a light.edsurvey.data.frame, or an edsurvey.data.frame.list

weightVar

a character indicating the weight variable to use (see Details). The weightVar must be one of the weights for the edsurvey.data.frame. If NULL, it uses the default for the edsurvey.data.frame.

relevels

a list; used when the user wants to change the contrasts from the default treatment contrasts to the treatment contrasts with a chosen omitted group. The name of each element should be the variable name, and the value should be the group to be omitted.

varMethod

a character set to “jackknife” or “Taylor” that indicates the variance estimation method to be used. See Details.

jrrIMax

when using the jackknife variance estimation method, the $V_{jrr}$ term (see Details) can be estimated with any positive number of plausible values and is estimated on the lower of the number of available plausible values and jrrIMax. When jrrIMax is set to Inf, all plausible values will be used. Higher values of jrrIMax lead to longer computing times and more accurate variance estimates.

omittedLevels

a logical value. When set to the default value of TRUE, drops those levels of all factor variables that are specified in an edsurvey.data.frame. Use print on an edsurvey.data.frame to see the omitted levels.

defaultConditions

a logical value. When set to the default value of TRUE, uses the default conditions stored in an edsurvey.data.frame to subset the data. Use print on an edsurvey.data.frame to see the default conditions.

recode

a list of lists to recode variables. Defaults to NULL. Can be set as recode=list(var1 = list(from= c("a", "b", "c"), to= "d")). See Examples.

returnVarEstInputs

a logical value set to TRUE to return the inputs to the jackknife and imputation variance estimates. This is intended to allow for the computation of covariances between estimates.

returnNumberOfPSU

a logical value set to TRUE to return the number of primary sampling units (PSU).

standardizeWithSamplingVar

a logical value indicating if the standardized coefficients should have the variance of the regressors and outcome measured with sampling variance. Defaults to FALSE.

Value

An edsurvey.lm with the following elements:

call

the function call

formula

the formula used to fit the model

coef

the estimates of the coefficients

se

the standard error estimates of the coefficients

Vimp

the estimated variance from uncertainty in the scores (plausible value variables)

Vjrr

the estimated variance from sampling

M

the number of plausible values

varm

the variance estimates under the various plausible values

coefm

the values of the coefficients under the various plausible values

coefmat

the coefficient matrix (typically produced by the summary of a model)

r.squared

the coefficient of determination

weight

the name of the weight variable

npv

the number of plausible values

jrrIMax

the jrrIMax value used in computation

njk

the number of jackknife replicates used; set to NA when Taylor series variance estimates are used

varMethod

one of Taylor series or jackknife

residuals

residuals from the average regression coefficients

PV.residuals

residuals from the by plausible value coefficients

PV.fitted.values

fitted values from the by plausible value coefficients

B

imputation variance-covariance matrix, before multiplication by (M+1)/M

U

sampling variance-covariance matrix

rbar

average relative increase in variance; see van Buuren (2012, eq. 2.29)

nPSU

number of PSUs used in calculation

n0

number of rows on edsurvey.data.frame before any conditions were applied

nUsed

number of observations with valid data and weights larger than zero

data

data used for the computation

Xstdev

standard deviations of regressors, used for computing standardized regression coefficients when standardizeWithSamplingVar is set to FALSE (the default)

varSummary

the result of running summary2 (unweighted) on each variable in the regression

varEstInputs

when returnVarEstInputs is TRUE, this element is returned. These are used for calculating covariances with varEstToCov.

standardizeWithSamplingVar

when standardizeWithSamplingVar is set to TRUE this element is returned. Calculates the standard deviation of the standardized regression coefficients like any other variable.

Details

This function implements an estimator that correctly handles left-hand-side variables that are either numeric or plausible values, allows for survey sampling weights, and estimates variances using either the jackknife replication method or the Taylor series method (see varMethod). The vignette titled Statistics describes estimation of the reported statistics.

Regardless of the variance estimation method, the coefficients are estimated using the sample weights according to the section “Estimation of Weighted Means When Plausible Values Are Not Present” or “Estimation of Weighted Means When Plausible Values Are Present,” depending on whether the model includes assessment variables (variables with plausible values).

How the standard errors of the coefficients are estimated depends on the value of varMethod and the presence of plausible values (assessment variables). Once the variance is obtained, the t statistic is given by $$t=\frac{\hat{\beta}}{\sqrt{\mathrm{var}(\hat{\beta})}}$$ where $\hat{\beta}$ is the estimated coefficient and $\mathrm{var}(\hat{\beta})$ is the variance of that estimate.
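
As a minimal sketch of the t statistic formula above, the following base R code divides each coefficient by the square root of its variance. The coefficient names and numbers are hypothetical values chosen for illustration; they are not output from lm.sdf.

```r
# Hypothetical coefficient estimates and their variances (illustration only;
# these numbers do not come from an EdSurvey fit).
coef_hat <- c("(Intercept)" = 270.4, dsexFemale = -2.9)
var_hat  <- c("(Intercept)" = 1.44,  dsexFemale = 0.36)

# t = beta_hat / sqrt(var(beta_hat)), computed elementwise
t_stat <- coef_hat / sqrt(var_hat)
print(t_stat)
```

With a fitted model, the same ratio would presumably be formed from the coef and se elements listed under Value, and summary reports it directly.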

The coefficient of determination (R-squared value) is similarly estimated by averaging the R-squared values across the plausible values.
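
One way to write this averaging, assuming $M$ plausible values and writing $R^2_m$ for the R-squared value from the regression on the $m$th plausible value (this notation is introduced here for illustration and is not taken from the package documentation), is $$R^2=\frac{1}{M}\sum_{m=1}^{M}R^2_m$$ so that, for example, per-plausible-value R-squared values of 0.10, 0.12, and 0.11 would give a reported value of 0.11.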

Variance estimation of coefficients

All variance estimation methods are shown in the vignette titled Statistics. When varMethod is set to jackknife and the predicted value does not have plausible values, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Not Present, Using the Jackknife Method.”

When plausible values are present and varMethod is jackknife, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Present, Using the Jackknife Method.”

When plausible values are not present and varMethod is Taylor, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Not Present, Using the Taylor Series Method.”

When plausible values are present and varMethod is Taylor, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Present, Using the Taylor Series Method.”
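
When plausible values are present, the sampling variance (Vjrr) and the imputation variance (Vimp) are combined following Rubin (1987). The base R code below is a rough sketch of that combination for a single coefficient, not the package's implementation. All inputs are hypothetical numbers, and the variable names (coef_by_pv, samp_var_by_pv) are made up for illustration. It assumes the sampling variance is averaged over the first min(M, jrrIMax) plausible values and that the between-plausible-value variance is scaled by (M+1)/M, as the descriptions of jrrIMax and the imputation variance-covariance matrix suggest.

```r
# Hypothetical inputs (illustration only; not EdSurvey output):
# one coefficient estimated under M = 5 plausible values.
coef_by_pv     <- c(2.10, 2.25, 1.95, 2.05, 2.15)      # per-PV coefficient estimates
samp_var_by_pv <- c(0.040, 0.042, 0.038, 0.041, 0.039) # per-PV sampling variances
jrrIMax <- 1                                           # same default as lm.sdf

M <- length(coef_by_pv)

# Point estimate: average of the per-PV coefficients
beta_hat <- mean(coef_by_pv)

# Sampling variance: average over the first min(M, jrrIMax) plausible values
n_jrr <- min(M, jrrIMax)
V_jrr <- mean(samp_var_by_pv[seq_len(n_jrr)])

# Imputation variance: between-PV variance, multiplied by (M + 1) / M
B_pv  <- var(coef_by_pv)
V_imp <- (M + 1) / M * B_pv

# Total variance and standard error
V_total <- V_jrr + V_imp
se_beta <- sqrt(V_total)
print(c(beta_hat = beta_hat, V_jrr = V_jrr, V_imp = V_imp, se = se_beta))
```

Setting jrrIMax to Inf in this sketch averages the sampling variance over all five plausible values, which mirrors the trade-off described for the jrrIMax argument: more computation in exchange for a more stable variance estimate.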

References

Binder, D. A. (1983). On the variances of asymptotically normal estimators from complex surveys. International Statistical Review, 51(3), 279-292.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.

van Buuren, S. (2012). Flexible imputation of missing data. New York, NY: CRC Press.

Weisberg, S. (1985). Applied linear regression (2nd ed.). New York, NY: Wiley.

Examples

# NOT RUN {
# read in the example data (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# By default, uses the jackknife variance method with replicate weights
lm1 <- lm.sdf(composite ~ dsex + b017451, data=sdf)
lm1

# for more detailed results use summary
summary(lm1)

# to specify a variance method, use varMethod
lm2 <- lm.sdf(composite ~ dsex + b017451, data=sdf, varMethod="Taylor")
lm2
summary(lm2)

# Use relevels to set a new omitted category
lm3 <- lm.sdf(composite ~ dsex + b017451, data=sdf, relevels=list(dsex="Female"))
summary(lm3)

# Use recode to change values for specified variables
lm4 <- lm.sdf(composite ~ dsex + b017451, data=sdf,
              recode=list(b017451=list(from=c("Never or hardly ever",
                                              "Once every few weeks",
                                              "About once a week"),
                                       to=c("Infrequently")),
                          b017451=list(from=c("2 or 3 times a week","Every day"),
                                       to=c("Frequently"))))
# Note: "Infrequently" is the dropped level for the recoded b017451
summary(lm4)
# }