mvrlm.sdf: Multivariate Regression

Description

Fits a multivariate linear model that uses weights and variance estimates appropriate for the edsurvey.data.frame.

Usage

mvrlm.sdf(formula, data, weightVar = NULL, relevels = list(),
  jrrIMax = 1, omittedLevels = TRUE, defaultConditions = TRUE,
  recode = NULL, returnVarEstInputs = FALSE, estMethod = "OLS")

Arguments

formula

a Formula for the linear model (see Formula). Left-hand side (LHS) variables are separated with vertical pipes (|). See Examples.

data

an edsurvey.data.frame or edsurvey.data.frame.list

weightVar

character indicating the weight variable to use (see Details). The weightVar must be one of the weights for the edsurvey.data.frame. If NULL, uses the default for the edsurvey.data.frame.

relevels

a list. Used to change the contrasts from the default treatment contrasts to treatment contrasts with a chosen omitted group. To do this, add an element to the list with the same name as the variable whose contrasts should change, and set its value to the level that should be the omitted group. For example, relevels = list(dsex = "Female") makes Female the omitted group for dsex. (See Examples.)

jrrIMax

when using the jackknife variance estimation method, the \(V_{jrr}\) term (see Details) can be estimated with any positive number of plausible values. It is estimated on the first jrrIMax plausible values, or on all available plausible values if there are fewer than jrrIMax. When jrrIMax is set to Inf, all of the plausible values are used. Higher values of jrrIMax lead to longer computing times and more accurate variance estimates.
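The rule for how many plausible values enter the \(V_{jrr}\) term can be sketched in base R. This is only an illustration of the description above; pvUsedForJrr and nPV are hypothetical names and are not part of EdSurvey.

```r
# Hypothetical helper (not an EdSurvey function): indices of the plausible
# values that would contribute to the jackknife term for a given jrrIMax.
pvUsedForJrr <- function(nPV, jrrIMax = 1) {
  stopifnot(nPV >= 1, jrrIMax >= 1)
  # use the first jrrIMax plausible values, capped at the number available
  seq_len(min(nPV, jrrIMax))
}

pvUsedForJrr(nPV = 5, jrrIMax = 1)    # first plausible value only
pvUsedForJrr(nPV = 5, jrrIMax = Inf)  # all five plausible values
```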

omittedLevels

a logical value. When set to the default value of TRUE, drops those levels of all factor variables that are specified in edsurvey.data.frame. Use print on an edsurvey.data.frame to see the omitted levels.

defaultConditions

a logical value. When set to the default value of TRUE, uses the default conditions stored in edsurvey.data.frame to subset the data. Use print on an edsurvey.data.frame to see the default conditions.

recode

a list of lists to recode variables. Defaults to NULL. Can be set as recode = list(var1 = list(from = c("a", "b", "c"), to = "d")). See Examples.

returnVarEstInputs

a logical value. Set to TRUE to return the inputs to the jackknife and imputation variance estimates. This is intended to allow for computation of covariances between estimates.

estMethod

a character value indicating which estimation method to use. The default is "OLS"; the other option is "GLS".

Value

An edsurvey.mvrlm with elements:

call

the function call

formula

the formula used to fit the model

coef

the estimates of the coefficients

se

the standard error estimates of the coefficients

Vimp

the estimated variance due to uncertainty in the scores (plausible values variables)

Vjrr

the estimated variance due to sampling

M

the number of plausible values

varm

the variance estimates under the various plausible values

coefm

the values of the coefficients under the various plausible values

coefmat

the coefficient matrix (typically produced by the summary of a model)

r.squared

the coefficient of determination

weight

the name of the weight variable

npv

number of plausible values

njk

the number of jackknife replicates used

varEstInputs

When returnVarEstInputs is TRUE, this element is returned. These are used for calculating covariances with varEstToCov.

residuals

residuals for each of the PV models

fitted.values

model fitted values

residCov

residual covariance matrix for dependent variables

residPV

residuals for each dependent variable

inputs

coefficient estimation input matrices

n0

full data n

nUsed

n used for model

B

imputation variance-covariance matrix, before multiplication by (M+1)/M

U

sampling variance-covariance matrix

Details

This function implements an estimator that correctly handles multiple left-hand side variables that are either numeric or plausible values, allows for survey sampling weights, and estimates variances using the jackknife replication method. The vignette titled Statistics describes estimation of the reported statistics.

The coefficients are estimated using the sample weights according to the section “Estimation of Weighted Means When Plausible Values Are Not Present” or the section “Estimation of Weighted Means When Plausible Values Are Present,” depending on whether there are assessment variables or variables with plausible values in them.

The coefficient of determination (R-squared value) is similarly estimated by finding the average R-squared using the sample weights for each set of plausible values.
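The averaging over plausible values can be sketched in base R with simulated data. This is a minimal illustration of the idea, not the EdSurvey implementation: it uses a single dependent variable, and the data, the weights w, and the names pv, fits, coefAvg, and r2Avg are all made up for the example.

```r
set.seed(1)
n <- 200
M <- 5                                   # number of plausible values (illustrative)
x <- rnorm(n)
w <- runif(n, 0.5, 2)                    # stand-in for survey weights

# one simulated outcome column per plausible value
pv <- sapply(seq_len(M), function(m) 1 + 0.5 * x + rnorm(n))

# fit one weighted regression per plausible value
fits <- lapply(seq_len(M), function(m) lm(pv[, m] ~ x, weights = w))

coefm   <- t(sapply(fits, coef))         # M x 2 matrix, one row per plausible value
coefAvg <- colMeans(coefm)               # coefficients averaged over plausible values
r2Avg   <- mean(sapply(fits, function(f) summary(f)$r.squared))  # averaged R-squared
```

Here coefm plays the same role as the coefm element described under Value: it holds the coefficients under each plausible value, and the reported estimate is their average.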

Variance estimation of coefficients

All variance estimation methods are shown in the vignette titled Statistics.

When the predicted value does not have plausible values, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Not Present, Using the Jackknife Method.”

When plausible values are present, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Present, Using the Jackknife Method.”
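As a rough sketch of how the two variance components combine when plausible values are present, the base R code below adds a sampling covariance matrix to an imputation covariance matrix scaled by (M+1)/M, as noted under Value. It assumes the standard multiple-imputation combining rule; the numbers are invented for illustration and are not EdSurvey output, and the vignette titled Statistics remains the authoritative description.

```r
# Toy coefficient estimates (made up): M = 5 plausible values, 2 coefficients
coefm <- rbind(c(1.02, 0.48),
               c(0.97, 0.53),
               c(1.05, 0.50),
               c(0.99, 0.47),
               c(1.01, 0.52))
colnames(coefm) <- c("(Intercept)", "x")
M <- nrow(coefm)

U    <- diag(c(0.010, 0.004))   # stand-in sampling (jackknife) covariance matrix
B    <- cov(coefm)              # covariance of the estimates across plausible values
Vimp <- (M + 1) / M * B         # imputation variance after the (M+1)/M scaling
V    <- U + Vimp                # total variance-covariance matrix
se   <- sqrt(diag(V))           # standard errors of the coefficients
```

Because Vimp is added to U, the resulting standard errors are never smaller than those based on the sampling variance alone.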

For more information on the specifics of multivariate regression, see the vignette titled Multivariate Regression.

Examples


# NOT RUN {
# read in the example data (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# Use | symbol to separate dependent variables in the left hand side of formula
mvrlm.fit <- mvrlm.sdf(algebra | geometry ~ dsex + m072801, jrrIMax = 5, data = sdf)

# print method returns coefficients, as does coef method
mvrlm.fit
coef(mvrlm.fit)

# for more detailed results use summary:
summary(mvrlm.fit)

# Details of model can also be accessed through components of the returned object, for example:

# coefficients (1 column per dependent variable)
mvrlm.fit$coef
# coefficient table with standard errors and p-values (1 table per dependent variable)
mvrlm.fit$coefmat
# R-squared values (1 per dependent variable)
mvrlm.fit$r.squared
# Residual covariance matrix
mvrlm.fit$residCov

# Model residuals and other details are available as well

# show the structure of the residuals objects
str(mvrlm.fit$residuals)
str(mvrlm.fit$residPV)

# dependent variables can have plausible values or not (or a combination)

mvrlm.fit <- mvrlm.sdf(composite | mrps22 ~ dsex + m072801, data = sdf, jrrIMax = 5)
summary(mvrlm.fit)

mvrlm.fit <- mvrlm.sdf(algebra | geometry | measurement ~ dsex + m072801, data = sdf, jrrIMax = 5)
summary(mvrlm.fit)

mvrlm.fit <- mvrlm.sdf(mrps51 | mrps22 ~ dsex + m072801, data = sdf, jrrIMax = 5)
summary(mvrlm.fit)

# Hypotheses about coefficient restrictions can also be tested using the Wald test

mvr <- mvrlm.sdf(algebra | geometry ~ dsex + m072801, data = sdf)

hypothesis <- c("geometry_dsexFemale = 0", "algebra_dsexFemale = 0")

# test statistics based on the F and Chi-squared distribution are available
linearHypothesis(mvr, hypothesis = hypothesis, test = "F")
linearHypothesis(mvr, hypothesis = hypothesis, test = "Chisq")

# }
