mixed.sdf: EdSurvey Mixed-Effects Model

Description

Fits a linear or logistic weighted mixed-effects model.

Usage

mixed.sdf(formula, data, weightVars = NULL,
  weightTransformation = TRUE, recode = NULL,
  defaultConditions = TRUE, tolerance = 0.01, nQuad = NULL,
  verbose = 0, family = NULL, centerGroup = NULL,
  centerGrand = NULL, fast = FALSE, ...)

Arguments

formula

a formula for the multilevel regression or mixed model. See Examples and the vignette titled Methods Used for Estimating Mixed-Effects Models in EdSurvey for more details on how to specify a mixed model. If the outcome (y) is left blank, the default subject scale or subscale variable is used. (You can find the default using showPlausibleValues.) If y is a variable for a subject scale or subscale (one of the names shown by showPlausibleValues), then that subject scale or subscale is used.

For logistic models, we recommend using the I() function to define the level used for success. (See Examples.)

data

an edsurvey.data.frame or a light.edsurvey.data.frame

weightVars

a character vector of weight variable names, one for each level of the model. Each weight variable must be a weight on the edsurvey.data.frame. The weight variables must be listed in order of level, from the lowest level to the highest.

weightTransformation

a logical value indicating whether the function should standardize the weights before using them in the multilevel model. If set to TRUE, the function looks up the standard weight transformation method commonly used for the survey. The weight transformations are described in the vignette titled Methods Used for Estimating Mixed-Effects Models in EdSurvey. If set to FALSE, or if the survey of the specified data does not have a standard weight transformation method, raw weights are used.

recode

a list of lists to recode variables. Defaults to NULL. Can be set as recode=list(var1 = list(from= c("a", "b", "c"), to= "d")). See Examples in lm.sdf.

defaultConditions

a logical value. When set to the default value of TRUE, uses the default conditions stored in an edsurvey.data.frame to subset the data. Use print on an edsurvey.data.frame to see the default conditions.

tolerance

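a numeric value that sets the stopping threshold used when nQuad is not specified: nQuad is increased until the change in the log-likelihood falls below this value. See Details for more information. Defaults to 0.01.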

nQuad

an integer indicating the number of quadrature points used in the adaptive quadrature process. See the documentation of mix for more details on how nQuad affects the estimation. If nQuad is not set, the function starts with nQuad = 5 and increases it until a stable result is reached. See Details for further discussion.

verbose

an integer; when set to 1, the function mixed.sdf prints brief progress messages, which can be used for further diagnosis. When set to 2, it prints detailed progress, including temporary estimates during the optimization. Defaults to 0, which runs the function without output.

family

an object of class family; optionally used to specify generalized linear mixed models. Defaults to NULL, which runs a linear mixed model. The other supported option is binomial(link="logit"), which runs a binomial (logistic) mixed model.

centerGroup

a list in which the name of each element is the name of the aggregation level, and the element is a formula of variable names to be group mean centered. For example, to group mean center gender and age within the group student: list("student"= ~gender+age). Defaults to NULL, which means predictors are not adjusted by group centering. See Examples in mix.

centerGrand

a formula of variable names to be grand mean centered. For example, to center the variable education by overall mean of education: ~education. Defaults to NULL, which means predictors are not adjusted by grand centering.

fast

a logical value; when set to TRUE, uses a C++ function for faster results. Defaults to FALSE.

...

other potential arguments to be used in mix
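
To make the centerGroup and centerGrand arguments concrete, the following minimal sketch shows what group mean centering and grand mean centering compute, using base R only. The data frame and its column names (school, age) are hypothetical, and the means here are unweighted; the centering performed inside mixed.sdf may differ in detail (for example, by using the survey weights).

```r
# Toy data: four students in two schools (hypothetical values)
df <- data.frame(
  school = c("A", "A", "B", "B"),
  age    = c(10, 12, 13, 15)
)

# Group mean centering: subtract each school's mean age from its students
df$ageGroupCentered <- df$age - ave(df$age, df$school)

# Grand mean centering: subtract the overall mean age from every student
df$ageGrandCentered <- df$age - mean(df$age)

print(df)
```

After group centering, each value is a deviation from its own group's mean, so the centered variable averages to zero within every group; after grand centering, it averages to zero over the whole sample.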

Value

A mixedSdfResults object with the following elements:

call

the original call used in mixed.sdf

formula

the formula used to fit the model

coef

the estimates of the coefficients

se

the standard error estimates of the coefficients

vars

variance components of the model

levels

the number of levels in the model

ICC

the intraclass correlation coefficient of the model

npv

the number of plausible values

ngroups

a data.frame that includes the number of observations for each group

If the formula does not involve plausible values, the function will return the following additional elements:

lnlf

the likelihood function

lnl

the log-likelihood of the model

If the formula involves plausible values, the function will return the following additional elements:

Vimp

the estimated variance from uncertainty in the scores

Vjrr

the estimated variance from sampling
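
As a rough illustration of how a sampling variance and a variance due to score uncertainty can be combined, the sketch below applies the generic multiple-imputation combining rules (Rubin's rules) to made-up numbers. This is an assumption-laden illustration rather than the exact computation used by mixed.sdf: the vectors est and sampVar and all derived names are hypothetical.

```r
# Hypothetical results for one coefficient, one entry per plausible value
est     <- c(2.0, 2.2, 1.8, 2.1, 1.9)       # coefficient estimates
sampVar <- c(0.10, 0.12, 0.11, 0.09, 0.08)  # sampling variance of each estimate

M <- length(est)                  # number of plausible values

pooledEst <- mean(est)            # combined point estimate
vSampling <- mean(sampVar)        # average sampling variance
vImputation <- (1 + 1 / M) * var(est)  # variance across plausible values

# Total variance and standard error under the generic combining rules
totalVar <- vSampling + vImputation
pooledSE <- sqrt(totalVar)

print(c(estimate = pooledEst, se = pooledSE))
```

The share vImputation / totalVar indicates how much of the uncertainty comes from the scores themselves rather than from sampling.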

Details

If users do not specify the nQuad argument, the function uses the tolerance argument: it fits the mixed model repeatedly, incrementing nQuad (starting at 5), until the relative difference between the log-likelihood of the new model and that of the previous model is smaller than tolerance. If users provide nQuad, a smaller value can save processing time; however, it is recommended that users try larger values of nQuad to check whether the result is stable.

Note that if the outcome variable has plausible values, the nQuad setting described above is applied to the estimation for all plausible values.
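
The stopping rule described in this section can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the package's implementation: fitLnl is a hypothetical stand-in that returns a log-likelihood for a given nQuad, the step size of 2 is assumed, and tolerance is treated as a proportion of the previous log-likelihood.

```r
# Hypothetical stand-in for fitting the model with a given nQuad and
# returning its log-likelihood; the value settles as nQuad grows.
fitLnl <- function(nQuad) -1000 - 2000 / nQuad^2

tolerance <- 0.01
nQuad <- 5                        # starting value named in the text
oldLnl <- fitLnl(nQuad)

repeat {
  nQuad <- nQuad + 2              # assumed increment
  newLnl <- fitLnl(nQuad)
  # relative change in the log-likelihood between successive fits
  relDiff <- abs(newLnl - oldLnl) / abs(oldLnl)
  if (relDiff < tolerance) break  # result is considered stable
  oldLnl <- newLnl
}

cat("stopped at nQuad =", nQuad, "with relative change", relDiff, "\n")
```

Each pass refits the model, so the loop trades run time for stability; supplying nQuad directly skips the search but leaves the stability check to the user.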

References

Rabe-Hesketh, S., & Skrondal, A. (2006). Multilevel modelling of complex survey data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 169(4), 805-827.

Examples

# NOT RUN {
# read in the example data (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# subset to a smaller sample
sdf_subset <- subset(sdf, scrpsu < 500)

# fast is an argument from WeMix::mix that allows the function to run faster using C++
m1 <- mixed.sdf(composite ~ dsex + b017451 + (1|scrpsu), data=sdf_subset,
                weightVars = c('origwt', 'srwt01'),
                fast=TRUE, verbose=1)
summary(m1)

# Run multilevel logistic regression model
# nQuad is specified to be 7, which means 
# the function will use 7 quadrature points for the integration
m2 <- mixed.sdf(I(composite >= 214) ~ (1|scrpsu), 
                data=sdf_subset, family = binomial(link="logit"),
                weightVars = c('origwt', 'srwt01'),
                nQuad = 7, verbose=1)
summary(m2)
# }
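
The logistic example above wraps a comparison in I() to define success. The base R sketch below, on a small hypothetical data frame, shows what that wrapper does in an ordinary formula: the comparison is evaluated as written and produces a TRUE/FALSE outcome. How mixed.sdf applies this across plausible values is not shown here; the scores and the name success are illustrative only.

```r
# Hypothetical scores; 214 mirrors the cut point used in the example above
scores <- data.frame(composite = c(190, 214, 230, 205))

# I() makes the comparison part of the formula's left-hand side
mf <- model.frame(I(composite >= 214) ~ 1, data = scores)

# The outcome column is a logical indicator of success
success <- as.logical(mf[[1]])
print(success)
```

Stating the cut point inside the formula keeps the definition of success visible in the model call instead of hiding it in a separately recoded variable.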
