misty (version 0.6.7)

multilevel.descript: Multilevel Descriptive Statistics for Two-Level and Three-Level Data

Description

This function computes descriptive statistics for two-level and three-level multilevel data, e.g. average cluster size, variance components, intraclass correlation coefficient, design effect, and effective sample size.

Usage

multilevel.descript(..., data = NULL, cluster, type = c("1a", "1b"),
                    method = c("aov", "lme4", "nlme"),
                    print = c("all", "var", "sd"), REML = TRUE, digits = 2,
                    icc.digits = 3, as.na = NULL, write = NULL, append = TRUE,
                    check = TRUE, output = TRUE)

Value

Returns an object of class misty.object, which is a list with the following entries:

call

function call

type

type of analysis

data

data frame specified in ... including the cluster variable(s) specified in cluster

args

specification of function arguments

model.fit

fitted model object (mod.fit)

result

list with result tables, i.e., no.obs for the number of observations, no.miss for the number of missing values, no.cluster.l2 and no.cluster.l3 for the number of clusters at Level 2 and/or Level 3, m.cluster.size.l2 and m.cluster.size.l3 for the average cluster size at Level 2 and/or Level 3, sd.cluster.size.l2 and sd.cluster.size.l3 for the standard deviation of the cluster size at Level 2 and/or Level 3, min.cluster.size.l2 and min.cluster.size.l3 for the minimum cluster size at Level 2 and/or Level 3, max.cluster.size.l2 and max.cluster.size.l3 for the maximum cluster size at Level 2 and/or Level 3, mean.x for the intercept of the multilevel model, var.r for the variance within clusters, var.u for the variance between Level 2 clusters, var.b for the variance between Level 3 clusters, icc1.l2 and icc1.l3 for ICC(1) at Level 2 and/or Level 3, icc2.l2 and icc2.l3 for ICC(2) at Level 2 and/or Level 3, deff for the design effect, deff.sqrt for the square root of the design effect, and n.effect for the effective sample size

Arguments

...

a numeric vector, matrix, or data frame. Alternatively, an expression indicating the variable names in data, e.g., multilevel.descript(x1, x2, x3, data = dat, cluster = "cluster"). Note that the operators ., +, -, ~, :, ::, and ! can also be used to select variables, see 'Details' in the df.subset function.

data

a data frame when specifying one or more variables in the argument .... Note that the argument is NULL when specifying a numeric vector, matrix, or data frame for the argument ....

cluster

a character string indicating the name of the cluster variable in ... or data for two-level data, a character vector indicating the names of the cluster variables in ... for three-level data, or a vector or data frame representing the nested grouping structure (i.e., group or cluster variables). Alternatively, a character string or character vector indicating the variable name(s) of the cluster variable(s) in data. Note that the cluster variable at Level 3 comes first in a three-level model, i.e., cluster = c("level3", "level2").

type

a character string indicating the type of intraclass correlation coefficient, i.e., type = "1a" (default) for ICC(1) representing the proportion of variance at Level 2 and Level 3, or type = "1b" for ICC(1) representing an estimate of the expected correlation between two randomly chosen elements in the same group when specifying a three-level model (i.e., two cluster variables). See 'Details' in the multilevel.icc function for the formula used in this function.

method

a character string indicating the method used to estimate intraclass correlation coefficients, i.e., "aov" for ICC estimated using the aov function, "lme4" (default) for ICC estimated using the lmer function in the lme4 package, or "nlme" for ICC estimated using the lme function in the nlme package.

print

a character string or character vector indicating which results to show on the console, i.e. "all" for variances and standard deviations, "var" (default) for variances, or "sd" for standard deviations within and between clusters.

REML

logical: if TRUE (default), restricted maximum likelihood is used to estimate the null model when using the lmer() function in the lme4 package or the lme() function in the nlme package.

digits

an integer value indicating the number of decimal places to be used.

icc.digits

an integer indicating the number of decimal places to be used for displaying intraclass correlation coefficients.

as.na

a numeric vector indicating user-defined missing values, i.e., these values are converted to NA before conducting the analysis. Note that the as.na() function is only applied to ... but not to cluster.

write

a character string naming a file for writing the output into either a text file with file extension ".txt" (e.g., "Output.txt") or Excel file with file extension ".xlsx" (e.g., "Output.xlsx"). If the file name does not contain any file extension, an Excel file will be written.

append

logical: if TRUE (default), output will be appended to an existing text file with extension .txt specified in write; if FALSE, the existing text file will be overwritten.

check

logical: if TRUE (default), argument specification is checked.

output

logical: if TRUE (default), output is shown on the console.

Author

Takuya Yanagida takuya.yanagida@univie.ac.at

Details

Two-Level Model

In a two-level model, the intraclass correlation coefficients, design effect, and the effective sample size are computed based on the random intercept-only model:

$$Y_{ij} = \gamma_{00} + u_{0j} + r_{ij}$$

where the variance in \(Y\) is decomposed into two independent components: \(\sigma^2_{u_{0}}\), which represents the variance at Level 2, and \(\sigma^2_{r}\), which represents the variance at Level 1 (Hox et al., 2018). For the computation of the intraclass correlation coefficients, see 'Details' in the multilevel.icc function. The design effect represents the effect of cluster sampling on the variance of parameter estimation and is defined by the equation

$$deff = \left(\frac{SE_{Cluster}}{SE_{Simple}}\right)^2 = 1 + \rho(J - 1)$$

where \(SE_{Cluster}\) is the standard error under cluster sampling, \(SE_{Simple}\) is the standard error under simple random sampling, \(\rho\) is the intraclass correlation coefficient, ICC(1), and \(J\) is the average cluster size. The effective sample size is defined by the equation:

$$N_{effective} = \frac{N_{total}}{deff}$$

The effective sample size \(N_{effective}\) represents the equivalent total sample size that we should use in estimating the standard error (Snijders & Bosker, 2012).
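The two equations above can be worked through by hand in base R. The following is a minimal sketch, not the function's internal code: the variance components and sample sizes are made-up illustrative values, and ICC(1) is taken as the proportion of total variance at Level 2 (see 'Details' in the multilevel.icc function for the formulas the package actually uses).

```r
# Illustrative values (assumptions, not output of multilevel.descript())
var.u   <- 0.50   # variance between Level 2 clusters
var.r   <- 1.50   # variance within clusters (Level 1)
n.total <- 2500   # total number of Level 1 observations
n.clust <- 200    # number of Level 2 clusters

# ICC(1) as the proportion of total variance located at Level 2
icc1 <- var.u / (var.u + var.r)

# Average cluster size J
J <- n.total / n.clust

# Design effect: deff = 1 + rho * (J - 1), and its square root
deff      <- 1 + icc1 * (J - 1)
deff.sqrt <- sqrt(deff)

# Effective sample size: N_effective = N_total / deff
n.effect <- n.total / deff

round(c(icc1 = icc1, J = J, deff = deff,
        deff.sqrt = deff.sqrt, n.effect = n.effect), 3)
```

With these assumed inputs, a quarter of the variance lies between clusters, so the 2500 observations carry roughly as much information about the mean as 645 independent observations would.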

Three-Level Model

In a three-level model, the intraclass correlation coefficients, design effect, and the effective sample size are computed based on the random intercept-only model:

$$Y_{ijk} = \gamma_{000} + v_{0k} + u_{0jk} + r_{ijk}$$

where the variance in \(Y\) is decomposed into three independent components: \(\sigma^2_{v_{0}}\), which represents the variance at Level 3, \(\sigma^2_{u_{0}}\), which represents the variance at Level 2, and \(\sigma^2_{r}\), which represents the variance at Level 1 (Hox et al., 2018). For the computation of the intraclass correlation coefficients, see 'Details' in the multilevel.icc function. The design effect represents the effect of cluster sampling on the variance of parameter estimation and is defined by the equation

$$deff = \left(\frac{SE_{Cluster}}{SE_{Simple}}\right)^2 = 1 + \rho_{L2}(J - 1) + \rho_{L3}(JK - 1)$$

where \(\rho_{L2}\) is the ICC(1) at Level 2, \(\rho_{L3}\) is the ICC(1) at Level 3, \(J\) is the average cluster size at Level 2, and \(K\) is the average cluster size at Level 3.
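As a rough illustration of the three-level design effect, the sketch below evaluates the equation above in base R. All input values are hypothetical, and the last step assumes that the effective sample size is obtained as in the two-level case, i.e., by dividing the total sample size by the design effect.

```r
# Illustrative values (assumptions, not output of multilevel.descript())
rho.l2  <- 0.10   # ICC(1) at Level 2
rho.l3  <- 0.05   # ICC(1) at Level 3
J       <- 10     # average cluster size at Level 2
K       <- 5      # average cluster size at Level 3
n.total <- 1000   # total number of Level 1 observations

# Design effect: deff = 1 + rho_L2 * (J - 1) + rho_L3 * (J * K - 1)
deff <- 1 + rho.l2 * (J - 1) + rho.l3 * (J * K - 1)

# Effective sample size, assuming N_effective = N_total / deff
n.effect <- n.total / deff

round(c(deff = deff, deff.sqrt = sqrt(deff), n.effect = n.effect), 3)
```

Even a small ICC(1) at Level 3 contributes noticeably here because it is multiplied by JK - 1, the average number of other Level 1 units within the same Level 3 cluster.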

References

Hox, J., Moerbeek, M., & van de Schoot, R. (2018). Multilevel analysis: Techniques and applications (3rd ed.). Routledge.

Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed.). Sage Publishers.

See Also

write.result, multilevel.icc, descript

Examples

if (FALSE) {
# Load data set "Demo.twolevel" in the lavaan package
data("Demo.twolevel", package = "lavaan")

#----------------------------------------------------------------------------
# Two-Level Data

#..........
# Cluster variable specification

# Example 1a: Cluster variable 'cluster'
multilevel.descript(Demo.twolevel[, c("y1", "cluster")], cluster = "cluster")

# Example 1b: Cluster variable 'cluster' not in '...'
multilevel.descript(Demo.twolevel$y1, cluster = Demo.twolevel$cluster)

# Example 1c: Alternative specification using the 'data' argument
multilevel.descript(y1, data = Demo.twolevel, cluster = "cluster")

#---------------------------

# Example 2: Multilevel descriptive statistics for 'y1'
multilevel.descript(Demo.twolevel$y1, cluster = Demo.twolevel$cluster)

# Example 3: Multilevel descriptive statistics, print variance and standard deviation
multilevel.descript(Demo.twolevel$y1, cluster = Demo.twolevel$cluster, print = "all")

# Example 4: Multilevel descriptive statistics, print ICC with 5 digits
multilevel.descript(Demo.twolevel$y1, cluster = Demo.twolevel$cluster, icc.digits = 5)

# Example 5: Multilevel descriptive statistics
# use lme() function in the nlme package to estimate ICC
multilevel.descript(Demo.twolevel$y1, cluster = Demo.twolevel$cluster, method = "nlme")

# Example 6a: Multilevel descriptive statistics for 'y1', 'y2', 'y3', 'w1', and 'w2'
multilevel.descript(Demo.twolevel[, c("y1", "y2", "y3", "w1", "w2")],
                    cluster = Demo.twolevel$cluster)

# Example 6b: Alternative specification using the 'data' argument
multilevel.descript(y1:y3, w1, w2, data = Demo.twolevel, cluster = "cluster")

#----------------------------------------------------------------------------
# Three-Level Data

# Create arbitrary three-level data
Demo.threelevel <- data.frame(Demo.twolevel, cluster2 = Demo.twolevel$cluster,
                                             cluster3 = rep(1:10, each = 250))

#..........
# Cluster variable specification

# Example 7a: Cluster variables 'cluster' in '...'
multilevel.descript(Demo.threelevel[, c("y1", "cluster3", "cluster2")],
                    cluster = c("cluster3", "cluster2"))

# Example 7b: Cluster variables 'cluster' not in '...'
multilevel.descript(Demo.threelevel$y1, cluster = Demo.threelevel[, c("cluster3", "cluster2")])

# Example 7c: Alternative specification using the 'data' argument
multilevel.descript(y1, data = Demo.threelevel, cluster = c("cluster3", "cluster2"))

#----------------------------------------------------------------------------

# Example 8: Multilevel descriptive statistics for 'y1', 'y2', 'y3', 'w1', and 'w2'
multilevel.descript(y1:y3, w1, w2, data = Demo.threelevel, cluster = c("cluster3", "cluster2"))

#----------------------------------------------------------------------------
# Write Results

# Example 9a: Write results into a text file
multilevel.descript(Demo.twolevel[, c("y1", "y2", "y3", "w1", "w2")],
                    cluster = Demo.twolevel$cluster, write = "Multilevel_Descript.txt")

# Example 9b: Write results into an Excel file
multilevel.descript(Demo.twolevel[, c("y1", "y2", "y3", "w1", "w2")],
                    cluster = Demo.twolevel$cluster, write = "Multilevel_Descript.xlsx")

# Alternatively, save the result object and write it with write.result()
result <- multilevel.descript(Demo.twolevel[, c("y1", "y2", "y3", "w1", "w2")],
                              cluster = Demo.twolevel$cluster, output = FALSE)
write.result(result, "Multilevel_Descript.xlsx")
}