mplus.lca: Mplus Model Specification for Latent Class Analysis

Description

This function writes Mplus input files for conducting latent class analysis (LCA) for continuous, count, ordered categorical, and unordered categorical variables. LCA with continuous indicator variables is based on six different variance-covariance structures, while LCA for all other variable types assumes local independence. By default, the function conducts LCA with continuous variables and creates folders in the current working directory for each of the six sets of analysis, writes Mplus input files for conducting LCA with k = 1 to k = 6 classes into these folders, and writes the matrix or data frame specified in x into an Mplus data file in the current working directory. Optionally, all models can be estimated by setting the argument mplus.run to TRUE.

Usage

mplus.lca(x, ind = NULL,
          type = c("continuous", "count", "categorical", "nominal"), cluster = NULL,
          folder = c("A_Invariant-Theta_Diagonal-Sigma",
                     "B_Varying-Theta_Diagonal-Sigma",
                     "C_Invariant-Theta_Invariant-Unrestrictred-Sigma",
                     "D_Invariant-Theta_Varying-Unrestricted-Sigma",
                     "E_Varying-Theta_Invariant-Unrestricted-Sigma",
                     "F_Varying-Theta_Varying-Unrestricted-Sigma"),
          file = "Data_LCA.dat", write = c("all", "folder", "data", "input"),
          useobservations = NULL, missing = -99, classes = 6, estimator = "MLR",
          starts = c(100, 50), stiterations = 10, lrtbootstrap = 1000,
          lrtstarts = c(0, 0, 100, 50), processors = c(8, 8),
          output = c("all", "SVALUES", "CINTERVAL", "TECH7", "TECH8", "TECH11", "TECH14"),
          replace.inp = FALSE, mplus.run = FALSE, Mplus = "Mplus",
          replace.out = c("always", "never", "modified"), check = TRUE)
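
The tuning arguments in the signature above correspond to options of the ANALYSIS command in an Mplus input file. The following is a minimal sketch of that mapping in base R. The helper analysis_lines is hypothetical (it is not part of misty), and the exact layout of the input files written by mplus.lca may differ; only the Mplus option names are taken from the argument descriptions.

```r
# Hypothetical helper (not part of misty): format argument values as
# Mplus ANALYSIS options. The layout of the file actually written by
# mplus.lca() is an assumption; defaults mirror the Usage section.
analysis_lines <- function(estimator = "MLR", starts = c(100, 50),
                           stiterations = 10, lrtbootstrap = 1000,
                           lrtstarts = c(0, 0, 100, 50),
                           processors = c(8, 8)) {
  c("ANALYSIS:",
    "  TYPE = MIXTURE;",
    paste0("  ESTIMATOR = ", estimator, ";"),
    paste0("  STARTS = ", paste(starts, collapse = " "), ";"),
    paste0("  STITERATIONS = ", stiterations, ";"),
    paste0("  LRTBOOTSTRAP = ", lrtbootstrap, ";"),
    paste0("  LRTSTARTS = ", paste(lrtstarts, collapse = " "), ";"),
    paste0("  PROCESSORS = ", paste(processors, collapse = " "), ";"))
}

# Print the option block for the default argument values
cat(analysis_lines(), sep = "\n")
```

Passing other values, e.g. analysis_lines(starts = c(500, 100)), shows how a change to an argument would be reflected in the corresponding option line.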

Value

Returns an object of class misty.object, which is a list with the following entries:

call: function call
type: type of analysis
x: matrix or data frame specified in the argument x
args: specification of function arguments
result: for continuous indicators, a list with six entries, one for each variance-covariance structure, each containing the Mplus inputs for the different numbers of profiles; for count, ordered categorical, or unordered categorical indicators, a list of Mplus inputs for the different numbers of classes.

Arguments

x: a matrix or data frame. Note that all variable names must be no longer than 8 characters.
ind: a character vector indicating the variable names of the latent class indicators in x.
type: a character string indicating the variable type of the latent class indicators, i.e., "continuous" (default) for continuous variables, "count" for count variables, "categorical" for binary or ordered categorical variables, and "nominal" for unordered categorical variables. Note that it is not possible to mix different variable types in the analysis.
cluster: a character string indicating the cluster variable in the matrix or data frame specified in x representing the nested grouping structure for computing cluster-robust standard errors. Note that specifying a cluster variable does not have any effect on the information criteria, but does affect the Vuong-Lo-Mendell-Rubin likelihood ratio test of model fit.
folder: a character vector with six character strings for specifying the names of the six folders representing different variance-covariance structures for conducting LCA with continuous indicator variables. There is only one folder for LCA with all other variable types, which is called "LCA_1-x_Classes" with x being the maximum number of classes specified in the argument classes.
file: a character string naming the Mplus data file with or without the file extension '.dat', e.g., "Data_LCA.dat" (default) or "Data_LCA".
write: a character string or character vector indicating whether to create the six folders specified in the argument folder ("folder"), to write the matrix or data frame specified in x into an Mplus data file ("data"), and to write the Mplus input files into the six folders specified in the argument folder ("input"). By default ("all"), the function creates the folders, writes the Mplus data file, and writes the Mplus input files into the folders.
useobservations: a character string indicating the conditional statement to select observations.
missing: a numeric value or character string representing missing values (NA) in the Mplus data set. This value or character string will be specified in the Mplus input file as MISSING IS ALL(missing). By default, -99 is used to represent missing values.
classes: an integer value specifying the maximum number of classes for the latent class analysis. By default, LCA with a maximum of 6 classes is specified (i.e., k = 1 to k = 6).
estimator: a character string for specifying the ESTIMATOR option in Mplus. By default, the estimator "MLR" is used.
starts: a vector with two integer values for specifying the STARTS option in Mplus. The first number represents the number of random sets of starting values to generate in the initial stage and the second number represents the number of optimizations to use in the final stage. By default, 100 random sets of starting values are generated and 50 optimizations are carried out in the final stage.
stiterations: an integer value specifying the STITERATIONS option in Mplus. The numeric value represents the maximum number of iterations allowed in the initial stage. By default, 10 iterations are requested.
lrtbootstrap: an integer value for specifying the LRTBOOTSTRAP option in Mplus when requesting a parametric bootstrapped likelihood ratio test (i.e., output = "TECH14"). The value represents the number of bootstrap draws to be used in estimating the p-value of the parametric bootstrapped likelihood ratio test. By default, 1000 bootstrap draws are requested.
lrtstarts: a vector with four integer values for specifying the LRTSTARTS option in Mplus when requesting a parametric bootstrapped likelihood ratio test (i.e., output = "TECH14"). The values specify the number of starting values to use in the initial stage and the number of optimizations to use in the final stage for the k - 1 and k classes models when the data generated by bootstrap draws are analyzed. By default, 0 random sets of starting values in the initial stage and 0 optimizations in the final stage are used for the k - 1 classes model, and 100 random sets of starting values in the initial stage and 50 optimizations in the final stage are used for the k classes model.
processors: a vector of one or two integer values for specifying the PROCESSORS option in Mplus. The values specify the number of processors and threads to be used for parallel computing to increase computational speed. By default, 8 processors and 8 threads are used for parallel computing.
output: a character string or character vector specifying the TECH options in the OUTPUT section in Mplus, i.e., SVALUES to request input statements that contain parameter estimates from the analysis, CINTERVAL to request confidence intervals, TECH7 to request sample statistics for each class using raw data weighted by the estimated posterior probabilities for each class, TECH8 to request the optimization history in estimating the model, TECH11 to request the Lo-Mendell-Rubin likelihood ratio test of model fit, and TECH14 to request a parametric bootstrapped likelihood ratio test. By default, SVALUES and TECH11 are requested. Note that TECH11 is only available for the MLR estimator.
replace.inp: logical: if TRUE, all existing input files in the folder specified in the argument folder are replaced.
mplus.run: logical: if TRUE, all models in the folders specified in the argument folder are estimated by using the mplus.run function in the R package misty.
Mplus: a character string for specifying the name or path of the Mplus executable to be used for running models. This covers situations where Mplus is not in the system's path, or where one wants to test different versions of the Mplus program. Note that there is no need to specify this argument for most users since it has intelligent defaults.
replace.out: a character string for specifying one of three settings, i.e., "always" (default) to run all models regardless of whether an output file for the model exists, "never" to not run any model that has an existing output file, and "modified" to only run a model if the modified date for the input file is more recent than the output file modified date.
check: logical: if TRUE (default), argument specification is checked.
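
Two of the notes above (variable names of at most 8 characters, and NA being represented by the value given in missing) can be checked on the data before calling the function. The following is a minimal sketch in base R on a made-up toy data frame; the replacement name "vlong" is arbitrary, and how mplus.lca recodes missing values internally is an assumption.

```r
# Toy data with one over-long name and one missing value (illustrative only)
dat <- data.frame(x1 = c(1.2, NA, 3.4),
                  verylongname = c(2, 5, 7))

# Variable names longer than 8 characters violate the note on 'x'
too.long <- names(dat)[nchar(names(dat)) > 8]
too.long

# Rename the offending variable by hand (the new name is arbitrary)
names(dat)[names(dat) %in% too.long] <- "vlong"

# What 'missing = -99' amounts to in the data file: NA is represented by -99
# (the internal recoding done by mplus.lca() is an assumption)
dat.mplus <- dat
dat.mplus[is.na(dat.mplus)] <- -99
dat.mplus
```

The value chosen for missing must not occur as a legitimate data value, since Mplus would treat every occurrence as missing.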

Author

Takuya Yanagida takuya.yanagida@univie.ac.at

Details

Latent class analysis (LCA) is a model-based clustering and classification method used to identify qualitatively different classes of observations which are unknown and must be inferred from the data. LCA can accommodate continuous, count, binary, ordered categorical, and unordered categorical indicators. LCA with continuous indicator variables is also known as latent profile analysis (LPA). In LPA, the within-profile variance-covariance structures represent different assumptions regarding the variance and covariance of the indicator variables both within and between latent profiles. As the best within-profile variance-covariance structure is not known a priori, all of the different structures must be investigated to identify the best model (Masyn, 2013). This function specifies six different variance-covariance structures labeled A to F (see Table 1 in Patterer et al., 2023):

Model A: The within-profile variance is constrained to be profile-invariant and covariances are constrained to be 0 in all profiles (i.e., equal variances across profiles and no covariances among indicator variables). This is the default setting in Mplus.
Model B: The within-profile variance is profile-varying and covariances are constrained to be 0 in all profiles (i.e., unequal variances across profiles and no covariances among indicator variables).
Model C: The within-profile variance is constrained to be profile-invariant and covariances are constrained to be equal in all profiles (i.e., equal variances and covariances across profiles).
Model D: The within-profile variance is constrained to be profile-invariant and covariances are profile-varying (i.e., equal variances across profiles and unequal covariances across profiles).
Model E: The within-profile variances are profile-varying and covariances are constrained to be equal in all profiles (i.e., unequal variances across profiles and equal covariances across profiles).
Model F: The within-profile variances and covariances are both profile-varying (i.e., unequal variances and covariances across profiles).
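
The six structures differ in how many variance and covariance parameters they estimate, which is why they trade off fit against parsimony. The following sketch counts the free parameters of models A to F for k profiles and p continuous indicators, assuming profile-specific means for every indicator and k - 1 free profile proportions. The helper n.par is hypothetical and not part of misty.

```r
# Hypothetical helper (not part of misty): number of free parameters in
# models A to F, assuming profile-specific means and k - 1 proportions
n.par <- function(k, p) {
  means <- k * p              # one mean per indicator and profile
  probs <- k - 1              # profile proportions
  n.cov <- p * (p - 1) / 2    # covariances among p indicators in one profile

  var.inv <- p                # variances equal across profiles
  var.vary <- k * p           # variances free in each profile
  cov.inv <- n.cov            # covariances equal across profiles
  cov.vary <- k * n.cov       # covariances free in each profile

  c(A = means + probs + var.inv,
    B = means + probs + var.vary,
    C = means + probs + var.inv + cov.inv,
    D = means + probs + var.inv + cov.vary,
    E = means + probs + var.vary + cov.inv,
    F = means + probs + var.vary + cov.vary)
}

# Three profiles, four indicators
n.par(k = 3, p = 4)
```

For k = 3 and p = 4 this gives 18, 26, 24, 36, 32, and 44 parameters for models A to F, so model F estimates more than twice as many parameters as model A.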

References

Masyn, K. E. (2013). Latent class analysis and finite mixture modeling. In T. D. Little (Ed.), The Oxford handbook of quantitative methods: Statistical analysis (pp. 551–611). Oxford University Press.

Muthen, L. K., & Muthen, B. O. (1998-2017). Mplus User's Guide (8th ed.). Muthen & Muthen.

Patterer, A. S., Yanagida, T., Kühnel, J., & Korunka, C. (2023). Daily receiving and providing of social support at work: Identifying support exchange patterns in hierarchical data. European Journal of Work and Organizational Psychology, 32(4), 489–505. https://doi.org/10.1080/1359432X.2023.2177537

Examples


if (FALSE) {
# Load data set "HolzingerSwineford1939" in the lavaan package
data("HolzingerSwineford1939", package = "lavaan")

#-------------------------------------------------------------------------------
# Example 1: LCA with k = 1 to k = 8 profiles, continuous indicators
# Input statements that contain parameter estimates
# Vuong-Lo-Mendell-Rubin LRT and bootstrapped LRT
mplus.lca(HolzingerSwineford1939, ind = c("x1", "x2", "x3", "x4"),
          classes = 8, output = c("SVALUES", "TECH11", "TECH14"))

#-------------------------------------------------------------------------------
# Example 2: LCA with k = 1 to k = 6 classes, ordered categorical indicators
# Select observations with ageyr <= 13
# Estimate all models in Mplus
mplus.lca(round(HolzingerSwineford1939[, -5]), ind = c("x1", "x2", "x3", "x4"),
          type = "categorical", useobservations = "ageyr <= 13",
          mplus.run = TRUE)
}
