This function writes Mplus input files for conducting latent class analysis (LCA)
for continuous, count, ordered categorical, and unordered categorical variables.
LCA with continuous indicator variables is based on six different
variance-covariance structures, while LCA for all other variable types assumes
local independence. By default, the function conducts LCA with continuous
variables: it creates a folder in the current working directory for each of the
six sets of analyses, writes Mplus input files for conducting LCA with
k = 1 to k = 6 classes into these folders, and writes the matrix
or data frame specified in x into an Mplus data file in the current working
directory. Optionally, all models can be estimated by setting the argument
mplus.run to TRUE.
mplus.lca(x, ind = NULL,
          type = c("continuous", "count", "categorical", "nominal"), cluster = NULL,
          folder = c("A_Invariant-Theta_Diagonal-Sigma",
                     "B_Varying-Theta_Diagonal-Sigma",
                     "C_Invariant-Theta_Invariant-Unrestrictred-Sigma",
                     "D_Invariant-Theta_Varying-Unrestricted-Sigma",
                     "E_Varying-Theta_Invariant-Unrestricted-Sigma",
                     "F_Varying-Theta_Varying-Unrestricted-Sigma"),
          file = "Data_LCA.dat", write = c("all", "folder", "data", "input"),
          useobservations = NULL, missing = -99, classes = 6, estimator = "MLR",
          starts = c(100, 50), stiterations = 10, lrtbootstrap = 1000,
          lrtstarts = c(0, 0, 100, 50), processors = c(8, 8),
          output = c("all", "SVALUES", "CINTERVAL", "TECH7", "TECH8", "TECH11", "TECH14"),
          replace.inp = FALSE, mplus.run = FALSE, Mplus = "Mplus",
          replace.out = c("always", "never", "modified"), check = TRUE)
Returns an object of class misty.object, which is a list with the following entries:
call
function call
type
type of analysis
x
matrix or data frame specified in the argument x
args
specification of function arguments
result
for continuous indicators, a list with six entries, one for each variance-covariance structure, containing the Mplus inputs for the different numbers of profiles; for count, ordered categorical, or unordered categorical indicators, a list of Mplus inputs for the different numbers of classes.
x
a matrix or data frame. Note that variable names must be no longer than 8 characters.
ind
a character vector indicating the variable names of the latent class indicators in x.
type
a character string indicating the variable type of the latent class indicators, i.e., "continuous" (default) for continuous variables, "count" for count variables, "categorical" for binary or ordered categorical variables, and "nominal" for unordered categorical variables. Note that it is not possible to mix different variable types in the analysis.
cluster
a character string indicating the cluster variable in the matrix or data frame specified in x representing the nested grouping structure for computing cluster-robust standard errors. Note that specifying a cluster variable does not affect the information criteria, but does affect the Vuong-Lo-Mendell-Rubin likelihood ratio test of model fit.
folder
a character vector with six character strings specifying the names of the six folders representing the different variance-covariance structures for conducting LCA with continuous indicator variables. For LCA with all other variable types there is only one folder, which is called "LCA_1-x_Classes", with x being the maximum number of classes specified in the argument classes.
file
a character string naming the Mplus data file with or without the file extension '.dat', e.g., "Data_LCA.dat" (default) or "Data_LCA".
write
a character string or character vector indicating whether to create the six folders specified in the argument folder ("folder"), to write the matrix or data frame specified in x into an Mplus data file ("data"), and to write the Mplus input files into the six folders specified in the argument folder ("input"). By default, the function creates the folders, writes the Mplus data file, and writes the Mplus input files into the folders.
useobservations
a character string indicating the conditional statement to select observations, e.g., "ageyr <= 13".
missing
a numeric value or character string representing missing values (NA) in the Mplus data set. This value or character string is specified in the Mplus input file as MISSING IS ALL(missing). By default, -99 is used to represent missing values.
classes
an integer value specifying the maximum number of classes for the latent class analysis. By default, LCA with a maximum of 6 classes is specified (i.e., k = 1 to k = 6).
estimator
a character string for specifying the ESTIMATOR option in Mplus. By default, the estimator "MLR" is used.
starts
a vector with two integer values for specifying the STARTS option in Mplus. The first number represents the number of random sets of starting values to generate in the initial stage and the second number represents the number of optimizations to use in the final stage. By default, 100 random sets of starting values are generated and 50 optimizations are carried out in the final stage.
stiterations
an integer value specifying the STITERATIONS option in Mplus. The value represents the maximum number of iterations allowed in the initial stage. By default, 10 iterations are requested.
lrtbootstrap
an integer value for specifying the LRTBOOTSTRAP option in Mplus when requesting a parametric bootstrapped likelihood ratio test (i.e., output = "TECH14"). The value represents the number of bootstrap draws to be used in estimating the p-value of the parametric bootstrapped likelihood ratio test. By default, 1000 bootstrap draws are requested.
lrtstarts
a vector with four integer values for specifying the LRTSTARTS option in Mplus when requesting a parametric bootstrapped likelihood ratio test (i.e., output = "TECH14"). The values specify the number of starting values to use in the initial stage and the number of optimizations to use in the final stage for the k - 1 and the k class models when the data generated by bootstrap draws are analyzed. By default, 0 random sets of starting values in the initial stage and 0 optimizations in the final stage are used for the k - 1 class model, and 100 random sets of starting values in the initial stage and 50 optimizations in the final stage are used for the k class model.
processors
a vector of one or two integer values for specifying the PROCESSORS option in Mplus. The values specify the number of processors and threads to be used for parallel computing to increase computational speed. By default, 8 processors and 8 threads are used for parallel computing.
output
a character string or character vector specifying the options requested in the OUTPUT section in Mplus, i.e., SVALUES to request input statements that contain parameter estimates from the analysis, CINTERVAL to request confidence intervals, TECH7 to request sample statistics for each class using raw data weighted by the estimated posterior probabilities for each class, TECH8 to request the optimization history in estimating the model, TECH11 to request the Lo-Mendell-Rubin likelihood ratio test of model fit, and TECH14 to request a parametric bootstrapped likelihood ratio test. By default, SVALUES and TECH11 are requested. Note that TECH11 is only available for the MLR estimator.
replace.inp
logical: if TRUE, all existing input files in the folders specified in the argument folder are replaced.
mplus.run
logical: if TRUE, all models in the folders specified in the argument folder are estimated by using the mplus.run function in the R package misty.
Mplus
a character string for specifying the name or path of the Mplus executable to be used for running models. This covers situations where Mplus is not in the system's path, or where one wants to test different versions of the Mplus program. Note that there is no need to specify this argument for most users since it has intelligent defaults.
replace.out
a character string for specifying one of three settings, i.e., "always" to run all models regardless of whether an output file for the model exists, "never" to not run any model that has an existing output file, and "modified" (default) to only run a model if the modified date of the input file is more recent than the modified date of the output file.
check
logical: if TRUE (default), argument specification is checked.
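As a rough illustration of how the estimation arguments described above correspond to Mplus options, the following sketch formats the default values as option statements. The helper analysis_lines() is hypothetical and not part of misty, and the exact layout of the input files written by mplus.lca may differ.

```r
# Hypothetical helper (not part of misty): format argument values as
# Mplus option statements named in the argument descriptions above.
analysis_lines <- function(estimator = "MLR", starts = c(100, 50),
                           stiterations = 10, lrtbootstrap = 1000,
                           lrtstarts = c(0, 0, 100, 50),
                           processors = c(8, 8)) {
  c(paste0("ESTIMATOR = ", estimator, ";"),
    paste0("STARTS = ", paste(starts, collapse = " "), ";"),
    paste0("STITERATIONS = ", stiterations, ";"),
    paste0("LRTBOOTSTRAP = ", lrtbootstrap, ";"),
    paste0("LRTSTARTS = ", paste(lrtstarts, collapse = " "), ";"),
    paste0("PROCESSORS = ", paste(processors, collapse = " "), ";"))
}

# Print the statements implied by the default argument values
lines <- analysis_lines()
cat(lines, sep = "\n")
```

Each vector-valued argument is collapsed into a space-separated list, so lrtstarts = c(0, 0, 100, 50) becomes a single LRTSTARTS statement with four values.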
Takuya Yanagida takuya.yanagida@univie.ac.at
Latent class analysis (LCA) is a model-based clustering and classification method used to identify qualitatively different classes of observations which are unknown and must be inferred from the data. LCA can accommodate continuous, count, binary, ordered categorical, and unordered categorical indicators. LCA with continuous indicator variables is also known as latent profile analysis (LPA). In LPA, the within-profile variance-covariance structures represent different assumptions regarding the variances and covariances of the indicator variables both within and between latent profiles. As the best within-profile variance-covariance structure is not known a priori, all of the different structures must be investigated to identify the best model (Masyn, 2013). This function specifies six different variance-covariance structures labeled A to F (see Table 1 in Patterer et al., 2023):
Model A
The within-profile variances are constrained to be profile-invariant and covariances are constrained to be 0 in all profiles (i.e., equal variances across profiles and no covariances among indicator variables). This is the default setting in Mplus.
Model B
The within-profile variances are profile-varying and covariances are constrained to be 0 in all profiles (i.e., unequal variances across profiles and no covariances among indicator variables).
Model C
The within-profile variances are constrained to be profile-invariant and covariances are constrained to be equal in all profiles (i.e., equal variances and covariances across profiles).
Model D
The within-profile variances are constrained to be profile-invariant and covariances are profile-varying (i.e., equal variances across profiles and unequal covariances across profiles).
Model E
The within-profile variances are profile-varying and covariances are constrained to be equal in all profiles (i.e., unequal variances across profiles and equal covariances across profiles).
Model F
The within-profile variances and covariances are both profile-varying (i.e., unequal variances and covariances across profiles).
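To make the difference between the six structures concrete, the sketch below counts the freely estimated parameters implied by each one for p continuous indicators and k profiles. It assumes that indicator means are estimated freely in every profile and that k - 1 class proportions are estimated, which the text above does not state explicitly; the helper n_free_par() is illustrative and not part of misty.

```r
# Illustrative helper (not part of misty): number of free parameters for
# structures A to F, assuming profile-specific means in all models.
n_free_par <- function(p, k) {
  n_cov <- p * (p - 1) / 2       # covariances among p indicators
  base  <- k * p + (k - 1)       # profile-specific means + class proportions
  c(A = base + p,                # invariant variances, no covariances
    B = base + k * p,            # varying variances, no covariances
    C = base + p + n_cov,        # invariant variances, invariant covariances
    D = base + p + k * n_cov,    # invariant variances, varying covariances
    E = base + k * p + n_cov,    # varying variances, invariant covariances
    F = base + k * p + k * n_cov # varying variances, varying covariances
  )
}

# Four indicators and three profiles
n_par <- n_free_par(p = 4, k = 3)
print(n_par)
```

Under these assumptions, Model A is the most parsimonious structure and Model F the most heavily parameterized, which is one reason the less restrictive structures may be harder to estimate as the number of profiles grows.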
Masyn, K. E. (2013). Latent class analysis and finite mixture modeling. In T. D. Little (Ed.), The Oxford handbook of quantitative methods: Statistical analysis (pp. 551–611). Oxford University Press.
Muthén, L. K., & Muthén, B. O. (1998-2017). Mplus User's Guide (8th ed.). Muthén & Muthén.
Patterer, A. S., Yanagida, T., Kühnel, J., & Korunka, C. (2023). Daily receiving and providing of social support at work: Identifying support exchange patterns in hierarchical data. European Journal of Work and Organizational Psychology, 32(4), 489-505. https://doi.org/10.1080/1359432X.2023.2177537
read.mplus, write.mplus, mplus, mplus.update, mplus.print, mplus.plot, mplus.bayes, mplus.run
if (FALSE) {
# Load data set "HolzingerSwineford1939" in the lavaan package
data("HolzingerSwineford1939", package = "lavaan")

#-------------------------------------------------------------------------------
# Example 1: LCA with k = 1 to k = 8 profiles, continuous indicators
# Input statements that contain parameter estimates
# Vuong-Lo-Mendell-Rubin LRT and bootstrapped LRT
mplus.lca(HolzingerSwineford1939, ind = c("x1", "x2", "x3", "x4"),
          classes = 8, output = c("SVALUES", "TECH11", "TECH14"))

#-------------------------------------------------------------------------------
# Example 2: LCA with k = 1 to k = 6 classes, ordered categorical indicators
# Select observations with ageyr <= 13
# Estimate all models in Mplus
# Column 5 (school) is dropped so that round() is applied to numeric columns only
mplus.lca(round(HolzingerSwineford1939[, -5]), ind = c("x1", "x2", "x3", "x4"),
          type = "categorical", useobservations = "ageyr <= 13",
          mplus.run = TRUE)
}
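Before calling mplus.lca, it can help to check the data against the requirements noted for the arguments x and missing. The following is a minimal sketch using base R only; the helper check_lca_data() and the toy data frame are hypothetical and not part of misty.

```r
# Hypothetical pre-check (not part of misty): flag variable names longer
# than 8 characters and observed values equal to the missing value code.
check_lca_data <- function(x, ind, missing = -99) {
  stopifnot(is.data.frame(x) || is.matrix(x), all(ind %in% colnames(x)))
  list(
    too_long   = colnames(x)[nchar(colnames(x)) > 8],
    code_clash = any(x[, ind] == missing, na.rm = TRUE)
  )
}

# Toy data with one overly long variable name and one missing value
dat <- data.frame(x1 = c(1.2, NA, 3.1), x2 = c(2, 4, 6),
                  long_variable = c(1, 0, 1))
res <- check_lca_data(dat, ind = c("x1", "x2"))
res$too_long    # names that would need to be shortened
res$code_clash  # TRUE if an observed value equals the missing code
```

If code_clash were TRUE, a different value for the argument missing would be needed so that observed values are not treated as missing in Mplus.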