Simulate data starting from a lavaan model syntax.
simulateData(model = NULL, model.type = "sem", meanstructure = FALSE, 
    int.ov.free = TRUE, int.lv.free = FALSE, 
    marker.int.zero = FALSE, conditional.x = FALSE, 
    composites = TRUE, fixed.x = FALSE,
    orthogonal = FALSE, std.lv = TRUE, auto.fix.first = FALSE, 
    auto.fix.single = FALSE, auto.var = TRUE, auto.cov.lv.x = TRUE, 
    auto.cov.y = TRUE, ..., sample.nobs = 500L, ov.var = NULL, 
    group.label = paste("G", 1:ngroups, sep = ""), skewness = NULL, 
    kurtosis = NULL, seed = NULL, empirical = FALSE, 
    return.type = "data.frame", return.fit = FALSE,
    debug = FALSE, standardized = FALSE)
The generated data, returned as a data.frame 
(if return.type="data.frame"), 
a numeric matrix (if return.type="matrix"),
or a covariance matrix (if return.type="cov").
A description of the user-specified model. Typically, the model
    is described using the lavaan model syntax. See
    model.syntax for more information. Alternatively, a
    parameter table (e.g., the output of the lavaanify() function) is also
    accepted.
Set the model type: possible values
    are "cfa", "sem" or "growth". This may affect
    how starting values are computed, and may be used to alter the terminology
    used in the summary output, or the layout of path diagrams that are
    based on a fitted lavaan object.
If TRUE, the means of the observed
    variables enter the model. If "default", the value is set based
    on the user-specified model, and/or the values of other arguments.
If FALSE, the intercepts of the observed variables
    are fixed to zero.
If FALSE, the intercepts of the latent variables
    are fixed to zero.
Logical. Only relevant if the metric of each latent
    variable is set by fixing the first factor loading to unity.
    If TRUE, it implies meanstructure = TRUE and 
    std.lv = FALSE, and it fixes the intercepts of the marker
    indicators to zero, while freeing the means/intercepts of the latent
    variables. Only works correctly for single-group, single-level models.
If TRUE, we set up the model conditional on
    the exogenous 'x' covariates; the model-implied sample statistics
    only include the non-x variables. If FALSE, the exogenous 'x'
    variables are modeled jointly with the other variables, and the
    model-implied statistics reflect both sets of variables. If
    "default", the value is set depending on the estimator, and
    whether or not the model involves categorical endogenous variables.
If TRUE, use the new approach to handling composites
    (introduced in version 0.6-20).
If TRUE, the exogenous 'x' covariates are considered
    fixed variables and the means, variances and covariances of these variables
    are fixed to their sample values. If FALSE, they are considered
    random, and the means, variances and covariances are free parameters. If
    "default", the value is set depending on the mimic option.
If TRUE, the exogenous latent variables
    are assumed to be uncorrelated.
If TRUE, the metric of each latent variable is
    determined by fixing their variances to 1.0. If FALSE, the metric
    of each latent variable is determined by fixing the factor loading of the
    first indicator to 1.0.
If TRUE, the factor loading of the first indicator
    is set to 1.0 for every latent variable.
If TRUE, the residual variance (if included)
    of an observed indicator is set to zero if it is the only indicator of a
    latent variable.
If TRUE, the (residual) variances of both observed
    and latent variables are set free.
If TRUE, the covariances of exogenous latent
    variables are included in the model and set free.
If TRUE, the covariances of dependent variables
    (both observed and latent) are included in the model and set free.
Additional arguments passed to the lavaan
    function.
Number of observations. If a vector, multiple datasets
    are created. If return.type = "matrix" or 
    return.type = "cov", a list of length(sample.nobs) 
    is returned, with either the data or covariance matrices, each one
    based on the number of observations as specified in sample.nobs.
    If return.type = "data.frame", all datasets are merged and 
    a group variable is added to mimic a multiple group dataset.
The user-specified variances of the observed variables.
The group labels that should be used if multiple groups are created.
Numeric vector. The skewness values for the observed variables. Defaults to zero.
Numeric vector. The kurtosis values for the observed variables. Defaults to zero.
Set random seed.
Logical. If TRUE, the implied moments (Mu and Sigma)
    specify the empirical, not the population, mean and covariance matrix.
If "data.frame", a data.frame is returned. If
    "matrix", a numeric matrix is returned (without any variable names).
    If "cov", a covariance matrix is returned (without any variable 
    names).
If TRUE, return the fitted model that has been used
    to generate the data as an attribute (called "fit"); this 
    may be useful for inspection.
If TRUE, debugging information is displayed.
If TRUE, the residual variances of the observed
    variables are set in such a way that the model-implied variances 
    are unity. This allows regression coefficients and factor loadings 
    (involving observed variables) to be specified in a standardized metric.
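As a rough sketch of how sample.nobs and return.type interact, the snippet below passes a vector of two sample sizes. The model string, the object names (pop, dat, covs) and the seed value are illustrative assumptions rather than part of this page; all loadings are fixed explicitly so the result does not depend on default parameter values.

```r
library(lavaan)

# illustrative one-factor population model (all loadings fixed explicitly)
pop <- ' f =~ 1*x1 + 0.8*x2 + 1.2*x3 '

# two sample sizes -> two datasets, merged into a single data.frame
dat <- simulateData(pop, sample.nobs = c(100L, 200L), seed = 1)
dim(dat)                  # rows of both datasets; one extra column for the group
table(dat[[ncol(dat)]])   # assumes the added group variable is the last column

# same request, but asking for one covariance matrix per dataset
covs <- simulateData(pop, sample.nobs = c(100L, 200L),
                     return.type = "cov", seed = 1)
length(covs)              # one list element per entry of sample.nobs
```

With return.type = "data.frame" the datasets come back stacked, which is convenient for a follow-up multiple-group analysis; the list form is easier to work with when each dataset is processed separately.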
Model parameters can be specified by fixed values in the lavaan model syntax. If no fixed values are specified, the value zero will be assumed, except for factor loadings and variances, which are set to 0.7 and 1.0 respectively. By default, multivariate normal data are generated. However, by providing skewness and/or kurtosis values, nonnormal multivariate data can be generated, using the Vale & Maurelli (1983) method.
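A minimal sketch of generating nonnormal data, assuming one skewness and one kurtosis value per observed variable. The model, the values 2 and 7, the sample size and the helper sample_skew() are illustrative assumptions; the kurtosis is taken to be on the excess scale, since zero is the documented default for normal data.

```r
library(lavaan)

# illustrative population model with explicitly fixed loadings
pop <- ' f =~ 1*x1 + 0.8*x2 + 1.2*x3 '

# request skewed, heavy-tailed marginals for all three observed variables
dat <- simulateData(pop, sample.nobs = 5000L,
                    skewness = rep(2, 3), kurtosis = rep(7, 3), seed = 42)

# hypothetical helper: moment-based sample skewness, to avoid extra packages
sample_skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew <- sapply(dat, sample_skew)
round(skew, 2)   # should be clearly positive for every variable
```

Sample skewness and kurtosis are noisy for heavy-tailed data, so the values obtained from a single dataset will only approximate the requested ones.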
# specify population model
population.model <- ' f1 =~ x1 + 0.8*x2 + 1.2*x3
                      f2 =~ x4 + 0.5*x5 + 1.5*x6
                      f3 =~ x7 + 0.1*x8 + 0.9*x9
                      f3 ~ 0.5*f1 + 0.6*f2
                    '
# generate data
set.seed(1234)
myData <- simulateData(population.model, sample.nobs=100L)
# population moments
fitted(sem(population.model))
# sample moments
round(cov(myData), 3)
round(colMeans(myData), 3)
# fit model
myModel <- ' f1 =~ x1 + x2 + x3
             f2 =~ x4 + x5 + x6
             f3 =~ x7 + x8 + x9
             f3 ~ f1 + f2 '
fit <- sem(myModel, data=myData)
summary(fit)
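The example above draws a random sample, so the estimates differ from the population values by sampling error. As a hedged sketch of the empirical argument, the snippet below generates data whose sample moments reproduce the model-implied ones, so refitting the same structure should recover the population loadings up to optimizer tolerance. The model and the object names (pop4, exactData, fitExact) are illustrative assumptions.

```r
library(lavaan)

# illustrative one-factor population model with four indicators
pop4 <- ' f =~ 1*x1 + 0.8*x2 + 1.2*x3 + 0.5*x4 '

# empirical = TRUE: the sample moments match the model-implied moments
exactData <- simulateData(pop4, sample.nobs = 200L,
                          empirical = TRUE, seed = 123)

# refit the same structure with free loadings (first loading fixed to 1)
fitExact <- cfa(' f =~ x1 + x2 + x3 + x4 ', data = exactData)

# extract the loadings and compare with the population values
est <- parameterEstimates(fitExact)
loadings <- est$est[est$op == "=~"]
round(loadings, 3)
```

This kind of dataset is handy for checking that a model specification can reproduce known parameters before running a full simulation study.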