simulateSEM: Simulate Data from Structural Equation Model

Description

Interprets the input graph as a structural equation model, generates random path coefficients, and simulates data from the model. This is a very bare-bones function and probably not very useful except for quick validation purposes (e.g. checking that an implied vanishing tetrad truly vanishes in simulated data). For more elaborate simulation studies, please use the lavaan package or similar facilities in other packages.

Usage

simulateSEM(
  x,
  b.default = NULL,
  b.lower = -0.6,
  b.upper = 0.6,
  eps = 1,
  N = 500,
  standardized = TRUE,
  empirical = FALSE,
  verbose = FALSE
)

Value

Returns a data frame containing N values for each variable in x.

Arguments

x: the input graph, a DAG (which may contain bidirected edges).
b.default: default path coefficient applied to arrows for which no coefficient is defined in the model syntax.
b.lower: lower bound for random path coefficients, applied if b.default=NULL.
b.upper: upper bound for path coefficients.
eps: residual variance (only meaningful if standardized=FALSE).
N: number of samples to generate.
standardized: logical. If true, a standardized population covariance matrix is generated (all variables have variance 1).
empirical: logical. If true, the empirical covariance matrix will be equal to the population covariance matrix.
verbose: logical. If true, prints the generated population covariance matrix.

Details

Data are generated in the following manner. Each directed arrow is assigned a path coefficient that can be given using the attribute "beta" in the model syntax (see the examples). All coefficients not set in this manner are set to the b.default argument, or if that is not given, are chosen uniformly at random from the interval given by b.lower and b.upper (inclusive; set both parameters to the same value for constant path coefficients). Each bidirected arrow a <-> b is replaced by a substructure a <- L -> b, where L is an exogenous latent variable. Path coefficients on such substructures are set to sqrt(x), where x is again chosen at random from the given interval; if x is negative, one path coefficient is set to -sqrt(x) and the other to sqrt(x). All residual variances are set to eps.

If standardized=TRUE, all path coefficients are interpreted as standardized coefficients. But not all standardized coefficients are compatible with all graph structures. For instance, the graph structure z <- x -> y -> z is incompatible with standardized coefficients of 0.9, since this would imply that the variance of z must be larger than 1. For large graphs with many parallel paths, it can be very difficult to find coefficients that work.

Examples

Run this code

## Simulate data with pre-defined path coefficients of -.6
g <- dagitty('dag{z -> x [beta=-.6] x <- y [beta=-.6] }')
x <- simulateSEM( g ) 
cov(x)

Run the code above in your browser using DataLab