simFA: Generate Factor Analysis Models and Data Sets for Simulation Studies

Description

A function to simulate factor loadings matrices and Monte Carlo data sets for common factor models, bifactor models, and IRT models.

Usage

simFA(
  Model = list(),
  Loadings = list(),
  CrossLoadings = list(),
  Phi = list(),
  ModelError = list(),
  Bifactor = list(),
  MonteCarlo = list(),
  FactorScores = list(),
  Missing = list(),
  Control = list(),
  Seed = NULL
)

Value

loadings A common factor or bifactor loadings matrix.
Phi A factor correlation matrix.
urloadings The unrotated loadings matrix.
h2 A vector of item communalities.
h2PopME A vector of item communalities that may include model approximation error.
Rpop The model-implied population correlation matrix.
RpopME The model-implied population correlation matrix with model error.
W The factor loadings for the minor factors (when ModelError = TRUE). Default = NULL.
Xm The part of the observed scores that is due to the minor common factors.
SFSvars Variances of the Specific Factors in the metric of the observed scores.
ModelErrorFitStats A list of model fit indices (for the underlying equations, see Bentler, 1990; Hu & Bentler, 1999; Marsh, Hau, & Grayson, 2005; Steiger, 2016):
- SRMR_theta Standardized Root Mean Square Residual based on the model implied by the error-free major factors only (underlying Rpop),
- SRMR_thetahat Standardized Root Mean Square Residual based on an exploratory factor analysis of the population correlation matrix, RpopME,
- CRMR_theta Correlation Root Mean Square Residual based on the model implied by the error-free major factors only (underlying Rpop),
- CRMR_thetahat Correlation Root Mean Square Residual based on an exploratory factor analysis of the population correlation matrix, RpopME,
- RMSEA_theta Root Mean Square Error of Approximation (Steiger, 2016) based on the model implied by the error-free major factors only (underlying Rpop),
- RMSEA_thetahat Root Mean Square Error of Approximation (Steiger, 2016) based on an exploratory factor analysis of the population correlation matrix, RpopME,
- CFI_theta Comparative Fit Index (Bentler, 1990) based on the model implied by the error-free major factors only (underlying Rpop),
- CFI_thetahat Comparative Fit Index (Bentler, 1990) based on an exploratory factor analysis of the population correlation matrix, RpopME.
- Fm MLE fit function for population target model.
- Fb MLE fit function for population baseline model.
- DFm Degrees of freedom for population target model.
CovMatrices A list containing:
- CovMajor The model-implied covariances from the major factors.
- CovMinor The model-implied covariances from the minor factors.
- CovUnique The model-implied variances from the uniqueness factors.
Bifactor A list containing:
- loadingsHier Factor loadings of the 1st order solution of a hierarchical bifactor model.
- PhiHier Factor correlations of the 1st order solution of a hierarchical bifactor model.
Scores A list containing:
- FactorScores Factor scores for the common and uniqueness factors.
- FacInd Factor indeterminacy indices for the error free population model.
- FacIndME Factor score indeterminacy indices for the population model with model error.
- ObservedScores A matrix of model-implied observed scores. If Thresholds were supplied under the FactorScores keyword, ObservedScores will be transformed into Likert scores.
Monte A list containing output from the Monte Carlo simulations if generated.
IRT Factor loadings expressed in the normal ogive IRT metric. If Thresholds were given then IRT difficulty values will also be returned.
Seed The initial seed for the random number generator.
call A copy of the function call.
cn A list of all active and nonactive function arguments.
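Several of these components are linked by standard common factor identities. The base R sketch below assumes those identities (Rpop = L %*% Phi %*% t(L) with a unit diagonal; h2 = the diagonal of L %*% Phi %*% t(L)), the usual maximum likelihood discrepancy function, and population (sample-size-free) forms of RMSEA and CFI. The helper names (implied_R, communalities, F_ml, rmsea_pop, cfi_pop) are hypothetical, and the degrees of freedom in the example use the common EFA formula, which may not match how simFA computes DFm.

```r
# Hypothetical helpers (not simFA's source code) showing how Rpop, h2,
# and the population fit indices relate to one another.

# Model-implied correlation matrix: L %*% Phi %*% t(L) with a unit diagonal
implied_R <- function(L, Phi) {
  R <- L %*% Phi %*% t(L)
  diag(R) <- 1
  R
}

# Communalities are the diagonal elements of L %*% Phi %*% t(L)
communalities <- function(L, Phi) diag(L %*% Phi %*% t(L))

# ML discrepancy between a population matrix S and a model matrix Sigma
F_ml <- function(S, Sigma) {
  log(det(Sigma)) - log(det(S)) + sum(diag(S %*% solve(Sigma))) - nrow(S)
}

# Population RMSEA and CFI (no sample-size terms)
rmsea_pop <- function(Fm, DFm) sqrt(max(Fm, 0) / DFm)
cfi_pop   <- function(Fm, Fb) 1 - Fm / Fb

# Example: 6 items, 2 correlated factors, 3 items per factor
L <- matrix(0, 6, 2)
L[1:3, 1] <- .6
L[4:6, 2] <- .6
Phi <- matrix(c(1, .3, .3, 1), 2, 2)
Rpop <- implied_R(L, Phi)
h2 <- communalities(L, Phi)          # all equal to .36

# Perturb one correlation to mimic model approximation error
RpopME <- Rpop
RpopME[1, 2] <- RpopME[2, 1] <- .40

Fm  <- F_ml(RpopME, Rpop)            # misfit of the error-free model
Fb  <- F_ml(RpopME, diag(6))         # independence (baseline) model
DFm <- ((6 - 2)^2 - (6 + 2)) / 2     # usual EFA df; an assumption here
c(RMSEA = rmsea_pop(Fm, DFm), CFI = cfi_pop(Fm, Fb))
```

For a correlation matrix the baseline (independence) model is the identity matrix, so Fb reduces to -log(det(RpopME)).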

Arguments

Model

(list)

NFac (scalar) Number of common or group factors; defaults to NFac = 3.
NItemPerFac
- (scalar) All factors have the same number of primary loadings.
- (vector) A vector of length NFac specifying the number of primary loadings for each factor; defaults to NItemPerFac = 3.
Model (character) "orthogonal" or "oblique"; defaults to Model = "orthogonal".

Loadings

(list)

FacPattern (NULL or matrix).
- FacPattern = M where M is a user-defined factor pattern matrix.
- FacPattern = NULL; simFA will generate a factor pattern based on the arguments specified under other keywords (e.g., Model, CrossLoadings, etc.); defaults to FacPattern = NULL.
FacLoadDist (character) Specifies the sampling distribution for the common factor loadings. Possible values are "runif", "rnorm", "sequential", and "fixed"; defaults to FacLoadDist = "runif".
FacLoadRange (vector of length NFac, 2, or 1); defaults to FacLoadRange = c(.3, .7).
- If FacLoadDist = "runif" the vector defines the bounds of the uniform distribution;
- If FacLoadDist = "rnorm" the vector defines the mean and standard deviation of the normal distribution from which loadings are sampled.
- If FacLoadDist = "sequential" the vector specifies the lower and upper bound of the loadings sequence.
- If FacLoadDist = "fixed" and FacLoadRange is a vector of length 1 then all common loadings will equal the constant specified in FacLoadRange. If FacLoadDist = "fixed" and FacLoadRange is a vector of length NFac then each factor will have fixed loadings as specified by the associated element in FacLoadRange.
h2 (vector) An optional vector of communalities used to constrain the population communalities to user-defined values; defaults to h2 = NULL.
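The four FacLoadDist options can be illustrated with a small hypothetical helper, sample_loadings(), that draws the primary loadings for one factor. This is a sketch of the behavior described above, not simFA's internal code; in particular, the evenly spaced grid used for "sequential" is an assumption.

```r
# Hypothetical helper (not simFA's code): draw the primary loadings of one
# factor under each FacLoadDist option.
sample_loadings <- function(n, dist = "runif", range = c(.3, .7)) {
  switch(dist,
    runif      = runif(n, min = range[1], max = range[2]),   # uniform bounds
    rnorm      = rnorm(n, mean = range[1], sd = range[2]),   # mean and SD
    sequential = seq(range[1], range[2], length.out = n),    # evenly spaced (assumed)
    fixed      = rep(range[1], n),                           # one constant
    stop("unknown distribution: ", dist))
}

set.seed(1)
sample_loadings(4, "runif", c(.3, .7))        # four values between .3 and .7
sample_loadings(4, "sequential", c(.3, .6))   # 0.3 0.4 0.5 0.6
sample_loadings(4, "fixed", .5)               # 0.5 0.5 0.5 0.5
```

When FacLoadDist = "fixed" and FacLoadRange has length NFac, a helper like this would be called once per factor with range = FacLoadRange[k].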

CrossLoadings

(list)

ProbCrossLoad (scalar) A value in the (0,1) interval that determines the probability that a cross loading will be present in elements of the loadings matrix that do not have salient (primary) factor loadings. If set to ProbCrossLoad = 1, a single cross loading will be added to each factor; defaults to ProbCrossLoad = 0.
CrossLoadRange (vector of length 2) Controls the size of the cross loadings; defaults to CrossLoadRange = c(.20, .25).
CrossLoadPositions (matrix) Specifies the row and column positions of (optional) cross loadings; defaults to CrossLoadPositions = NULL.
CrossLoadValues (vector) If CrossLoadPositions is specified then CrossLoadValues is a vector of user-supplied cross-loadings; defaults to CrossLoadValues = NULL.
CrudFactor (scalar) Controls the size of tertiary factor loadings. If CrudFactor != 0 then elements of the loadings matrix with neither primary nor secondary (i.e., cross) loadings will be sampled from a uniform distribution on [-CrudFactor, CrudFactor]; defaults to CrudFactor = 0.
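A rough sketch of how ProbCrossLoad, CrossLoadRange, and CrudFactor could act on a simple-structure pattern follows. The function add_cross_and_crud() is hypothetical; it does not reproduce the special case ProbCrossLoad = 1 (one cross loading per factor) or user-specified CrossLoadPositions and CrossLoadValues.

```r
# Hypothetical sketch (not simFA's implementation): add random cross
# loadings, then "crud" loadings, to a simple-structure pattern.
add_cross_and_crud <- function(L, ProbCrossLoad = 0,
                               CrossLoadRange = c(.20, .25),
                               CrudFactor = 0) {
  nonsalient <- which(L == 0)                    # cells without primary loadings
  is_cross <- runif(length(nonsalient)) < ProbCrossLoad
  cross <- nonsalient[is_cross]
  L[cross] <- runif(length(cross), CrossLoadRange[1], CrossLoadRange[2])
  crud <- setdiff(nonsalient, cross)             # neither primary nor cross
  L[crud] <- runif(length(crud), -CrudFactor, CrudFactor)
  L
}

set.seed(1)
L  <- kronecker(diag(3), matrix(.6, 3, 1))       # 9 x 3 simple structure
L2 <- add_cross_and_crud(L, ProbCrossLoad = .1, CrudFactor = .05)
round(L2, 2)
```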

Phi

(list)

MaxAbsPhi (scalar) Upper (absolute) bound on factor correlations; defaults to MaxAbsPhi = .5.
EigenValPower (scalar) Controls the skewness of the eigenvalues of Phi. Larger values of EigenValPower result in a Phi spectrum that is more right-skewed (and thus closer to a unidimensional model); defaults to EigenValPower = 2.
PhiType (character); defaults to PhiType = "free".
- If PhiType = "free" factor correlations will be randomly generated under the constraints of MaxAbsPhi and EigenValPower.
- If PhiType = "fixed" all factor correlations will equal the value specified in MaxAbsPhi. A fatal error will be produced if Phi is not positive semidefinite.
- If PhiType = "user" the factor correlations are defined by the matrix specified in UserPhi (see below).
UserPhi (matrix) A positive semidefinite (PSD) matrix of user-defined factor correlations; defaults to UserPhi = NULL.
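Because PhiType = "user" requires a PSD UserPhi, and PhiType = "fixed" builds a matrix with a constant off-diagonal value, it can help to check candidate matrices before calling simFA. The helpers fixed_phi() and is_psd() below are hypothetical; the check simply inspects the smallest eigenvalue.

```r
# Build a "fixed" Phi: ones on the diagonal, constant r elsewhere
fixed_phi <- function(NFac, r) {
  Phi <- matrix(r, NFac, NFac)
  diag(Phi) <- 1
  Phi
}

# A symmetric matrix is PSD when its smallest eigenvalue is >= 0
is_psd <- function(M, tol = 1e-8) {
  min(eigen(M, symmetric = TRUE, only.values = TRUE)$values) >= -tol
}

is_psd(fixed_phi(3, .5))   # TRUE (eigenvalues 2, .5, .5)

# A candidate UserPhi whose pairwise values cannot coexist
BadPhi <- matrix(c( 1, .9, -.9,
                   .9,  1,  .9,
                  -.9, .9,   1), 3, 3, byrow = TRUE)
is_psd(BadPhi)             # FALSE
```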

ModelError

(list)

ModelError (logical) If ModelError = TRUE model error will be introduced into the factor pattern via the method described by Tucker, Koopman, and Linn (TKL, 1969); defaults to ModelError = FALSE.
W (matrix) An optional user-supplied factor loading matrix for the NMinorFac minor common factors; defaults to W = NULL.
NMinorFac (scalar) Number of minor factors in the TKL model; defaults to NMinorFac = 150.
ModelErrorType (character) If ModelErrorType = "U" then ModelErrorVar is the proportion of uniqueness variance that is due to model error. If ModelErrorType = "V" then ModelErrorVar is the proportion of total variance that is due to model error; defaults to ModelErrorType = "U".
ModelErrorVar (scalar in [0,1]) The proportion of uniqueness (U) or total (V) variance that is due to model error; defaults to ModelErrorVar = .10.
epsTKL (scalar in [0,1]) Controls the size of the factor loadings in successive minor factors; defaults to epsTKL = .20.
Wattempts (scalar > 0) Maximum number of tries when attempting to generate a suitable W matrix. Default = 10000.
WmaxLoading (scalar > 0) Threshold value for NWmaxLoading. Default WmaxLoading = .30.
NWmaxLoading (scalar >= 0) Maximum number of absolute loadings >= WmaxLoading in any column of W (matrix of model approximation error factor loadings). Default NWmaxLoading = 2. Under the defaults, no column of W will have 3 or more absolute loadings >= .30.
PrintW (logical) If PrintW = TRUE then simFA will print the attempt history when searching for a suitable W matrix given the constraints defined in WmaxLoading and NWmaxLoading. Default PrintW = FALSE.
RSpecific (matrix) Optional correlation matrix for specific factors; defaults to RSpecific = NULL.
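The TKL approach can be sketched as follows, under simplifying assumptions: minor-factor loadings are drawn from a standard normal distribution, column j is shrunk by (1 - epsTKL)^(j - 1), and each row is then rescaled so that the minor factors explain ModelErrorVar of the item's unique variance (ModelErrorType = "U") or of its total variance (ModelErrorType = "V"). The function tkl_W() is hypothetical and omits the Wattempts, WmaxLoading, and NWmaxLoading screening that simFA applies.

```r
# Simplified TKL-style sketch (an assumption, not simFA's code)
tkl_W <- function(h2, NMinorFac = 150, epsTKL = .2,
                  ModelErrorVar = .1, ModelErrorType = "U") {
  p <- length(h2)
  W <- matrix(rnorm(p * NMinorFac), p, NMinorFac)
  # Shrink successive minor factors geometrically
  W <- sweep(W, 2, (1 - epsTKL)^(0:(NMinorFac - 1)), `*`)
  # Variance each item should receive from the minor factors
  target <- if (ModelErrorType == "U") ModelErrorVar * (1 - h2) else rep(ModelErrorVar, p)
  W * sqrt(target / rowSums(W^2))       # rescale rows to hit the target
}

set.seed(1)
L <- kronecker(diag(3), matrix(.6, 3, 1))   # 9 items, 3 major factors
Rpop <- L %*% t(L)
diag(Rpop) <- 1
W <- tkl_W(rowSums(L^2), ModelErrorType = "U")
RpopME <- Rpop + W %*% t(W)                  # add minor-factor covariances
diag(RpopME) <- 1
```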

Bifactor

(list)

Bifactor (logical) If Bifactor = TRUE parameters for the bifactor model will be generated; defaults to Bifactor = FALSE.
Hierarchical (logical) If Hierarchical = TRUE then a hierarchical Schmid Leiman (1957) bifactor model will be generated; defaults to Hierarchical = FALSE.
F1FactorDist (character) Specifies the sampling distribution for the general factor loadings. Possible values are "runif", "rnorm", "sequential", and "fixed"; defaults to F1FactorDist = "sequential".
F1FactorRange (vector of length 1 or 2) Controls the sizes of the general factor loadings in non-hierarchical bifactor models; defaults to F1FactorRange = c(.4, .7).
- If F1FactorDist = "runif", the vector of length 2 defines the bounds of the uniform distribution, c(lower, upper);
- If F1FactorDist = "rnorm", the vector defines the mean and standard deviation of the normal distribution from which loadings are sampled, c(MN, SD).
- If F1FactorDist = "sequential", the vector specifies the lower and upper bound of the loadings sequence, c(lower, upper).
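The hierarchical bifactor model rests on the Schmid and Leiman (1957) transformation. Given first-order loadings L1 and the loadings g of the first-order factors on one second-order factor, the general-factor loadings are L1 %*% g and the group-factor loadings are L1 with column k scaled by sqrt(1 - g[k]^2). The sketch below (hypothetical helper schmid_leiman(), illustrative values for g) checks that the transformed loadings reproduce the common variance of the first-order solution; in simFA's output, L1 and Phi1 would play the roles of loadingsHier and PhiHier.

```r
# Schmid-Leiman transformation sketch (hypothetical helper)
schmid_leiman <- function(L1, g) {
  general <- L1 %*% g                                   # general-factor loadings
  group   <- L1 %*% diag(sqrt(1 - g^2), nrow = length(g))  # group-factor loadings
  cbind(general, group)
}

L1 <- kronecker(diag(3), matrix(.7, 3, 1))  # 9 items, 3 first-order factors
g  <- c(.8, .7, .6)                         # second-order loadings (illustrative)
Phi1 <- g %*% t(g)                          # implied first-order correlations
diag(Phi1) <- 1
B <- schmid_leiman(L1, g)                   # 9 x 4: general + 3 group factors
round(B, 2)
```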

MonteCarlo

(list)

NSamples (integer) Number of Monte Carlo samples; defaults to NSamples = 0.
SampleSize (integer) Sample size for each Monte Carlo sample; defaults to SampleSize = 250.
Raw (logical) If Raw = TRUE, simulated data sets will contain raw data. If Raw = FALSE, simulated data sets will contain correlation matrices; defaults to Raw = FALSE.
Thresholds (list) List elements contain thresholds for each item. Thresholds are required when generating Likert variables.
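The Monte Carlo step amounts to drawing multivariate normal samples from a population correlation matrix. The sketch below (hypothetical mc_samples()) uses a Cholesky factor of Rpop and returns raw data or sample correlation matrices depending on Raw; which population matrix simFA samples from, and how it applies Thresholds, is not shown here.

```r
# Hypothetical helper: multivariate normal Monte Carlo samples from Rpop
mc_samples <- function(Rpop, NSamples = 5, SampleSize = 250, Raw = FALSE) {
  U <- chol(Rpop)                  # upper triangular, t(U) %*% U == Rpop
  p <- nrow(Rpop)
  lapply(seq_len(NSamples), function(i) {
    Z <- matrix(rnorm(SampleSize * p), SampleSize, p)
    X <- Z %*% U                   # rows are draws with correlation Rpop
    if (Raw) X else cor(X)
  })
}

set.seed(1)
Rpop <- matrix(.3, 4, 4)
diag(Rpop) <- 1
samples <- mc_samples(Rpop, NSamples = 3, SampleSize = 500)
round(samples[[1]], 2)
```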

FactorScores

(list)

FS (logical) If FS = TRUE, true factor scores will be simulated; defaults to FS = FALSE.
CFSeed (integer) Optional starting seed for the common factor scores; defaults to CFSeed = NULL in which case a random seed is used.
MCFSeed (integer) Optional starting seed for the minor common factor scores; defaults to MCFSeed = NULL.
SFSeed (integer) Optional starting seed for the specific factor scores; defaults to SFSeed = NULL in which case a random seed is used.
EFSeed (integer) Optional starting seed for the error factor scores; defaults to EFSeed = NULL in which case a random seed is used. Note that CFSeed, MCFSeed, SFSeed, and EFSeed must be different numbers (a fatal error is produced when two or more seeds are specified as equal).
VarRel (vector) A vector of manifest variable reliabilities. The specific factor variance for variable i will equal VarRel[i] - h2[i] (the manifest variable reliability minus its communality). By default, VarRel = h2 (resulting in uniformly zero specific factor variances).
Population (logical) If Population = TRUE, factor scores will fit the correlational constraints of the factor model exactly (e.g., the common factors will be orthogonal to the unique factors); defaults to Population = FALSE.
NFacScores (scalar) Sample size for the factor scores; defaults to NFacScores = 250.
Thresholds (list) A list of quantiles used to polychotomize the observed data that will be generated from the factor scores.
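The roles of VarRel and Thresholds can be sketched for orthogonal factors (an assumption): each observed score combines common, specific, and error factor scores, with specific variance VarRel - h2 and error variance 1 - VarRel, and the continuous scores are then cut at per-item thresholds to produce Likert categories. All object names are illustrative, not simFA internals.

```r
# Illustrative sketch (orthogonal factors assumed; names are not simFA internals)
set.seed(1)
N  <- 1000                                   # cf. NFacScores
L  <- kronecker(diag(2), matrix(.6, 3, 1))   # 6 items, 2 factors
p  <- nrow(L)
h2 <- rowSums(L^2)                           # communalities (.36 each)
VarRel <- rep(.50, p)                        # hypothetical reliabilities

CF <- matrix(rnorm(N * ncol(L)), N)          # common factor scores
SF <- matrix(rnorm(N * p), N)                # specific factor scores
EF <- matrix(rnorm(N * p), N)                # error factor scores

# Specific variance = VarRel - h2; error variance = 1 - VarRel
X <- CF %*% t(L) +
     SF %*% diag(sqrt(VarRel - h2)) +
     EF %*% diag(sqrt(1 - VarRel))

# One vector of cut points (standard normal quantiles) per item;
# three cut points yield four ordered categories coded 1 to 4
Thresholds <- rep(list(c(-1, 0, 1)), p)
Likert <- sapply(seq_len(p), function(j)
  findInterval(X[, j], Thresholds[[j]]) + 1)
```

Here the sample factor scores only approximately satisfy the model's orthogonality constraints; with Population = TRUE, simFA makes them fit those constraints exactly, which this sketch does not attempt.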

Missing

(list)

Missing (logical) If Missing = TRUE all data sets will contain missing values; defaults to Missing = FALSE.
Mechanism (character) Specifies the missing data mechanism. Currently, the program only supports missing completely at random (MCAR): Mechanism = "MCAR".
MSProb (scalar or vector of length NVar) Specifies the probability of missingness for each variable; defaults to MSProb = 0.
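MCAR missingness with per-variable probabilities can be sketched as below. The helper make_mcar() is hypothetical; it deletes each cell of variable j independently of all data values with probability MSProb[j], which is what "completely at random" means here.

```r
# Hypothetical helper: MCAR deletion with per-variable probabilities
make_mcar <- function(X, MSProb = 0) {
  MSProb <- rep_len(MSProb, ncol(X))              # scalar -> one value per column
  # Compare uniform draws with each column's probability (column-major order)
  miss <- matrix(runif(length(X)), nrow(X)) < rep(MSProb, each = nrow(X))
  X[miss] <- NA
  X
}

set.seed(1)
X  <- matrix(rnorm(2000 * 3), 2000)
Xm <- make_mcar(X, MSProb = c(0, .2, .5))
colMeans(is.na(Xm))                              # roughly 0, .2, .5
```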

Control

(list)

IRT (logical) If IRT = TRUE then user-supplied thresholds will be interpreted as item intercepts; defaults to IRT = FALSE.
Dparam (scalar) If Dparam = 1 then item intercepts should be scaled in the logistic metric. If Dparam = 1.702 then intercepts should be scaled in the probit metric.
Maxh2 (scalar) Rows of the loadings matrix will be rescaled to have a maximum communality of Maxh2; defaults to Maxh2 = .98.
Reflect (logical) If Reflect = TRUE loadings on the common factors will be randomly reflected; defaults to Reflect = FALSE.
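The IRT output expresses factor loadings in the normal ogive metric. A standard conversion is a_j = lambda_j / sqrt(1 - h2_j) for slopes and, for a single factor under the convention P(X_j = 1 | theta) = pnorm(a_j * (theta - b_j)) with threshold tau_j, b_j = tau_j / lambda_j for difficulties. The hypothetical fa_to_irt() below applies these formulas, and the 1.702 rescaling illustrates the usual normal ogive to logistic approximation behind Dparam. simFA's exact conventions (for example, intercept signs) may differ.

```r
# Hypothetical helper: factor loadings -> normal-ogive IRT parameters
fa_to_irt <- function(L, Phi = diag(ncol(L)), tau = NULL) {
  h2 <- rowSums((L %*% Phi) * L)        # communalities, diag(L Phi L')
  a  <- L / sqrt(1 - h2)                # normal-ogive slopes
  out <- list(a = a)
  # Difficulties only for a single factor; sign convention assumed
  if (!is.null(tau) && ncol(L) == 1) out$b <- tau / L[, 1]
  out
}

lambda <- matrix(c(.6, .8), ncol = 1)
irt <- fa_to_irt(lambda, tau = c(0, .4))
irt$a            # 0.75 and 1.333
irt$b            # 0.0 and 0.5
1.702 * irt$a    # approximate logistic-metric slopes
```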

Seed

(integer) Starting seed for the random number generator; defaults to Seed = NULL. When no seed is specified by the user, the program will generate a random seed.

Author

Niels G. Waller with contributions by Hoang V. Nguyen

Details

For a complete description of simFA's capabilities, users are encouraged to consult the simFABook at http://users.cla.umn.edu/~nwaller/simFA/simFABook.pdf.

simFA is a program for exploring factor analysis models via simulation studies. After calling simFA, all relevant output can be saved for further processing by extracting one or more of the components listed under Value.

References

Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107(2), 238--246.

Hu, L.-T. & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1--55.

Marsh, H. W., Hau, K.-T., & Grayson, D. (2005). Goodness of fit in structural equation models. In A. Maydeu-Olivares & J. J. McArdle (Eds.), Multivariate applications book series. Contemporary psychometrics: A festschrift for Roderick P. McDonald (pp. 275--340). Lawrence Erlbaum Associates Publishers.

Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1), 53--61.

Steiger, J. H. (2016). Notes on the Steiger–Lind (1980) handout. Structural Equation Modeling: A Multidisciplinary Journal, 23(6), 777--781.

Tucker, L. R., Koopman, R. F., & Linn, R. L. (1969). Evaluation of factor analytic research procedures by means of simulated correlation matrices. Psychometrika, 34(4), 421--459.

Examples



## Not run:
#  Ex 1. Three-factor simple structure model with ideal (zero)
#  nonsalient loadings (default settings)
   out <- simFA(Seed = 1)
   print( round( out$loadings, 2 ) )

# Ex 2. Non-hierarchical bifactor model with 3 group factors
# and constant loadings (.4) on the general factor
   out <- simFA(Bifactor = list(Bifactor = TRUE,
                                Hierarchical = FALSE,
                                F1FactorRange = c(.4, .4),
                                F1FactorDist = "runif"),
                Seed = 1)
   print( round( out$loadings, 2 ) )

# Ex 3. Model fit statistics for population data with
# model approximation error. Three-factor model.
   out <- simFA(Loadings = list(FacLoadDist = "fixed",
                                FacLoadRange = .5),
                ModelError = list(ModelError = TRUE,
                                  NMinorFac = 150,
                                  ModelErrorType = "V",
                                  ModelErrorVar = .1,
                                  Wattempts = 10000,
                                  epsTKL = .2),
                Seed = 1)

   print( out$loadings )
   # Elements 2, 4, 6, and 8 are SRMR_thetahat, CRMR_thetahat,
   # RMSEA_thetahat, and CFI_thetahat (in the order listed under Value)
   print( out$ModelErrorFitStats[seq(2, 8, 2)] )

## End(Not run)
