A function to simulate factor loadings matrices and Monte Carlo data sets
for common factor models and bifactor models.
For a complete description of simFA's capabilities, users are encouraged to consult the simFABook at
http://users.cla.umn.edu/~nwaller/simFA/simFABook.pdf.
simFA(
  Model = list(),
  Loadings = list(),
  CrossLoadings = list(),
  Phi = list(),
  ModelError = list(),
  Bifactor = list(),
  MonteCarlo = list(),
  FactorScores = list(),
  Missing = list(),
  Control = list(),
  Seed = NULL
)
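As a rough illustration of how the keyword lists in the usage above combine, the following sketch requests an oblique four-factor model with fixed factor correlations. It assumes that simFA is provided by the fungible package (an assumption, since the package name is not stated on this page) and only runs when that package is installed; the argument values are arbitrary.

```r
# Sketch only: calls simFA when the (assumed) fungible package is available.
if (requireNamespace("fungible", quietly = TRUE)) {
  out <- fungible::simFA(
    Model = list(NFac = 4, NItemPerFac = 5, Model = "oblique"),
    Phi   = list(PhiType = "fixed", MaxAbsPhi = .4),
    Seed  = 1
  )
  print(dim(out$loadings))   # items by factors
  print(round(out$Phi, 2))   # factor correlation matrix
}
```

Keywords that are not supplied keep the defaults documented for each list.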
loadings
A common factor or bifactor loadings matrix.
Phi
A factor correlation matrix.
urloadings
The unrotated loadings matrix.
h2
A vector of item communalities.
h2PopME
A vector of item communalities that may include model approximation error.
Rpop
The model-implied population correlation matrix.
RpopME
The model-implied population correlation matrix with model error.
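The relationship between loadings, Phi, h2, and Rpop can be sketched in base R. The loadings and factor correlation below are hypothetical values chosen for illustration, and the sketch assumes the standard common factor decomposition rather than reproducing simFA's internal code.

```r
# Hypothetical 6-item, 2-factor pattern (illustrative values only)
Lambda <- matrix(c(.7, .6, .5,  0,  0,  0,
                    0,  0,  0, .7, .6, .5), nrow = 6, ncol = 2)
Phi <- matrix(c( 1, .3,
                .3,  1), nrow = 2, byrow = TRUE)

# Common-factor part of the correlation matrix
Common <- Lambda %*% Phi %*% t(Lambda)

# Communalities are the diagonal of the common part
h2 <- diag(Common)

# Model-implied correlation matrix: common part plus unique variances
Rpop <- Common + diag(1 - h2)

print(round(h2, 2))
print(round(Rpop, 2))
```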
W
The factor loadings for the minor factors (when ModelError = TRUE). Default = NULL.
ModelErrorFitStats
A list of model fit indices (for the underlying equations, see Bentler, 1990; Hu & Bentler, 1999; Marsh, Hau, & Grayson, 2005; Steiger, 2016):
SRMR_theta
Standardized Root Mean Square Residual based on the model that is implied by the error-free major factors only (underlying Rpop),
SRMR_thetahat
Standardized Root Mean Square Residual based on an exploratory factor analysis of the population correlation matrix, RpopME,
CRMR_theta
Correlation Root Mean Square Residual based on the model that is implied by the error-free major factors only (underlying Rpop),
CRMR_thetahat
Correlation Root Mean Square Residual based on an exploratory factor analysis of the population correlation matrix, RpopME,
RMSEA_theta
Root Mean Square Error of Approximation (Steiger, 2016) based on the model that is implied by the error-free major factors only (underlying Rpop),
RMSEA_thetahat
Root Mean Square Error of Approximation (Steiger, 2016) based on an exploratory factor analysis of the population correlation matrix, RpopME,
CFI_theta
Comparative Fit Index (Bentler, 1990) based on the model that is implied by the error-free major factors only (underlying Rpop),
CFI_thetahat
Comparative Fit Index (Bentler, 1990) based on an exploratory factor analysis of the population correlation matrix, RpopME.
Fm
MLE fit function for population target model.
Fb
MLE fit function for population baseline model.
DFm
Degrees of freedom for population target model.
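The following base R sketch shows one common way to compute population-level versions of these indices from a pair of correlation matrices. It uses textbook definitions (residual-based SRMR and CRMR, the ML discrepancy function, RMSEA = sqrt(Fm / DFm), CFI = 1 - Fm / Fb); whether simFA uses exactly these formulas is an assumption. The matrices R0 and R1 are hypothetical stand-ins for Rpop and RpopME.

```r
# R0: error-free one-factor model matrix; R1: R0 plus a small perturbation
p   <- 4
lam <- c(.7, .6, .5, .4)
R0  <- tcrossprod(lam)
diag(R0) <- 1
E <- matrix(0, p, p)
E[1, 2] <- E[2, 1] <- .05
E[3, 4] <- E[4, 3] <- -.05
R1 <- R0 + E

# Residual-based indices
res_all <- (R1 - R0)[lower.tri(R0, diag = TRUE)]
res_off <- (R1 - R0)[lower.tri(R0)]
SRMR <- sqrt(sum(res_all^2) / (p * (p + 1) / 2))  # diagonal included
CRMR <- sqrt(sum(res_off^2) / (p * (p - 1) / 2))  # diagonal excluded

# ML discrepancy between a "data" matrix S and a model matrix Sigma
Fml <- function(S, Sigma) {
  log(det(Sigma)) - log(det(S)) + sum(diag(S %*% solve(Sigma))) - nrow(S)
}
Fm  <- Fml(R1, R0)          # target model
Fb  <- Fml(R1, diag(p))     # baseline (independence) model
DFm <- p * (p - 1) / 2 - p  # 6 correlations minus 4 loadings

RMSEA <- sqrt(Fm / DFm)
CFI   <- 1 - Fm / Fb
print(round(c(SRMR = SRMR, CRMR = CRMR, RMSEA = RMSEA, CFI = CFI), 4))
```

This mirrors the "_theta" variants, where the model matrix is fixed in advance; the "_thetahat" variants would first fit an exploratory factor model to R1.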
CovMatrices
A list containing:
CovMajor
The model-implied covariances from the major factors.
CovMinor
The model-implied covariances from the minor factors.
CovUnique
The model-implied variances from the uniqueness factors.
Bifactor
A list containing:
loadingsHier
Factor loadings of the first-order solution of a hierarchical bifactor model.
PhiHier
Factor correlations of the first-order solution of a hierarchical bifactor model.
Scores
A list containing:
FactorScores
Factor scores for the common and uniqueness factors.
FacInd
Factor indeterminacy indices for the error-free population model.
FacIndME
Factor score indeterminacy indices for the population model with model error.
ObservedScores
A matrix of model-implied ObservedScores. If Thresholds were supplied under keyword FactorScores, ObservedScores will be transformed into Likert scores.
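For readers unfamiliar with factor indeterminacy indices such as FacInd, one widely used index is the correlation between each factor and its regression-based score estimate. The sketch below computes that quantity for a hypothetical two-factor model; it is an assumption that simFA reports this particular index.

```r
# Hypothetical pattern and factor correlations (illustrative values only)
Lambda <- matrix(c(.7, .6, .5,  0,  0,  0,
                    0,  0,  0, .7, .6, .5), nrow = 6, ncol = 2)
Phi <- matrix(c( 1, .3,
                .3,  1), nrow = 2, byrow = TRUE)

# Model-implied correlation matrix
R <- Lambda %*% Phi %*% t(Lambda)
diag(R) <- 1

# Factor structure matrix (item-factor correlations)
S <- Lambda %*% Phi

# Correlation between each factor and its regression estimate
rho <- sqrt(diag(t(S) %*% solve(R) %*% S))
print(round(rho, 3))
```

Values closer to 1 indicate better-determined factor scores.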
Monte
A list containing output from the Monte Carlo simulations if generated.
IRT
Factor loadings expressed in the normal ogive IRT metric. If Thresholds were given then IRT difficulty values will also be returned.
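For a single-factor model with binary items, the usual conversion from factor loadings and thresholds to normal ogive discrimination and difficulty can be sketched as follows. The loadings and thresholds are hypothetical, and the unidimensional formulas shown here are an assumption about the metric; multidimensional models need a generalization.

```r
lambda <- c(.4, .6, .8)   # hypothetical loadings on one factor
tau    <- c(-.5, 0, .5)   # hypothetical thresholds, one per binary item

a <- lambda / sqrt(1 - lambda^2)  # normal ogive discrimination
b <- tau / lambda                 # normal ogive difficulty

print(round(cbind(lambda, tau, a, b), 3))
```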
Seed
The initial seed for the random number generator.
call
A copy of the function call.
cn
A list of all active and nonactive function arguments.
Model
(list)
NFac
(scalar) Number of common or group factors; defaults to NFac = 3.
NItemPerFac
(scalar) All factors have the same number of primary loadings.
(vector) A vector of length NFac specifying the number of primary loadings for each factor; defaults to NItemPerFac = 3.
Model
(character) "orthogonal" or "oblique"; defaults to Model = "orthogonal".
Loadings
(list)
FacPattern
(NULL or matrix).
FacPattern = M, where M is a user-defined factor pattern matrix.
FacPattern = NULL: simFA will generate a factor pattern based on the arguments specified under other keywords (e.g., Model, CrossLoadings, etc.); defaults to FacPattern = NULL.
FacLoadDist
(character) Specifies the sampling distribution for the common factor loadings. Possible values are "runif", "rnorm", "sequential", and "fixed"; defaults to FacLoadDist = "runif".
FacLoadRange
(vector of length NFac, 2, or 1); defaults to FacLoadRange = c(.3, .7).
If FacLoadDist = "runif", the vector defines the bounds of the uniform distribution.
If FacLoadDist = "rnorm", the vector defines the mean and standard deviation of the normal distribution from which loadings are sampled.
If FacLoadDist = "sequential", the vector specifies the lower and upper bound of the loadings sequence.
If FacLoadDist = "fixed" and FacLoadRange is a vector of length 1, then all common loadings will equal the constant specified in FacLoadRange.
If FacLoadDist = "fixed" and FacLoadRange is a vector of length NFac, then each factor will have fixed loadings as specified by the associated element in FacLoadRange.
h2
(vector) An optional vector of communalities used to constrain the population communalities to user-defined values; defaults to h2 = NULL.
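The four FacLoadDist options described above can be mimicked in base R for the primary loadings of a single factor. This is a sketch under stated assumptions: the number of items, the normal mean and standard deviation, and the direction of the sequence are illustrative choices, not simFA's documented behavior.

```r
set.seed(1)
NItem        <- 5           # primary loadings on one factor (hypothetical)
FacLoadRange <- c(.3, .7)

# "runif": bounds of a uniform distribution
load_runif <- runif(NItem, min = FacLoadRange[1], max = FacLoadRange[2])

# "rnorm": the vector would hold c(mean, sd); values here are illustrative
load_rnorm <- rnorm(NItem, mean = .5, sd = .05)

# "sequential": evenly spaced between the bounds (direction is an assumption)
load_seq <- seq(FacLoadRange[1], FacLoadRange[2], length.out = NItem)

# "fixed": one constant for every primary loading
load_fixed <- rep(.5, NItem)

print(round(rbind(load_runif, load_rnorm, load_seq, load_fixed), 2))
```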
CrossLoadings
(list)
ProbCrossLoad
(scalar) A value in the [0, 1] interval that determines the probability that a cross-loading will be present in elements of the loadings matrix that do not have salient (primary) factor loadings. If set to ProbCrossLoad = 1, a single cross-loading will be added to each factor; defaults to ProbCrossLoad = 0.
CrossLoadRange
(vector of length 2) Controls the size of the cross-loadings; defaults to CrossLoadRange = c(.20, .25).
CrossLoadPositions
(matrix) Specifies the row and column positions of (optional) cross-loadings; defaults to CrossLoadPositions = NULL.
CrossLoadValues
(vector) If CrossLoadPositions is specified then CrossLoadValues is a vector of user-supplied cross-loadings; defaults to CrossLoadValues = NULL.
CrudFactor
(scalar) Controls the size of tertiary factor loadings. If CrudFactor != 0 then elements of the loadings matrix with neither primary nor secondary (i.e., cross) loadings will be sampled from a [-(CrudFactor), (CrudFactor)] uniform distribution; defaults to CrudFactor = 0.
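To make the roles of CrossLoadPositions, CrossLoadValues, and CrudFactor concrete, the sketch below builds a small pattern matrix by hand in base R. The matrix layout (one row of positions per cross-loading, given as row then column) and all numeric values are assumptions made for illustration.

```r
set.seed(1)
NFac <- 3
NItemPerFac <- 3

# Simple structure: each item loads .6 on exactly one factor
L <- matrix(0, NFac * NItemPerFac, NFac)
primary <- cbind(seq_len(nrow(L)), rep(seq_len(NFac), each = NItemPerFac))
L[primary] <- .6

# User-specified cross-loadings: (row, column) positions and their values
CrossLoadPositions <- matrix(c(1, 2,
                               4, 3), ncol = 2, byrow = TRUE)
CrossLoadValues <- c(.25, .20)
L[CrossLoadPositions] <- CrossLoadValues

# Crud: remaining zero cells drawn from U(-CrudFactor, CrudFactor)
CrudFactor <- .10
empty <- L == 0
L[empty] <- runif(sum(empty), -CrudFactor, CrudFactor)

print(round(L, 2))
```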
Phi
(list)
MaxAbsPhi
(scalar) Upper (absolute) bound on factor correlations; defaults to MaxAbsPhi = .5.
EigenValPower
(scalar) Controls the skewness of the eigenvalues of Phi. Larger values of EigenValPower result in a Phi spectrum that is more right-skewed (and thus closer to a unidimensional model); defaults to EigenValPower = 2.
PhiType
(character); defaults to PhiType = "free".
If PhiType = "free", factor correlations will be randomly generated under the constraints of MaxAbsPhi and EigenValPower.
If PhiType = "fixed", all factor correlations will equal the value specified in MaxAbsPhi. A fatal error will be produced if Phi is not positive semidefinite.
If PhiType = "user", the factor correlations are defined by the matrix specified in UserPhi (see below).
UserPhi
(matrix) A positive semidefinite (PSD) matrix of user-defined factor correlations; defaults to UserPhi = NULL.
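The positive semidefiniteness requirement on Phi can be checked by inspecting eigenvalues. The base R sketch below builds an equicorrelated matrix of the kind implied by PhiType = "fixed" and contrasts it with a matrix that fails the check; the tolerance and the failing example are illustrative assumptions, not simFA's internal test.

```r
NFac      <- 3
MaxAbsPhi <- .5

# Every factor correlation equal to the same value
Phi <- matrix(MaxAbsPhi, NFac, NFac)
diag(Phi) <- 1
ev <- eigen(Phi, symmetric = TRUE, only.values = TRUE)$values
is_psd <- all(ev >= -1e-8)   # small tolerance for rounding error
print(ev)
print(is_psd)

# A candidate for UserPhi should pass the same check; this one does not
PhiBad <- matrix(-.6, NFac, NFac)
diag(PhiBad) <- 1
ev_bad <- eigen(PhiBad, symmetric = TRUE, only.values = TRUE)$values
print(min(ev_bad))           # negative, so PhiBad is not PSD
```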
ModelError
(list)
ModelError
(logical) If ModelError = TRUE model error will be introduced into the factor pattern via the method described by Tucker, Koopman, and Linn (TKL, 1969); defaults to ModelError = FALSE.
NMinorFac
(scalar) Number of minor factors in the TKL model; defaults to NMinorFac = 150.
ModelErrorType
(character) If ModelErrorType = "U" then ModelErrorVar is the proportion of uniqueness variance that is due to model error. If ModelErrorType = "V" then ModelErrorVar is the proportion of total variance that is due to model error; defaults to ModelErrorType = "U".
ModelErrorVar
(scalar [0,1]) The proportion of uniqueness (U) or total (V) variance that is due to model error; defaults to ModelErrorVar = .10.
epsTKL
(scalar [0,1]) Controls the size of the factor loadings in successive minor factors; defaults to epsTKL = .20.
Wattempts
(scalar > 0) Maximum number of tries when attempting to generate a suitable W matrix. Default Wattempts = 10000.
WmaxLoading
(scalar > 0) Threshold value for NWmaxLoading. Default WmaxLoading = .30.
NWmaxLoading
(scalar >= 0) Maximum number of absolute loadings >= WmaxLoading in any column of W (matrix of model approximation error factor loadings). Default NWmaxLoading = 2. Under the defaults, no column of W will have 3 or more loadings with absolute value >= .30.
PrintW
(logical) If PrintW = TRUE then simFA will print the attempt history when searching for a suitable W matrix given the constraints defined in WmaxLoading and NWmaxLoading.
RSpecific
(matrix) Optional correlation matrix for specific factors; defaults to RSpecific = NULL.
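The general idea behind a TKL-style minor factor matrix W can be sketched in base R: draw random loadings, shrink successive minor factors geometrically, then rescale each row so that the minor factors account for the requested share of variance. The geometric shrinkage rule and the row rescaling are assumptions made for illustration and may differ from simFA's implementation.

```r
set.seed(1)
p             <- 9
NMinorFac     <- 150
epsTKL        <- .20
h2            <- rep(.25, p)   # hypothetical communalities
ModelErrorVar <- .10           # share of uniqueness variance (type "U")

# Successive minor factors shrink by a factor of (1 - epsTKL)
shrink <- (1 - epsTKL)^(0:(NMinorFac - 1))
W <- matrix(rnorm(p * NMinorFac), p, NMinorFac) %*% diag(shrink)

# Rescale rows so each item's minor-factor variance hits its target;
# for type "V" the target would be ModelErrorVar itself
target <- ModelErrorVar * (1 - h2)
W <- W * sqrt(target / rowSums(W^2))

# WmaxLoading / NWmaxLoading style check on large loadings per column
WmaxLoading <- .30
n_large <- max(colSums(abs(W) >= WmaxLoading))
print(n_large)
```

A search procedure in the spirit of Wattempts would redraw W until n_large does not exceed NWmaxLoading.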
Bifactor
(list)
Bifactor
(logical) If Bifactor = TRUE parameters for the bifactor model will be generated; defaults to Bifactor = FALSE.
Hierarchical
(logical) If Hierarchical = TRUE then a hierarchical Schmid-Leiman (1957) bifactor model will be generated; defaults to Hierarchical = FALSE.
F1FactorDist
(character) Specifies the sampling distribution for the general factor loadings. Possible values are "runif", "rnorm", "sequential", and "fixed"; defaults to F1FactorDist = "sequential".
F1FactorRange
(vector of length 1 or 2) Controls the sizes of the general factor loadings in nonhierarchical bifactor models; defaults to F1FactorRange = c(.4, .7).
If F1FactorDist = "runif", the vector of length 2 defines the bounds of the uniform distribution, c(lower, upper).
If F1FactorDist = "rnorm", the vector defines the mean and standard deviation of the normal distribution from which loadings are sampled, c(MN, SD).
If F1FactorDist = "sequential", the vector specifies the lower and upper bound of the loadings sequence, c(lower, upper).
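The Schmid-Leiman transformation that underlies a hierarchical bifactor model can be illustrated in base R. Given first-order loadings and the loadings of the first-order factors on a single second-order factor, it yields general and group factor loadings with unchanged communalities. All numeric values below are hypothetical.

```r
# First-order solution: 6 items, 2 correlated group factors
L1 <- matrix(c(.7, .6, .5,  0,  0,  0,
                0,  0,  0, .7, .6, .5), nrow = 6, ncol = 2)
g  <- c(.8, .6)   # hypothetical second-order loadings

# First-order factor correlations implied by the second-order factor
PhiHier <- tcrossprod(g)
diag(PhiHier) <- 1

# Schmid-Leiman: general loadings and residualized group loadings
general <- L1 %*% g
group   <- L1 %*% diag(sqrt(1 - g^2))
B <- cbind(general, group)

# Communalities agree between the two representations
h2_first <- diag(L1 %*% PhiHier %*% t(L1))
h2_bifac <- rowSums(B^2)
print(round(B, 3))
print(round(cbind(h2_first, h2_bifac), 3))
```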
MonteCarlo
(list)
NSamples
(integer) Defines the number of Monte Carlo samples; defaults to NSamples = 0.
SampleSize
(integer) Sample size for each Monte Carlo sample; defaults to SampleSize = 250.
Raw
(logical) If Raw = TRUE, simulated data sets will contain raw data. If Raw = FALSE, simulated data sets will contain correlation matrices; defaults to Raw = FALSE.
Thresholds
(list) List elements contain thresholds for each item. Thresholds are required when generating Likert variables.
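What the Monte Carlo keywords describe can be approximated in base R: draw multivariate normal samples from a population correlation matrix, keep either raw data or sample correlations, and optionally cut raw scores at item thresholds. The population matrix, thresholds, and helper function draw_sample below are hypothetical, and the sketch assumes normally distributed data.

```r
set.seed(1)
Rpop <- matrix(c( 1, .4, .3,
                 .4,  1, .2,
                 .3, .2,  1), nrow = 3, byrow = TRUE)
NSamples   <- 5
SampleSize <- 250
U <- chol(Rpop)   # upper triangular factor: t(U) %*% U equals Rpop

# Hypothetical helper: one sample as raw data or as a correlation matrix
draw_sample <- function(raw = FALSE) {
  X <- matrix(rnorm(SampleSize * ncol(Rpop)), nrow = SampleSize) %*% U
  if (raw) X else cor(X)
}
cor_samples <- lapply(seq_len(NSamples), function(i) draw_sample(raw = FALSE))

# Likert-type data: cut each column at its item-specific thresholds
Thresholds <- list(c(-1, 0, 1), c(-.5, .5), c(0))
Xraw   <- draw_sample(raw = TRUE)
likert <- sapply(seq_along(Thresholds),
                 function(j) findInterval(Xraw[, j], Thresholds[[j]]) + 1)
print(round(cor_samples[[1]], 2))
print(head(likert))
```

An item with k thresholds yields k + 1 ordered categories.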
FactorScores
(list)
FS
(logical) If FS = TRUE (true) factor scores will be simulated; defaults to FS = FALSE.
CFSeed
(integer) Optional starting seed for the common factor scores; defaults to CFSeed = NULL in which case a random seed is used.
SFSeed
(integer) Optional starting seed for the specific factor scores; defaults to SFSeed = NULL in which case a random seed is used.
EFSeed
(integer) Optional starting seed for the error factor scores; defaults to EFSeed = NULL in which case a random seed is used. Note that CFSeed, SFSeed, and EFSeed must be different numbers (a fatal error is produced when two or more seeds are specified as equal).
VarRel
(vector) A vector of manifest variable reliabilities. The specific factor variance for variable i will equal \(VarRel[i] - h^2[i]\) (the manifest variable reliability minus its communality). By default, \(VarRel = h^2\) (resulting in uniformly zero specific factor variances).
Population
(logical) If Population = TRUE, factor scores will fit the correlational constraints of the factor model exactly (e.g., the common factors will be orthogonal to the unique factors); defaults to Population = FALSE.
NFacScores
(scalar) Sample size for the factor scores; defaults to NFacScores = 250.
Thresholds
(list) A list of quantiles used to polychotomize the observed data that will be generated from the factor scores.
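Two ideas from the FactorScores keywords lend themselves to a short base R sketch: the variance decomposition implied by VarRel, and scores that satisfy correlational constraints exactly, as Population = TRUE describes. The error variance formula (1 - VarRel) and the QR-based construction are assumptions made for illustration, not simFA's documented method.

```r
set.seed(1)
h2     <- c(.49, .36, .25)   # hypothetical communalities
VarRel <- c(.80, .70, .25)   # hypothetical reliabilities (VarRel >= h2)

specific_var <- VarRel - h2  # as described for VarRel
error_var    <- 1 - VarRel   # assumption: remaining standardized variance
print(cbind(h2, specific_var, error_var))

# Scores that are exactly uncorrelated with unit variance in the sample
n <- 250
k <- 4
Z <- scale(matrix(rnorm(n * k), n, k), center = TRUE, scale = FALSE)
scores <- qr.Q(qr(Z)) * sqrt(n - 1)
print(round(cov(scores), 10))   # identity matrix up to rounding
```

With ordinary random draws the sample covariance matrix would only approximate the identity.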
Missing
(list)
Missing
(logical) If Missing = TRUE all data sets will contain missing values; defaults to Missing = FALSE.
Mechanism
(character) Specifies the missing data mechanism. Currently, the program only supports missing completely at random (MCAR): Mechanism = "MCAR".
MSProb
(scalar or vector of length NVar) Specifies the probability of missingness for each variable; defaults to MSProb = 0.
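Missing completely at random (MCAR) data with per-variable probabilities can be imposed on a raw data matrix as follows. This is a base R sketch with hypothetical data and probabilities, not simFA's internal procedure.

```r
set.seed(1)
X <- matrix(rnorm(250 * 4), nrow = 250, ncol = 4)
MSProb <- c(.05, .10, .10, .20)   # hypothetical per-variable probabilities

# Each cell is deleted independently of all observed and unobserved values
miss <- sapply(MSProb, function(pr) runif(nrow(X)) < pr)
X[miss] <- NA

print(round(colMeans(is.na(X)), 3))   # observed missingness rates
```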
Control
(list)
Maxh2
(scalar) Rows of the loadings matrix will be rescaled to have a maximum communality of Maxh2; defaults to Maxh2 = .98.
itemReflect
(logical) If Reflect = TRUE loadings on the common factors will be randomly reflected; defaults to Reflect = FALSE.
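The effect of a communality cap such as Maxh2 can be sketched in base R. The sketch assumes orthogonal factors and assumes that only rows whose communality exceeds the cap are rescaled; both are illustrative assumptions.

```r
L <- matrix(c(.9, .5,
              .8, .7,
              .4, .3), nrow = 3, byrow = TRUE)   # hypothetical loadings
Maxh2 <- .98

h2 <- rowSums(L^2)        # communalities under orthogonal factors
too_big <- h2 > Maxh2

# Shrink offending rows proportionally so their communality equals Maxh2
L[too_big, ] <- L[too_big, ] * sqrt(Maxh2 / h2[too_big])
print(round(rowSums(L^2), 2))
```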
Seed
(integer) Starting seed for the random number generator; defaults to Seed = NULL. When no seed is specified by the user, the program will generate a random seed.
Niels G. Waller
simFA is a program for exploring factor analysis models via simulation studies.
After calling simFA all relevant output can be saved for further processing by calling one or more of the following object names.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107(2), 238--246.
Hu, L.-T. & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1--55.
Marsh, H. W., Hau, K.-T., & Grayson, D. (2005). Goodness of fit in structural equation models. In A. Maydeu-Olivares & J. J. McArdle (Eds.), Multivariate applications book series. Contemporary psychometrics: A festschrift for Roderick P. McDonald (pp. 275--340). Lawrence Erlbaum Associates Publishers.
Schmid, J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1), 53--61.
Steiger, J. H. (2016). Notes on the Steiger–Lind (1980) handout. Structural Equation Modeling: A Multidisciplinary Journal, 23(6), 777--781.
Tucker, L. R., Koopman, R. F., and Linn, R. L. (1969). Evaluation of factor analytic research procedures by means of simulated correlation matrices. Psychometrika, 34(4), 421--459.
# Ex 1. Three Factor Simple Structure Model with Crossloadings and
# Ideal Nonsalient Loadings
out <- simFA(Seed = 1)
print( round( out$loadings, 2 ) )
# Ex 2. Non Hierarchical bifactor model 3 group factors
# with constant loadings on the general factor
out <- simFA(Bifactor = list(Bifactor = TRUE,
                             Hierarchical = FALSE,
                             F1FactorRange = c(.4, .4),
                             F1FactorDist = "runif"),
             Seed = 1)
print( round( out$loadings, 2 ) )
# Ex 3. Model Fit Statistics for Population Data with
# Model Approximation Error. Three Factor model.
out <- simFA(Loadings = list(FacLoadDist = "fixed",
                             FacLoadRange = .5),
             ModelError = list(ModelError = TRUE,
                               NMinorFac = 150,
                               ModelErrorType = "V",
                               ModelErrorVar = .1,
                               Wattempts = 10000,
                               epsTKL = .2),
             Seed = 1)
print( out$loadings )
print( out$ModelErrorFitStats[seq(2,8,2)] )