Estimate structural equation models using model-implied instrumental variables (MIIVs).
miive(
model = model,
data = NULL,
instruments = NULL,
sample.cov = NULL,
sample.mean = NULL,
sample.nobs = NULL,
sample.cov.rescale = TRUE,
estimator = "2SLS",
se = "standard",
bootstrap = 1000L,
boot.ci = "norm",
missing = "listwise",
est.only = FALSE,
var.cov = FALSE,
miiv.check = TRUE,
ordered = NULL,
sarg.adjust = "none",
overid.degree = NULL,
overid.method = "stepwise.R2"
)
A data frame, list, or environment, or an object coercible by as.data.frame to a data frame. The most common application is to supply a data.frame.
This allows the user to specify the instruments for each equation. See Details and the miivs.out argument of summary.miivs for more information on the correct input format. External (auxiliary) instruments can be supplied; however, the miiv.check argument must then be set to FALSE. In the typical application, the program will choose the MIIVs for each equation based on the model specification. To view the model-implied instruments after estimation, see the eq.info argument of summary.miive.
Numeric matrix. A sample variance-covariance matrix. The rownames and colnames attributes must contain all the observed variable names indicated in the model syntax.
A sample mean vector. If sample.cov is provided and the sample.mean argument is NULL, intercepts for all endogenous variables will not be estimated.
Number of observations in the full data frame.
If TRUE, the sample covariance matrix provided by the user is internally rescaled by multiplying it by a factor of (N-1)/N.
Options "2SLS" or "GMM" for estimating the model parameters. Default is "2SLS". Currently, only 2SLS is supported.
If "standard", asymptotic standard errors are computed. If "bootstrap" (or "boot"), bootstrap standard errors are computed.
Number of bootstrap draws, if bootstrapping is used. The default is 1000.
Method for calculating bootstrap confidence intervals. Options are normal approximation ("norm"), basic bootstrap interval ("basic"), percentile interval ("perc"), and adjusted bootstrap percentile ("bca"). The default is normal approximation. See boot.ci for more information.
Default is "listwise"; however, a maximum likelihood related missing data method called "twostage" is also available. See Details below on missing for more information.
If TRUE, only the coefficients are returned.
If TRUE, variance and covariance parameters are estimated.
Default is TRUE. miiv.check provides a check to determine whether user-supplied instruments are implied by the model specification (i.e., are valid MIIVs). When auxiliary or external instruments are provided, miiv.check should be set to FALSE.
A vector of variable names to be treated as ordered factors in generating the polychoric correlation matrix and subsequent PIV estimates. See Details on ordered below for more information.
Adjustment method used to adjust the p-values associated with the Sargan test for multiple comparisons. The default is none. For options see p.adjust.
A numeric value indicating the degree of overidentification to be used in estimation.
The method by which excess MIIVs should be pruned to satisfy the overid.degree. Options include random (minimum.eigen) or stepwise R2 (stepwise.R2). The default is stepwise.R2.
model
The following model syntax operators are currently supported: =~, ~, ~~ and *. See below for details on default behavior, descriptions of how to specify the scaling indicator in latent variable models, and how to impose equality constraints on the parameter estimates.
Example using Syntax Operators
In the model below, 'L1 =~ Z1 + Z2 + Z3' indicates the latent variable L1 is measured by 3 indicators, Z1, Z2, and Z3. Likewise, L2 is measured by 3 indicators, Z4, Z5, and Z6. The statement 'L1 ~ L2' specifies that latent variable L1 is regressed on latent variable L2. 'Z2 ~~ Z3' indicates the error of Z2 is allowed to covary with the error of Z3. The label LA3 prepended to Z3 and Z6 in the measurement model constrains the factor loadings for Z3 and Z6 to equality. For additional details on constraints see Equality Constraints and Parameter Restrictions.
model <- '
  L1 =~ Z1 + Z2 + LA3*Z3
  L2 =~ Z4 + Z5 + LA3*Z6
  L1 ~ L2
  Z2 ~~ Z3
'
Scaling Indicators
Following the lavaan model syntax, latent variables are defined using the =~ operator. For first-order factors, the scaling indicator chosen is the first observed variable on the RHS of an equation. For the model below, Z1 would be chosen as the scaling indicator for L1 and Z4 would be chosen as the scaling indicator for L2.
model <- '
  L1 =~ Z1 + Z2 + Z3
  L2 =~ Z4 + Z5 + Z6
'
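The role of the scaling indicator can be illustrated with a small base R simulation. This is a sketch under assumed values (the loadings, error variances, and sample size are hypothetical) and is not MIIVsem's internal code. Because the loading of Z1 is fixed to 1, L1 can be replaced by Z1 minus its error, which turns the equation for Z2 into a regression on the observed Z1 with a composite error. Z1 is correlated with that composite error, so another indicator (here Z3) is used as an instrument.

```r
# Illustrative simulation with assumed values; not MIIVsem internals.
set.seed(123)
n  <- 5000
L1 <- rnorm(n)                       # latent variable
Z1 <- 1.0 * L1 + rnorm(n, sd = 0.5)  # scaling indicator (loading fixed to 1)
Z2 <- 0.8 * L1 + rnorm(n, sd = 0.5)  # true loading is 0.8
Z3 <- 0.7 * L1 + rnorm(n, sd = 0.5)

# Substituting L1 = Z1 - e1 gives Z2 = lambda2 * Z1 + (e2 - lambda2 * e1).
# Z1 is correlated with the composite error, so OLS is biased toward zero.
ols_lambda2 <- unname(coef(lm(Z2 ~ Z1))["Z1"])

# Z3 is uncorrelated with e1 and e2, so it can serve as an instrument for Z1.
# With one regressor and one instrument, 2SLS reduces to a covariance ratio.
iv_lambda2 <- cov(Z3, Z2) / cov(Z3, Z1)

round(c(ols = ols_lambda2, iv = iv_lambda2), 3)
```

In this setup the instrumental variable estimate should land near the assumed loading of 0.8, while the OLS estimate is attenuated.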
Equality Constraints and Parameter Restrictions
Within- and across-equation equality constraints on the factor loading and regression coefficients can be imposed directly in the model syntax. To specify equality constraints between different parameters, equivalent labels should be prepended to the variable name using the * operator. For example, we could constrain the factor loadings for the two non-scaling indicators of L1 to equality using the following model syntax.
model <- '
  L1 =~ Z1 + LA2*Z2 + LA2*Z3
  L2 =~ Z4 + Z5 + Z6
'
Researchers can also constrain the factor loading and regression coefficients to specific numeric values in a similar fashion. Below we constrain the regression coefficient of L1 on L2 to 1.
model <- '
  L1 =~ Z1 + Z2 + Z3
  L2 =~ Z4 + Z5 + Z6
  L3 =~ Z7 + Z8 + Z9
  L1 ~ 1*L2 + L3
'
Higher-order Factor Models
In the model below, the scaling indicator for the higher-order factor H1 is taken to be Z1, the scaling indicator that would have been assigned to the first lower-order factor L1. The intercepts for lower-order latent variables are set to zero by default.
model <- '
  H1 =~ L1 + L2 + L3
  L1 =~ Z1 + Z2 + Z3
  L2 =~ Z4 + Z5 + Z6
  L3 =~ Z7 + Z8 + Z9
'
Model Defaults
In addition to those relationships specified in the model syntax, MIIVsem will automatically include the intercepts of any observed or latent endogenous variable. The intercepts for any scaling indicators and lower-order latent variables are set to zero by default. Covariances among exogenous latent and observed variables are included when var.cov = TRUE. Where appropriate, the covariances of the errors of latent and observed dependent variables are automatically included in the model specification. These defaults correspond to those used by lavaan and auto = TRUE, except that endogenous latent variable intercepts are estimated by default, and the intercepts of scaling indicators are fixed to zero.
Invalid Specifications
Certain model specifications are not currently supported. For example, the scaling indicator of a latent variable is not permitted to cross-load on another latent variable. In the model below, Z1, the scaling indicator for L1, cross-loads on the latent variable L2. Executing a search on the model below will result in the warning: miivs: scaling indicators with a factor complexity greater than 1 are not currently supported.
model <- '
  L1 =~ Z1 + Z2 + Z3
  L2 =~ Z4 + Z5 + Z6 + Z1
'
In addition, MIIVsem does not currently support relations where the scaling indicator of a latent variable is also the dependent variable in a regression equation. The model below would not be valid under the current algorithm.
model <- '
  L1 =~ Z1 + Z2 + Z3
  Z1 ~ Z4
  Z4 ~ Z5 + Z6
'
instruments
To use this option you must first define a list of instruments using the syntax displayed below. Here, the dependent variable for each equation is listed on the LHS of the ~ operator. In the case of latent variable equations, the dependent variable is the scaling indicator associated with that variable. The instruments are then given on the RHS, separated by + signs. The instrument syntax is then enclosed in single quotes. For example,
customIVs <- '
  y1 ~ z1 + z2 + z3
  y2 ~ z4 + z5
'
After this list is defined, set the instruments argument equal to the name of the list of instruments (e.g. customIVs). Note that instruments are specified for an equation, and not for a specific endogenous variable. If only a subset of dependent variables is listed in the instruments argument, only those equations listed will be estimated. If external or auxiliary instruments (instruments not otherwise included in the model) are included, the miiv.check argument should be set to FALSE.
sample.cov
The user may provide a sample covariance matrix in lieu of raw data. The rownames and colnames must contain the observed variable names indicated in the model syntax. If sample.cov is not NULL, the user must also supply a vector of sample means (sample.mean) and the number of sample observations (sample.nobs) from which the means and covariances were calculated. If no vector of sample means is provided, intercepts will not be estimated. MIIVsem does not support bootstrap standard errors or polychoric instrumental variable estimation when the sample moments, rather than raw data, are used as input.
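As a minimal sketch of preparing these inputs in base R, the three objects can be computed from a raw data frame. The data frame dat and its column names are hypothetical stand-ins for the observed variables in a model.

```r
# Hypothetical raw data; column names stand in for the model's observed variables.
set.seed(1)
dat <- data.frame(Z1 = rnorm(200), Z2 = rnorm(200), Z3 = rnorm(200))

S <- cov(dat)       # sample covariance matrix (N - 1 denominator)
m <- colMeans(dat)  # sample mean vector
n <- nrow(dat)      # number of observations

# The row names, column names, and mean-vector names should all agree.
identical(rownames(S), colnames(S))
identical(names(m), colnames(S))
```

Objects built this way correspond to the sample.cov, sample.mean, and sample.nobs arguments, respectively.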
sample.mean
A vector of length corresponding to the row and column dimensions of the sample.cov matrix. The names of sample.mean must match those in sample.cov. If the user supplies a covariance matrix but no vector of sample means, intercepts will not be estimated.
sample.cov.rescale
Default is TRUE. Determines whether the sample covariance matrix provided by the user should be internally rescaled by multiplying it by a factor of (N-1)/N.
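The arithmetic of this rescaling can be checked in base R. The sketch below uses simulated data with hypothetical variable names; it shows only that multiplying the usual N - 1 denominator covariance matrix by (N-1)/N yields the covariance matrix with an N denominator, and makes no claim about how MIIVsem applies it internally.

```r
# Simulated data with hypothetical variable names.
set.seed(2)
x <- matrix(rnorm(300), ncol = 3,
            dimnames = list(NULL, c("Z1", "Z2", "Z3")))
N <- nrow(x)

S_unbiased <- cov(x)                    # divides by N - 1
S_rescaled <- S_unbiased * (N - 1) / N  # the rescaling described above

# Direct computation with an N denominator for comparison.
xc   <- scale(x, center = TRUE, scale = FALSE)
S_ml <- crossprod(xc) / N

all.equal(S_rescaled, S_ml, check.attributes = FALSE)
```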
estimator
The default estimator is 2SLS. For equations with continuous variables only and no restrictions, the estimates are identical to those described in Bollen (1996, 2001). If restrictions are present, a restricted MIIV-2SLS estimator is implemented using methods similar to those described by Greene (2003) but adapted for moment-based estimation. 2SLS coefficients and overidentification tests are constructed using the sample moments for increased computational efficiency.
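To illustrate what computing 2SLS from sample moments means, the base R sketch below estimates a single equation twice: once from the covariance matrix and mean vector alone, and once from raw data using two explicit regression stages. The data, coefficients, and object names are assumptions made for the example, and this is the textbook unrestricted estimator rather than MIIVsem's own code.

```r
# Illustrative only: simulated data with assumed coefficients.
set.seed(42)
n  <- 2000
v1 <- rnorm(n); v2 <- rnorm(n)            # instruments
u  <- rnorm(n)                            # source of endogeneity
z  <- 0.6 * v1 + 0.4 * v2 + u + rnorm(n)  # endogenous regressor
y  <- 1 + 0.5 * z - u + rnorm(n)          # outcome; true slope is 0.5
dat <- data.frame(y, z, v1, v2)

# Sample moments: the moment-based estimate uses only S and m.
S  <- cov(dat)
m  <- colMeans(dat)
iv <- c("v1", "v2")
S_vv <- S[iv, iv]
S_vz <- S[iv, "z", drop = FALSE]
S_vy <- S[iv, "y", drop = FALSE]

# b = (S_zv S_vv^-1 S_vz)^-1 S_zv S_vv^-1 S_vy
P <- t(S_vz) %*% solve(S_vv)
slope_moments     <- as.numeric(solve(P %*% S_vz, P %*% S_vy))
intercept_moments <- as.numeric(m["y"] - slope_moments * m["z"])

# Same estimate from raw data via two explicit regression stages.
dat$z_hat <- fitted(lm(z ~ v1 + v2, data = dat))
raw_fit   <- coef(lm(y ~ z_hat, data = dat))

all.equal(c(intercept_moments, slope_moments), unname(raw_fit))
```

The two routes agree because the first-stage fitted values are linear in the instruments, so every quantity the second stage needs can be written in terms of covariances and means.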
If an equation contains ordered categorical variables, declared in the ordered argument, the PIV estimator described by Bollen and Maydeu-Olivares (2007) is implemented. The PIV estimator does not currently support exogenous observed predictors of endogenous categorical variables. See details of the ordered argument for more information about the PIV estimator.
se
When se is set to "boot" or "bootstrap", standard errors are computed using a nonparametric bootstrap assuming an independent random sample. If var.cov = TRUE, nonconvergence may occur, and any datasets with improper solutions will be recorded as such and discarded. Bootstrapping is implemented using the boot package by resampling the observations in data and refitting the model with the resampled data. The number of bootstrap replications is set using the bootstrap argument; the default is 1000. Here, the standard errors are based on the standard deviation of successful bootstrap replications. Note that the Sargan test statistic is calculated from the original sample and is not a bootstrap-based estimate.
When se is set to "standard", standard errors for the MIIV-2SLS coefficients are calculated using analytic expressions. For equations with categorical endogenous variables, the asymptotic distribution of the coefficients is obtained via a first-order expansion where the matrix of partial derivatives is evaluated at the sample polychoric correlations. For some details on these standard errors see Bollen & Maydeu-Olivares (2007, p. 315). If var.cov = TRUE, only point estimates for the variance and covariance parameters are calculated. To obtain standard errors for the variance and covariance parameters we recommend setting se = "bootstrap". Analytic standard errors for the variance and covariance parameters accounting for the first-stage estimation have been derived and will be available in future releases.
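The nonparametric bootstrap described for se = "bootstrap" can be sketched in base R. The example below uses a plain resampling loop rather than the boot package, a single-instrument estimator, and simulated data with assumed coefficients; the helper iv_slope is hypothetical and only stands in for refitting the model.

```r
# Illustrative only: simulated data, assumed coefficients, hypothetical helper.
set.seed(7)
n <- 500
v <- rnorm(n)                 # instrument
u <- rnorm(n)                 # source of endogeneity
z <- 0.8 * v + u + rnorm(n)   # endogenous regressor
y <- 0.5 * z - u + rnorm(n)   # outcome
dat <- data.frame(y, z, v)

# Stand-in for "refit the model": one-instrument IV slope.
iv_slope <- function(d) cov(d$v, d$y) / cov(d$v, d$z)

# Resample rows with replacement and re-estimate on each resampled data set.
B <- 200
boot_est <- replicate(B, {
  idx <- sample.int(nrow(dat), replace = TRUE)
  iv_slope(dat[idx, ])
})

est     <- iv_slope(dat)  # point estimate from the original sample
boot_se <- sd(boot_est)   # bootstrap SE: standard deviation of the replicates
c(estimate = est, boot_se = boot_se)
```

A smaller B is used here only to keep the example fast; the default in miive is 1000 replications.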
missing
There are two ways to handle missing data in MIIVsem. First, missing data may be handled by listwise deletion (missing = "listwise"). In this case any row of data containing missing observations is excluded from the analysis and the sample moments are adjusted accordingly. Estimation then proceeds normally. The second option for handling missing data is a two-stage procedure (missing = "twostage"), where consistent estimates of the saturated population means and covariances are obtained in the first stage. These quantities are often referred to as the "EM means" and "EM covariance matrix." In the second stage the saturated estimates are used to calculate the MIIV-2SLS structural coefficients. Bootstrap standard errors are recommended but will be computationally burdensome due to the cost of calculating the EM-based moments at each bootstrap replication.
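The listwise option can be illustrated with base R alone. The small data frame below is made up for the example; the sketch shows only how dropping incomplete rows changes the data that the sample moments are computed from, not the two-stage procedure.

```r
# Made-up data with missing values.
dat <- data.frame(
  Z1 = c(1.2, NA,  0.4, 2.1, 1.7),
  Z2 = c(0.9, 1.1, NA,  1.8, 1.5),
  Z3 = c(1.0, 0.7, 0.3, 2.2, 1.4)
)

# Listwise deletion: keep only rows with no missing values.
complete <- dat[complete.cases(dat), ]
n_used   <- nrow(complete)

# Sample moments computed from the complete rows only.
S <- cov(complete)
m <- colMeans(complete)

# cov() can apply the same casewise deletion directly.
all.equal(S, cov(dat, use = "complete.obs"))
```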
ordered
For equations containing ordered categorical variables, MIIV-2SLS coefficients are estimated using the approach outlined in Bollen & Maydeu-Olivares (2007). The asymptotic distribution of these coefficients is obtained via a first-order expansion where the matrix of partial derivatives is evaluated at the sample polychoric correlations. For some details on these standard errors see Bollen & Maydeu-Olivares (2007, p. 315). If var.cov = TRUE, only point estimates for the variance and covariance parameters are calculated, using the DWLS estimator in lavaan. To obtain standard errors for the variance and covariance parameters we recommend the bootstrap approach. Analytic standard errors for the variance and covariance parameters in the presence of endogenous categorical variables will be available in future releases. Currently MIIVsem does not support exogenous variables in equations with categorical endogenous variables.
Sargan's Test of Overidentification
An essential ingredient in the MIIV-2SLS approach is the application of overidentification tests when a given model specification leads to an excess of instruments. Empirically, overidentification tests are used to evaluate the assumption of orthogonality between the instruments and equation residuals. Rejection of the null hypothesis implies a deficit in the logic leading to the instrument selection. In the context of MIIV-2SLS this is the model specification itself. By default, MIIVsem provides Sargan's overidentification test (Sargan, 1958) for each overidentified equation in the system. When cross-equation restrictions or missing data are present, the properties of the test are not known. When the system contains many equations, the sarg.adjust option provides methods to adjust the p-values associated with the Sargan test for multiple comparisons. The default is none. For other options see p.adjust.
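A minimal base R sketch of the ideas in this section follows. It computes one common textbook form of the Sargan statistic (sample size times the R-squared from regressing the 2SLS residuals on the instruments) for a single simulated equation, then applies p.adjust to a set of p-values. The data, coefficients, and the extra p-values for eq2 to eq4 are hypothetical, and the sketch is not claimed to match MIIVsem's moment-based computation.

```r
# Illustrative only: simulated data and assumed coefficients.
set.seed(11)
n  <- 1000
v1 <- rnorm(n); v2 <- rnorm(n); v3 <- rnorm(n)       # three instruments
u  <- rnorm(n)                                       # source of endogeneity
z  <- 0.5 * v1 + 0.5 * v2 + 0.5 * v3 + u + rnorm(n)  # one endogenous regressor
y  <- 0.5 * z - u + rnorm(n)

# 2SLS coefficients via two explicit stages.
z_hat <- fitted(lm(z ~ v1 + v2 + v3))
b     <- coef(lm(y ~ z_hat))

# 2SLS residuals use the observed regressor, not the fitted values.
res <- y - b[1] - b[2] * z

# Sargan statistic: n * R^2 from regressing the residuals on the instruments.
r2     <- summary(lm(res ~ v1 + v2 + v3))$r.squared
sargan <- n * r2
df     <- 3 - 1  # number of instruments minus number of endogenous regressors
p      <- pchisq(sargan, df = df, lower.tail = FALSE)

# Adjusting for multiple equations; eq2 to eq4 are made-up p-values.
p_values   <- c(eq1 = p, eq2 = 0.012, eq3 = 0.049, eq4 = 0.300)
p_adjusted <- p.adjust(p_values, method = "holm")
round(cbind(raw = p_values, holm = p_adjusted), 3)
```

Any method name accepted by p.adjust (for example "holm", "bonferroni", or "BH") could be used in place of "holm" here.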
Bollen, K. A. (1996). An Alternative 2SLS Estimator for Latent Variable Models. Psychometrika, 61, 109-121.
Bollen, K. A. (2001). Two-stage Least Squares and Latent Variable Models: Simultaneous Estimation and Robustness to Misspecifications. In R. Cudeck, S. Du Toit, and D. Sorbom (Eds.), Structural Equation Modeling: Present and Future, A Festschrift in Honor of Karl Joreskog (pp. 119-138). Lincoln, IL: Scientific Software.
Bollen, K. A., & Maydeu-Olivares, A. (2007). A Polychoric Instrumental Variable (PIV) Estimator for Structural Equation Models with Categorical Variables. Psychometrika, 72(3), 309.
Freedman, D. (1984). On Bootstrapping Two-Stage Least-Squares Estimates in Stationary Linear Models. The Annals of Statistics, 12(3), 827–842.
Greene, W. H. (2000). Econometric analysis. Upper Saddle River, N.J: Prentice Hall.
Hayashi, F. (2000). Econometrics. Princeton, NJ: Princeton University Press.
Sargan, J. D. (1958). The Estimation of Economic Relationships using Instrumental Variables. Econometrica, 26(3), 393–415.
Savalei, V. (2010). Expected versus Observed Information in SEM with Incomplete Normal and Nonnormal Data. Psychological Methods, 15(4), 352–367.
Savalei, V., & Falk, C. F. (2014). Robust Two-Stage Approach Outperforms Robust Full Information Maximum Likelihood With Incomplete Nonnormal Data. Structural Equation Modeling: A Multidisciplinary Journal, 21(2), 280–302.
MIIVsem, miivs