miive: Model-implied instrumental variable (MIIV) estimation

Description

Estimate structural equation models using model-implied instrumental variables (MIIVs).

Usage

miive(
  model = model,
  data = NULL,
  instruments = NULL,
  sample.cov = NULL,
  sample.mean = NULL,
  sample.nobs = NULL,
  sample.cov.rescale = TRUE,
  estimator = "2SLS",
  se = "standard",
  bootstrap = 1000L,
  boot.ci = "norm",
  missing = "listwise",
  est.only = FALSE,
  var.cov = FALSE,
  miiv.check = TRUE,
  ordered = NULL,
  sarg.adjust = "none",
  overid.degree = NULL,
  overid.method = "stepwise.R2"
)

Arguments

model

A model specified using lavaan model syntax or a miivs object returned by the miivs function. See Details for more information about permissible operators and example model syntax.

data

A data frame, list, or environment, or an object coercible to a data frame by as.data.frame. The most common application is to supply a data.frame.

instruments

Allows the user to specify the instruments for each equation. See Details and the miivs.out argument of summary.miivs for more information on the correct input format. External (auxiliary) instruments can be supplied; however, the miiv.check argument must then be set to FALSE. In the typical application, the program chooses the MIIVs for each equation based on the model specification. To view the model-implied instruments after estimation, see the eq.info argument of summary.miive.

sample.cov

Numeric matrix. A sample variance-covariance matrix. The rownames and colnames attributes must contain all the observed variable names indicated in the model syntax.

sample.mean

A sample mean vector. If sample.cov is provided and the sample.mean argument is NULL, intercepts will not be estimated for any endogenous variable.

sample.nobs

Number of observations in the full data frame.

sample.cov.rescale

If TRUE, the sample covariance matrix provided by the user is internally rescaled by multiplying it with a factor (N-1)/N.

estimator

Options "2SLS" or "GMM" for estimating the model parameters. The default is "2SLS". Currently, only "2SLS" is supported.

se

If "standard", asymptotic standard errors are computed. If "bootstrap" (or "boot"), bootstrap standard errors are computed.

bootstrap

Number of bootstrap draws, if bootstrapping is used. The default is 1000.

boot.ci

Method for calculating bootstrap confidence intervals. Options are normal approximation ("norm"), basic bootstrap interval ("basic"), percentile interval ("perc"), and adjusted bootstrap percentile ("bca"). The default is normal approximation. See boot.ci for more information.

missing

The default is "listwise"; however, a maximum-likelihood-related missing data method called "twostage" is also available. See the Details on missing below for more information.

est.only

If TRUE, only the coefficients are returned.

var.cov

If TRUE, variance and covariance parameters are estimated.

miiv.check

Default is TRUE. miiv.check checks whether user-supplied instruments are implied by the model specification (i.e., are valid MIIVs). When auxiliary or external instruments are provided, miiv.check should be set to FALSE.

ordered

A vector of variable names to be treated as ordered factors in generating the polychoric correlation matrix and subsequent PIV estimates. See details on ordered below for more information.

sarg.adjust

Adjustment method applied to the p-values associated with the Sargan test to account for multiple comparisons. The default is "none". For other options see p.adjust.

overid.degree

A numeric value indicating the degree of overidentification to be used in estimation.

overid.method

The method by which excess MIIVs should be pruned to satisfy the overid.degree. Options include random (minimum.eigen) or stepwise R2 (stepwise.R2). The default is stepwise.R2.

Details

model
The following model syntax operators are currently supported: =~, ~, ~~ and *. See below for details on default behavior, descriptions of how to specify the scaling indicator in latent variable models, and how to impose equality constraints on the parameter estimates.
Example using Syntax Operators
In the model below, 'L1 =~ Z1 + Z2 + Z3' indicates that the latent variable L1 is measured by 3 indicators, Z1, Z2, and Z3. Likewise, L2 is measured by 3 indicators, Z4, Z5, and Z6. The statement 'L1 ~ L2' specifies that latent variable L1 is regressed on latent variable L2. 'Z2 ~~ Z3' indicates that the error of Z2 is allowed to covary with the error of Z3. The label LA3 prepended to Z3 and Z6 in the measurement model constrains the factor loadings for Z3 and Z6 to equality. For additional details on constraints see Equality Constraints and Parameter Restrictions.
```
model <- '
     L1 =~ Z1 + Z2 + LA3*Z3
     L2 =~ Z4 + Z5 + LA3*Z6
     L1  ~ L2
     Z2 ~~ Z3
  '
```
Scaling Indicators
Following the lavaan model syntax, latent variables are defined using the =~ operator. For first order factors, the scaling indicator chosen is the first observed variable on the RHS of an equation. For the model below Z1 would be chosen as the scaling indicator for L1 and Z4 would be chosen as the scaling indicator for L2.
```
model <- '
     L1 =~ Z1 + Z2 + Z3
     L2 =~ Z4 + Z5 + Z6
  '
```
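To see which scaling indicators and instruments a specification implies before estimating anything, the model string can be passed to the miivs function mentioned under the model argument. The snippet below is a minimal sketch that assumes the MIIVsem package is installed; the layout of the printed table may differ between package versions.

```r
# Two-factor model from the text. Z1 and Z4 are listed first on the RHS,
# so they act as the scaling indicators for L1 and L2.
model <- '
     L1 =~ Z1 + Z2 + Z3
     L2 =~ Z4 + Z5 + Z6
  '

# Run the instrument search only when MIIVsem is available
# (assumption: the package is installed on this machine).
if (requireNamespace("MIIVsem", quietly = TRUE)) {
  iv_table <- MIIVsem::miivs(model)
  print(iv_table)  # one row per equation with its model-implied instruments
}
```

No data are needed for this step, because the instruments follow from the model structure alone.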
Equality Constraints and Parameter Restrictions
Within- and across-equation equality constraints on the factor loadings and regression coefficients can be imposed directly in the model syntax. To constrain different parameters to equality, prepend the same label to each variable name using the * operator. For example, we could constrain the factor loadings for the two non-scaling indicators of L1 to equality using the following model syntax.
```
model <- '
     L1 =~ Z1 + LA2*Z2 + LA2*Z3
     L2 =~ Z4 + Z5 + Z6
  '
```
Researchers also can constrain the factor loading and regression coefficients to specific numeric values in a similar fashion. Below we constrain the regression coefficient of L1 on L2 to 1.
```
model <- '
     L1 =~ Z1 + Z2 + Z3
     L2 =~ Z4 + Z5 + Z6
     L3 =~ Z7 + Z8 + Z9
     L1  ~ 1*L2 + L3
  '
```
Higher-order Factor Models
In the model below, the scaling indicator for the higher-order factor H1 is taken to be Z1, the scaling indicator that would have been assigned to the first lower-order factor L1. The intercepts for lower-order latent variables are set to zero by default.
```
model <- '
     H1 =~ L1 + L2 + L3
     L1 =~ Z1 + Z2 + Z3
     L2 =~ Z4 + Z5 + Z6
     L3 =~ Z7 + Z8 + Z9
  '
```
Model Defaults
In addition to the relationships specified in the model syntax, MIIVsem automatically includes the intercepts of any observed or latent endogenous variable. The intercepts for any scaling indicators and lower-order latent variables are set to zero by default. Covariances among exogenous latent and observed variables are included when var.cov = TRUE. Where appropriate, the covariances of the errors of latent and observed dependent variables are automatically included in the model specification. These defaults correspond to those used by lavaan with auto = TRUE, except that endogenous latent variable intercepts are estimated by default, and the intercepts of scaling indicators are fixed to zero.
Invalid Specifications
Certain model specifications are not currently supported. For example, the scaling indicator of a latent variable is not permitted to cross-load on another latent variable. In the model below Z1, the scaling indicator for L1, cross-loads on the latent variable L2. Executing a search on the model below will result in the warning: miivs: scaling indicators with a factor complexity greater than 1 are not currently supported.
```
model <- '
    L1 =~ Z1 + Z2 + Z3
    L2 =~ Z4 + Z5 + Z6 + Z1
  '
```
In addition, MIIVsem does not currently support relations where the scaling indicator of a latent variable is also the dependent variable in a regression equation. The model below would not be valid under the current algorithm.
```
model <- '
    L1 =~ Z1 + Z2 + Z3
    Z1  ~ Z4
    Z4  ~ Z5 + Z6
  '
```
instruments
To use this option you must first define a set of instruments using the syntax displayed below. Here, the dependent variable for each equation is listed on the LHS of the ~ operator. In the case of latent variable equations, the dependent variable is the scaling indicator associated with that variable. The instruments are then given on the RHS, separated by + signs. The instrument syntax is then enclosed in single quotes. For example,
```
customIVs <- '
     y1 ~ z1 + z2 + z3
     y2 ~ z4 + z5
  '
```
After this list is defined, set the instruments argument equal to the name of the list of instruments (e.g. customIVs). Note that instruments are specified for an equation, not for a specific endogenous variable. If only a subset of dependent variables is listed in the instruments argument, only the equations listed will be estimated. If external or auxiliary instruments (instruments not otherwise included in the model) are included, the miiv.check argument should be set to FALSE.
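A short end-to-end illustration of passing custom instruments is sketched below. The simulated data, the seed, and the particular instrument choices are assumptions made for the example; the estimation step runs only if the MIIVsem package is installed.

```r
set.seed(123)
n <- 300

# Simulate two correlated factors with three indicators each (illustrative data).
L2 <- rnorm(n)
L1 <- 0.5 * L2 + rnorm(n)
dat <- data.frame(
  Z1 = L1 + rnorm(n, sd = 0.6),
  Z2 = 0.8 * L1 + rnorm(n, sd = 0.6),
  Z3 = 0.7 * L1 + rnorm(n, sd = 0.6),
  Z4 = L2 + rnorm(n, sd = 0.6),
  Z5 = 0.9 * L2 + rnorm(n, sd = 0.6),
  Z6 = 0.6 * L2 + rnorm(n, sd = 0.6)
)

model <- '
     L1 =~ Z1 + Z2 + Z3
     L2 =~ Z4 + Z5 + Z6
  '

# One line per equation: dependent variable on the LHS, instruments on the RHS.
# Each set uses only indicators of the other factor, which are among the MIIVs
# implied by this model, so miiv.check can stay at its default.
customIVs <- '
     Z2 ~ Z4 + Z5 + Z6
     Z3 ~ Z4 + Z5 + Z6
     Z5 ~ Z1 + Z2 + Z3
     Z6 ~ Z1 + Z2 + Z3
  '

if (requireNamespace("MIIVsem", quietly = TRUE)) {
  fit <- MIIVsem::miive(model = model, data = dat, instruments = customIVs)
  print(fit)
}
```

If an instrument that is not part of the model were added to one of these lines, the call would also need miiv.check = FALSE, as described above.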
sample.cov
The user may provide a sample covariance matrix in lieu of raw data. The rownames and colnames must contain the observed variable names indicated in the model syntax. If sample.cov is not NULL, the user should also supply a vector of sample means (sample.mean) and the number of sample observations (sample.nobs) from which the means and covariances were calculated. If no vector of sample means is provided, intercepts will not be estimated. MIIVsem does not support bootstrap standard errors or polychoric instrumental variable estimation when the sample moments, rather than raw data, are used as input.
sample.mean
A vector whose length matches the row and column dimensions of the sample.cov matrix. The names of sample.mean must match those in sample.cov. If the user supplies a covariance matrix but no vector of sample means, intercepts will not be estimated.
sample.cov.rescale
Default is TRUE. If TRUE, the sample covariance matrix provided by the user is internally rescaled by multiplying it by a factor of (N-1)/N.
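The effect of the (N-1)/N factor can be checked in base R. The sketch below uses made-up data and shows that the rescaled matrix equals the covariance matrix computed with divisor N rather than N-1; it illustrates the arithmetic only and is not MIIVsem's internal code.

```r
set.seed(42)
n <- 50
dat <- data.frame(x = rnorm(n), y = rnorm(n))

S_unbiased <- cov(dat)                   # cov() divides by N - 1
S_rescaled <- S_unbiased * (n - 1) / n   # rescaled to divisor N

# Direct computation with divisor N: cross-products of centered data over N.
centered <- scale(dat, center = TRUE, scale = FALSE)
S_divN <- crossprod(centered) / n

all.equal(S_rescaled, S_divN, check.attributes = FALSE)
```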
estimator
The default estimator is 2SLS. For equations with continuous variables only and no restrictions, the estimates are identical to those described in Bollen (1996, 2001). If restrictions are present, a restricted MIIV-2SLS estimator is implemented using methods similar to those described by Greene (2003) but adapted for moment-based estimation. 2SLS coefficients and overidentification tests are constructed using the sample moments for increased computational efficiency.
If an equation contains ordered categorical variables, declared in the ordered argument, the PIV estimator described by Bollen and Maydeu-Olivares (2007) is implemented. The PIV estimator does not currently support exogenous observed predictors of endogenous categorical variables. See details of the ordered argument for more information about the PIV estimator.
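For the continuous-variable case, the idea of building 2SLS coefficients from sample moments can be sketched for a single equation in base R. The data and loadings below are simulated assumptions, and the code is an illustration of the general technique rather than MIIVsem's implementation. Replacing L1 by its scaling indicator Z1 turns the Z2 equation into a regression of Z2 on Z1 with a composite error, for which Z3 and Z4 serve as instruments.

```r
set.seed(1)
n <- 500
L1 <- rnorm(n)
dat <- data.frame(
  Z1 = L1 + rnorm(n, sd = 0.5),        # scaling indicator (loading fixed to 1)
  Z2 = 0.8 * L1 + rnorm(n, sd = 0.5),
  Z3 = 0.6 * L1 + rnorm(n, sd = 0.5),
  Z4 = 0.7 * L1 + rnorm(n, sd = 0.5)
)

# Equation: Z2 = a + lambda * Z1 + u, with Z3 and Z4 as instruments for Z1.
S   <- cov(dat)
iv  <- c("Z3", "Z4")
Szz <- S[iv, iv]
Szx <- S[iv, "Z1", drop = FALSE]
Szy <- S[iv, "Z2", drop = FALSE]

# Moment-based 2SLS slope: (Sxz Szz^-1 Szx)^-1 Sxz Szz^-1 Szy
A <- t(Szx) %*% solve(Szz)
lambda_moments <- as.numeric(solve(A %*% Szx, A %*% Szy))

# The same slope from two explicit least-squares stages on the raw data.
dat$Z1_hat <- fitted(lm(Z1 ~ Z3 + Z4, data = dat))
lambda_stages <- unname(coef(lm(Z2 ~ Z1_hat, data = dat))["Z1_hat"])

c(moments = lambda_moments, stages = lambda_stages)
```

Because only the covariance matrix enters the first computation, the same slope can be obtained when a user supplies sample.cov instead of raw data.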
se
When se is set to "boot" or "bootstrap", standard errors are computed using a nonparametric bootstrap assuming an independent random sample. If var.cov = TRUE, nonconvergence may occur; any datasets with improper solutions are recorded as such and discarded. Bootstrapping is implemented with the boot package by resampling the observations in data and refitting the model to the resampled data. The number of bootstrap replications is set using the bootstrap argument; the default is 1000. The standard errors are the standard deviations of the estimates across successful bootstrap replications. Note that the Sargan test statistic is calculated from the original sample and is not a bootstrap-based estimate.
When se is set to "standard", standard errors for the MIIV-2SLS coefficients are calculated using analytic expressions. For equations with categorical endogenous variables, the asymptotic distribution of the coefficients is obtained via a first-order expansion where the matrix of partial derivatives is evaluated at the sample polychoric correlations. For some details on these standard errors see Bollen & Maydeu-Olivares (2007, p. 315).
If var.cov = TRUE, only point estimates are calculated for the variance and covariance parameters. To obtain standard errors for the variance and covariance parameters we recommend setting se = "bootstrap". Analytic standard errors for the variance and covariance parameters accounting for the first-stage estimation have been derived and will be available in future releases.
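The resample-and-refit logic behind se = "bootstrap" can be sketched for a single 2SLS slope in base R. This is a hand-rolled illustration under assumed simulated data; it does not call the boot package, and the interval shown is a plain normal approximation without any bias correction.

```r
set.seed(2024)
n <- 200
L1 <- rnorm(n)
dat <- data.frame(
  Z1 = L1 + rnorm(n, sd = 0.5),
  Z2 = 0.8 * L1 + rnorm(n, sd = 0.5),
  Z3 = 0.6 * L1 + rnorm(n, sd = 0.5),
  Z4 = 0.7 * L1 + rnorm(n, sd = 0.5)
)

# 2SLS slope of Z2 on Z1 with instruments Z3 and Z4, computed from moments.
tsls_slope <- function(d) {
  S  <- cov(d)
  iv <- c("Z3", "Z4")
  A  <- t(S[iv, "Z1", drop = FALSE]) %*% solve(S[iv, iv])
  as.numeric(solve(A %*% S[iv, "Z1", drop = FALSE],
                   A %*% S[iv, "Z2", drop = FALSE]))
}

est <- tsls_slope(dat)

# Resample rows with replacement and re-estimate on every bootstrap sample.
B <- 500
boot_est <- replicate(B, tsls_slope(dat[sample.int(n, replace = TRUE), ]))

boot_se <- sd(boot_est)                              # bootstrap standard error
ci_norm <- est + c(-1, 1) * qnorm(0.975) * boot_se   # normal-approximation interval
```

Note that the point estimate est comes from the original sample; only the standard error and interval depend on the resampled estimates.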
missing
There are two ways to handle missing data in MIIVsem. First, missing data may be handled by listwise deletion (missing = "listwise"). In this case any row of data containing a missing observation is excluded from the analysis and the sample moments are adjusted accordingly. Estimation then proceeds normally.
The second option is a two-stage procedure (missing = "twostage"), where consistent estimates of the saturated population means and covariances are obtained in the first stage. These quantities are often referred to as the "EM means" and "EM covariance matrix." In the second stage the saturated estimates are used to calculate the MIIV-2SLS structural coefficients. Bootstrap standard errors are recommended but will be computationally burdensome due to the cost of calculating the EM-based moments at each bootstrap replication.
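What listwise deletion does to the sample moments can be shown with a toy data frame in base R. The values below are invented for illustration; the two-stage EM procedure is not reproduced here.

```r
dat <- data.frame(
  Z1 = c(1.2, NA, 0.4, 2.1, 1.7),
  Z2 = c(0.9, 1.1, NA, 1.8, 1.5),
  Z3 = c(1.0, 0.7, 0.2, 2.0, 1.4)
)

# Keep only rows without any missing value.
dat_lw <- dat[complete.cases(dat), ]
nrow(dat_lw)   # rows 1, 4 and 5 remain

# Sample moments are then computed from the retained rows only.
m_lw <- colMeans(dat_lw)
S_lw <- cov(dat_lw)
```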
ordered
For equations containing ordered categorical variables, MIIV-2SLS coefficients are estimated using the approach outlined in Bollen & Maydeu-Olivares (2007). The asymptotic distribution of these coefficients is obtained via a first-order expansion where the matrix of partial derivatives is evaluated at the sample polychoric correlations. For some details on these standard errors see Bollen & Maydeu-Olivares (2007, p. 315).
If var.cov = TRUE, only point estimates for the variance and covariance parameters are calculated, using the DWLS estimator in lavaan. To obtain standard errors for the variance and covariance parameters we recommend the bootstrap approach. Analytic standard errors for the variance and covariance parameters in the presence of endogenous categorical variables will be available in future releases. Currently MIIVsem does not support exogenous variables in equations with categorical endogenous variables.

Sargan's Test of Overidentification

An essential ingredient in the MIIV-2SLS approach is the application of overidentification tests when a given model specification leads to an excess of instruments. Empirically, overidentification tests are used to evaluate the assumption of orthogonality between the instruments and the equation residuals. Rejection of the null hypothesis implies a deficit in the logic leading to the instrument selection, which in the context of MIIV-2SLS is the model specification itself. By default, MIIVsem provides Sargan's overidentification test (Sargan, 1958) for each overidentified equation in the system. When cross-equation restrictions or missing data are present, the properties of the test are not known. When the system contains many equations, the sarg.adjust option provides methods to adjust the p-values associated with the Sargan test for multiple comparisons. The default is "none". For other options see p.adjust.
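A textbook form of the Sargan statistic for one overidentified equation is N times the R-squared from regressing the 2SLS residuals on the instruments, referred to a chi-square distribution with degrees of freedom equal to the number of instruments minus the number of endogenous regressors. The base R sketch below uses simulated data and placeholder p-values for the other equations; it illustrates the test and the p.adjust step and is not taken from MIIVsem's source.

```r
set.seed(7)
n <- 400
L1 <- rnorm(n)
dat <- data.frame(
  Z1 = L1 + rnorm(n, sd = 0.5),
  Z2 = 0.8 * L1 + rnorm(n, sd = 0.5),
  Z3 = 0.6 * L1 + rnorm(n, sd = 0.5),
  Z4 = 0.7 * L1 + rnorm(n, sd = 0.5),
  Z5 = 0.9 * L1 + rnorm(n, sd = 0.5)
)

# 2SLS for Z2 on Z1 with three instruments (Z3, Z4, Z5) and one regressor.
dat$Z1_hat <- fitted(lm(Z1 ~ Z3 + Z4 + Z5, data = dat))
b <- unname(coef(lm(Z2 ~ Z1_hat, data = dat)))
dat$u <- dat$Z2 - b[1] - b[2] * dat$Z1   # residuals use the observed Z1

# Sargan statistic: N * R-squared of the residuals regressed on the instruments.
aux    <- lm(u ~ Z3 + Z4 + Z5, data = dat)
sargan <- n * summary(aux)$r.squared
df     <- 3 - 1
p_val  <- pchisq(sargan, df = df, lower.tail = FALSE)

# With several equations, raw p-values can be adjusted, e.g. by Holm's method.
# eq2 and eq3 are made-up placeholder values for illustration only.
p_raw <- c(eq1 = p_val, eq2 = 0.04, eq3 = 0.20)
p_adj <- p.adjust(p_raw, method = "holm")
```

In this simulation the instruments are valid by construction, so a large p-value for eq1 is the expected outcome in most draws.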

References

Bollen, K. A. (1996). An Alternative 2SLS Estimator for Latent Variable Models. Psychometrika, 61, 109-121.

Bollen, K. A. (2001). Two-stage Least Squares and Latent Variable Models: Simultaneous Estimation and Robustness to Misspecifications. In R. Cudeck, S. Du Toit, and D. Sorbom (Eds.), Structural Equation Modeling: Present and Future, A Festschrift in Honor of Karl Joreskog (pp. 119-138). Lincoln, IL: Scientific Software.

Bollen, K. A., & Maydeu-Olivares, A. (2007). A Polychoric Instrumental Variable (PIV) Estimator for Structural Equation Models with Categorical Variables. Psychometrika, 72(3), 309.

Freedman, D. (1984). On Bootstrapping Two-Stage Least-Squares Estimates in Stationary Linear Models. The Annals of Statistics, 12(3), 827–842.

Greene, W. H. (2000). Econometric analysis. Upper Saddle River, N.J: Prentice Hall.

Hayashi, F. (2000). Econometrics. Princeton, NJ: Princeton University Press.

Sargan, J. D. (1958). The Estimation of Economic Relationships using Instrumental Variables. Econometrica, 26(3), 393–415.

Savalei, V. (2010). Expected versus Observed Information in SEM with Incomplete Normal and Nonnormal Data. Psychological Methods, 15(4), 352–367.

Savalei, V., & Falk, C. F. (2014). Robust Two-Stage Approach Outperforms Robust Full Information Maximum Likelihood With Incomplete Nonnormal Data. Structural Equation Modeling: A Multidisciplinary Journal, 21(2), 280–302.
