latentIV: Fitting Linear Models with one Endogenous Regressor using Latent Instrumental Variables

Description

Fits linear models with one endogenous regressor and no additional explanatory variables using the latent instrumental variable approach presented in Ebbes, P., Wedel, M., Böckenholt, U., and Steerneman, A. G. M. (2005). This statistical technique addresses the endogeneity problem without requiring external instrumental variables. The key assumptions of the model are that the latent variable is discrete with at least two groups with different means, and that the structural error is normally distributed.

Usage

latentIV(formula, param = NULL, data)

Arguments

formula

an object of type 'formula': a symbolic description of the model to be fitted. Example: var1 ~ var2, where var1 is a vector containing the dependent variable and var2 is a vector containing the endogenous variable. An intercept is included by default.

param

a vector of initial values for the model parameters, supplied to the optimization algorithm. Every model has eight parameters, in this order: the intercept, the coefficient of the endogenous variable, the means of the two groups of the latent IV (these must differ, otherwise the model is not identified), three parameters for the variance-covariance matrix, and the probability of being in group 1. When param is not provided, the initial values are set as follows: the intercept and the coefficient are set to the OLS coefficients, the two group means to mean(P) and mean(P) + sd(P), all elements of the variance-covariance matrix to 1, and probG1 to 0.5.

data

data frame or list containing the variables of the model.
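The default starting values described for param can be reproduced by hand. The following is a minimal sketch using base R and simulated stand-in data; the data, the object names (dat, start) and the element names are illustrative assumptions, and the way latentIV builds its defaults internally may differ.

```r
# Simulated stand-in data; 'y' and 'P' are illustrative names only.
set.seed(42)
P <- c(rnorm(100, mean = 0), rnorm(100, mean = 3))
y <- 2 - 1 * P + rnorm(200)
dat <- data.frame(y = y, P = P)

# OLS coefficients serve as starting values for the intercept and slope.
ols <- coef(lm(y ~ P, data = dat))

# Eight starting values, in the order given in the description of 'param'.
start <- c(
  ols[["(Intercept)"]],        # intercept
  ols[["P"]],                  # coefficient of the endogenous regressor
  mean(dat$P),                 # mean of latent group 1
  mean(dat$P) + sd(dat$P),     # mean of latent group 2 (must differ from group 1)
  1, 1, 1,                     # three variance-covariance parameters
  0.5                          # probability of group 1
)

# Names are added for readability only (hypothetical labels).
names(start) <- c("intercept", "alpha", "mean1", "mean2",
                  "sigma1", "sigma2", "sigma3", "probG1")
print(start)
```

A vector built this way has the length and ordering that param expects, so it could serve as a template when supplying custom starting values.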

Value

Returns the optimal values of the parameters as computed by maximum likelihood using the BFGS algorithm.

coefficients

the value of the parameters for the intercept and the endogenous regressor as computed with maximum likelihood.

fitted.values

the fitted values.

means

the value of the parameters for the means of the two categories/groups of the latent instrumental variable.

sigma

the variance-covariance matrix sigma: the main diagonal holds the variance of the structural error and the variance of the endogenous regressor, and the off-diagonal terms are equal to the covariance between the errors.

probG1

the probability of being in group one. Since the model assumes that the latent instrumental variable has two groups, 1-probG1 gives the probability of group 2.

value

the value of the log-likelihood function corresponding to the optimal parameters.

AIC

Akaike Information Criterion.

BIC

Bayesian Information Criterion.

convcode

an integer code, the same as the output returned by optimx. 0 indicates successful completion. A possible error code is 1, which indicates that the iteration limit maxit was reached.

hessian

a symmetric matrix giving an estimate of the Hessian at the solution found.
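As an illustration of how the returned quantities relate to each other, the sketch below computes AIC and BIC from a log-likelihood value and derives standard errors from a Hessian, using the textbook formulas. All numbers are made up, the Hessian is a 2 x 2 toy matrix rather than the full 8 x 8 one, and it is assumed here that the Hessian refers to the negative log-likelihood; the sign conventions latentIV uses for value and hessian are not stated above and should be checked before applying this to real output.

```r
# Hypothetical inputs, not taken from a real latentIV fit.
loglik <- -350.2   # log-likelihood at the optimum (assumed sign convention)
k <- 8             # number of estimated parameters in the model
n <- 200           # number of observations

# Standard definitions of the information criteria.
aic <- 2 * k - 2 * loglik
bic <- k * log(n) - 2 * loglik

# Toy Hessian of the negative log-likelihood (2 x 2 for brevity).
H <- matrix(c(4, 1,
              1, 2), nrow = 2, byrow = TRUE)

# Asymptotic covariance matrix and standard errors.
vcov_hat <- solve(H)
se <- sqrt(diag(vcov_hat))

print(c(AIC = aic, BIC = bic))
print(se)
```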

Details

Consider the model: $$Y_{t} = \beta_{0} + \alpha P_{t} + \epsilon_{t}$$ $$P_{t}=\pi^{'}Z_{t} + \nu_{t}$$ where t = 1,..,T indexes either time or cross-sectional units, $Y_{t}$ is the dependent variable, $P_{t}$ is a k x 1 continuous, endogenous regressor, $\epsilon_{t}$ is a structural error term with mean zero and $E(\epsilon^{2})=\sigma^{2}_{\epsilon}$, and $\beta_{0}$ and $\alpha$ are model parameters. $Z_{t}$ is an l x 1 vector of instruments, and $\nu_{t}$ is a random error with mean zero and $E(\nu^{2}) = \sigma^{2}_{\nu}$. The endogeneity problem arises from the correlation of $P_{t}$ and $\epsilon_{t}$ through $E(\epsilon\nu) = \sigma_{\epsilon\nu}$.

latentIV considers $Z_{t}$ to be a latent, discrete, exogenous variable with an unknown number of groups m, and $\pi$ is a vector of group means. It is assumed that Z is independent of the error terms $\epsilon$ and $\nu$ and that it has at least two groups with different means. The structural and random errors are assumed to be normally distributed with mean zero and variance-covariance matrix $\Sigma$: $$\Sigma = \left( \begin{array}{cc} \sigma_{\epsilon}^{2} & \sigma_{\epsilon\nu}\\ \sigma_{\epsilon\nu} & \sigma_{\nu}^{2} \end{array}\right)$$

The identification of the model rests on the non-normality of $P_{t}$, the discreteness of the unobserved instruments, and the existence of at least two groups with different means. The method is implemented with a latent variable that has two groups. Ebbes et al. (2005) show in a Monte Carlo experiment that even if the true number of categories of the instrument is larger than two, latentIV estimates are approximately consistent. Moreover, overfitting the number of groups/categories reduces the degrees of freedom and leads to efficiency loss. When provided by the user, the initial parameter values for the two group means have to be different, otherwise the model is not identified.

For a model with additional explanatory variables a Bayesian approach is needed, since identification issues appear in a frequentist approach. The optimization algorithm used is BFGS.
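To make the model above concrete, the following sketch simulates data with a two-group latent instrument and correlated errors, shows that the OLS slope is pulled away from $\alpha$, and writes out the implied two-component bivariate normal mixture likelihood for (Y, P). It uses base R only. The parameter values, the function names (dbvnorm_log, negloglik) and the direct parameterization of $\Sigma$ as ($\sigma_{\epsilon}^{2}$, $\sigma_{\epsilon\nu}$, $\sigma_{\nu}^{2}$) are assumptions made for illustration; this is not the package's internal code.

```r
# Log-density of a bivariate normal, written out by hand to avoid extra packages.
dbvnorm_log <- function(y, p, mu_y, mu_p, v_y, c_yp, v_p) {
  det <- v_y * v_p - c_yp^2
  dy <- y - mu_y
  dp <- p - mu_p
  quad <- (v_p * dy^2 - 2 * c_yp * dy * dp + v_y * dp^2) / det
  -log(2 * pi) - 0.5 * log(det) - 0.5 * quad
}

# Negative log-likelihood of the two-group latent IV model.
# theta = (beta0, alpha, pi1, pi2, var_eps, cov_eps_nu, var_nu, probG1)
negloglik <- function(theta, y, p) {
  b0 <- theta[1]; a <- theta[2]
  pi1 <- theta[3]; pi2 <- theta[4]
  s_e2 <- theta[5]; s_en <- theta[6]; s_n2 <- theta[7]
  w <- theta[8]
  # Implied moments of (Y, P) within a group: Y = b0 + a * P + eps, P = pi_g + nu.
  v_y  <- a^2 * s_n2 + 2 * a * s_en + s_e2
  c_yp <- a * s_n2 + s_en
  l1 <- dbvnorm_log(y, p, b0 + a * pi1, pi1, v_y, c_yp, s_n2)
  l2 <- dbvnorm_log(y, p, b0 + a * pi2, pi2, v_y, c_yp, s_n2)
  -sum(log(w * exp(l1) + (1 - w) * exp(l2)))
}

# Simulate from the model (all values are illustrative).
set.seed(1)
n <- 5000
b0 <- 2; alpha <- -1
pi_g <- c(0, 3); w1 <- 0.5
rho <- 0.5                                  # covariance between eps and nu
g <- ifelse(runif(n) < w1, 1, 2)            # latent group membership
e1 <- rnorm(n); e2 <- rnorm(n)
eps <- e1                                   # structural error, variance 1
nu <- rho * e1 + sqrt(1 - rho^2) * e2       # first-stage error, variance 1
P <- pi_g[g] + nu
y <- b0 + alpha * P + eps

# OLS ignores the correlation between P and eps, so its slope is biased.
ols_slope <- unname(coef(lm(y ~ P))[2])

# The likelihood prefers the true parameters over a vector with a wrong slope.
theta_true <- c(b0, alpha, pi_g, 1, rho, 1, w1)
theta_off <- theta_true
theta_off[2] <- alpha + 1
nll_true <- negloglik(theta_true, y, P)
nll_off <- negloglik(theta_off, y, P)

print(c(alpha = alpha, ols_slope = ols_slope))
print(c(nll_true = nll_true, nll_off = nll_off))
```

A function such as negloglik could in principle be passed to a general-purpose optimizer with a BFGS method; in practice the variance parameters and the group probability would need constraints or a reparameterization to keep them in their valid ranges.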

References

Ebbes, P., Wedel, M., Böckenholt, U., and Steerneman, A. G. M. (2005). 'Solving and Testing for Regressor-Error (in)Dependence When no Instrumental Variables are Available: With New Evidence for the Effect of Education on Income'. Quantitative Marketing and Economics, 3:365--392.

Examples

# load data
data(dataLatentIV)
# function call without any initial parameter values 
l  <- latentIV(y ~ P, data = dataLatentIV)
summary(l)
# function call with initial parameter values given by the user
l1 <- latentIV(y ~ P, param = c(2.9, -0.85, 0, 0.1, 1, 1, 1, 0.5), data = dataLatentIV)
summary(l1)
