Let's consider the model:
\(Y_t = \beta_0 + \alpha P_t + \epsilon_t\)
\(P_t = \pi' Z_t + \nu_t\)
where \(t = 1, \ldots, T\) indexes either time or cross-sectional units, \(Y_t\) is the dependent variable, \(P_t\) is a \(k \times 1\) continuous, endogenous regressor, \(\epsilon_t\) is a structural error term with mean zero and \(E(\epsilon^2) = \sigma_\epsilon^2\), and \(\alpha\) and \(\beta_0\) are model parameters. \(Z_t\) is an \(l \times 1\) vector of instruments, and \(\nu_t\) is a random error with mean zero and \(E(\nu^2) = \sigma_\nu^2\).
The endogeneity problem arises from the correlation of \(P_t\) and \(\epsilon_t\) through \(E(\epsilon\nu) = \sigma_{\epsilon\nu}\).
latentIV considers \(Z_t'\) to be a latent, discrete, exogenous variable with an unknown number of groups \(m\), and \(\pi\) is a vector of group means.
It is assumed that \(Z\) is independent of the error terms \(\epsilon\) and \(\nu\) and that it has at least two groups with different means.
The structural and random errors are considered normally distributed with mean zero and variance-covariance matrix \(\Sigma\):
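\(\Sigma = \begin{pmatrix} \sigma_\epsilon^2 & \sigma_{\epsilon\nu} \\ \sigma_{\epsilon\nu} & \sigma_\nu^2 \end{pmatrix}\)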
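The data-generating process described above can be simulated to see the endogeneity problem directly. The following is a minimal sketch, assuming a scalar regressor (\(k = 1\)) and a two-group latent instrument. The function names `simulate_liv` and `ols_slope`, the parameter values, and the use of a correlation `rho` to build the error covariance are illustrative assumptions, not part of the package. Because \(Z\) is independent of \(\epsilon\), the covariance of \(P_t\) and \(\epsilon_t\) equals \(\sigma_{\epsilon\nu}\), so the least-squares slope converges to \(\alpha + \sigma_{\epsilon\nu} / Var(P)\) rather than to \(\alpha\).

```python
import math
import random
import statistics


def simulate_liv(n, beta0, alpha, pi1, pi2, prob1, sd_eps, sd_nu, rho, seed=0):
    """Draw (y, p) pairs from the two-group latent instrument model (scalar P)."""
    rng = random.Random(seed)
    ys, ps = [], []
    for _ in range(n):
        # Latent instrument: group mean pi1 with probability prob1, else pi2.
        group_mean = pi1 if rng.random() < prob1 else pi2
        # Correlated normal errors built from two independent standard normals,
        # so that Cov(eps, nu) = rho * sd_eps * sd_nu.
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eps = sd_eps * u1
        nu = sd_nu * (rho * u1 + math.sqrt(1.0 - rho ** 2) * u2)
        p = group_mean + nu
        y = beta0 + alpha * p + eps
        ps.append(p)
        ys.append(y)
    return ys, ps


def ols_slope(ys, ps):
    """Slope of the least-squares regression of y on p (with intercept)."""
    mean_y, mean_p = statistics.fmean(ys), statistics.fmean(ps)
    sxy = sum((p - mean_p) * (y - mean_y) for p, y in zip(ps, ys))
    sxx = sum((p - mean_p) ** 2 for p in ps)
    return sxy / sxx


# Illustrative (made-up) parameter values.
ALPHA, PI1, PI2, PROB1 = -1.0, 0.0, 3.0, 0.5
SD_EPS, SD_NU, RHO = 1.0, 1.0, 0.5

ys, ps = simulate_liv(50_000, beta0=1.0, alpha=ALPHA, pi1=PI1, pi2=PI2,
                      prob1=PROB1, sd_eps=SD_EPS, sd_nu=SD_NU, rho=RHO)
slope = ols_slope(ys, ps)

# Large-sample value of the OLS slope under this design.
cov_eps_nu = RHO * SD_EPS * SD_NU
var_p = PROB1 * (1.0 - PROB1) * (PI2 - PI1) ** 2 + SD_NU ** 2
expected_slope = ALPHA + cov_eps_nu / var_p

print("true alpha:        ", ALPHA)
print("OLS slope:         ", round(slope, 3))
print("alpha + bias term: ", round(expected_slope, 3))
```

With a positive error covariance the OLS slope sits above the true \(\alpha\), which is the bias the latent instrument approach is meant to remove.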
The identification of the model rests on the assumed non-normality of \(P_t\), the discreteness of the unobserved instruments, and the existence of at least two groups with different means.
The method has been implemented such that the latent variable has two groups. Ebbes et al. (2005) show in a Monte Carlo experiment that even if the true number of categories of the instrument is larger than two, the estimates are approximately consistent. Moreover, overfitting the number of groups/categories reduces the degrees of freedom and leads to a loss of efficiency. For a model with additional explanatory variables, a Bayesian approach is needed, because identification issues arise in a frequentist approach.
Identification of the parameters relies on the distributional assumptions about the latent instruments as well as about the endogenous regressor \(P_t\).
Specifically, the endogenous regressor should have a non-normal distribution, while the unobserved instruments, \(Z\), should be discrete and have at least two groups with different means (Ebbes, Wedel, and Böckenholt 2009).
A continuous distribution for the instruments leads to an unidentified model, while a normal distribution of the endogenous regressor gives rise to inefficient estimates.
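To make the role of these assumptions concrete, the following sketch writes out the marginal density of a scalar \(P_t\) under the two-group implementation with normal errors. The symbols \(\lambda\) (probability of the first group) and \(\phi\) (normal density with the given mean and variance) are notation introduced here for illustration.

\[
f(P_t) = \lambda \, \phi(P_t;\ \pi_1, \sigma_\nu^2) + (1 - \lambda) \, \phi(P_t;\ \pi_2, \sigma_\nu^2), \qquad 0 < \lambda < 1 .
\]

When \(\pi_1 \neq \pi_2\), this two-component mixture is not a normal density, which is the non-normality the identification argument relies on. If \(\pi_1 = \pi_2\), the mixture collapses to a single normal density and \((Y_t, P_t)\) becomes jointly normal. Its two means and three second moments are then too few to determine the six quantities \(\beta_0, \alpha, \pi, \sigma_\epsilon^2, \sigma_{\epsilon\nu}, \sigma_\nu^2\).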
Additional parameters used during model fitting and printed in summary are:
- pi1
The instrumental variables \(Z\) are assumed to be divided into two groups. pi1 represents the estimated group mean of the first group.
- pi2
The estimated group mean of the second group of the instrumental variables \(Z\).
- theta5
The probability of being in the first group of the instruments.
- theta6
The variance, \(\sigma_\epsilon^2\).
- theta7
The covariance, \(\sigma_{\epsilon\nu}\).
- theta8
The variance, \(\sigma_\nu^2\).
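To show how the parameters listed above fit together, here is a minimal sketch of the log-likelihood they imply for a scalar regressor, assuming a two-group latent instrument and normal errors. Within group \(j\), \((Y_t, P_t)\) is then bivariate normal with means \((\beta_0 + \alpha \pi_j, \pi_j)\), variances \(\alpha^2 \sigma_\nu^2 + 2 \alpha \sigma_{\epsilon\nu} + \sigma_\epsilon^2\) and \(\sigma_\nu^2\), and covariance \(\alpha \sigma_\nu^2 + \sigma_{\epsilon\nu}\). The function names `bivariate_normal_pdf` and `liv_loglik` are hypothetical, and this is not the package's internal code, which may parameterize the problem differently.

```python
import math


def bivariate_normal_pdf(y, p, mean_y, mean_p, var_y, var_p, cov_yp):
    """Density of a bivariate normal distribution evaluated at (y, p)."""
    det = var_y * var_p - cov_yp ** 2
    dy, dp = y - mean_y, p - mean_p
    quad = (var_p * dy * dy - 2.0 * cov_yp * dy * dp + var_y * dp * dp) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def liv_loglik(data, beta0, alpha, pi1, pi2, theta5, theta6, theta7, theta8):
    """Log-likelihood of (y, p) pairs under a two-group latent instrument model.

    theta5: probability of the first group
    theta6: variance of the structural error
    theta7: covariance of the two errors
    theta8: variance of the first-stage error
    """
    if theta6 * theta8 - theta7 ** 2 <= 0.0:
        raise ValueError("error covariance matrix must be positive definite")
    # Moments of (Y, P) within a group, implied by Y = beta0 + alpha * P + eps.
    var_y = alpha ** 2 * theta8 + 2.0 * alpha * theta7 + theta6
    cov_yp = alpha * theta8 + theta7
    total = 0.0
    for y, p in data:
        density = 0.0
        for weight, pi_j in ((theta5, pi1), (1.0 - theta5, pi2)):
            density += weight * bivariate_normal_pdf(
                y, p, beta0 + alpha * pi_j, pi_j, var_y, theta8, cov_yp
            )
        total += math.log(density)
    return total


# Made-up observations, for illustration only.
toy_data = [(0.2, 0.1), (-1.8, 2.9), (0.9, -0.4), (-2.3, 3.3)]
toy_loglik = liv_loglik(toy_data, beta0=1.0, alpha=-1.0, pi1=0.0, pi2=3.0,
                        theta5=0.5, theta6=1.0, theta7=0.5, theta8=1.0)
print(toy_loglik)
```

A function of this form could be passed to a numerical optimizer to obtain estimates. The positive-definiteness check reflects the requirement that theta6, theta7, and theta8 form a valid covariance matrix.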