latentIV(formula, param = NULL, data)

var1 ~ var2, where var1 is a vector containing the dependent variable, while var2 is a vector containing the endogenous variable. An intercept is included by default.

mean(P) and mean(P) + sd(P), the variance-covariance matrix has all elements equal to 1 while probG1 is set to equal 0.5.

1-probG1 gives the probability of group 2.

optimx. 0 indicates successful completion. A possible error code is 1, which indicates that the iteration limit maxit had been reached.

t = 1,..,T
indexes either time or cross-sectional units, \(Y_{t}\) is the dependent variable, \(P_{t}\) is a k x 1 continuous, endogenous regressor, \(\epsilon_{t}\) is a structural error term with mean zero and \(E(\epsilon^{2})=\sigma^{2}_{\epsilon}\), and \(\alpha\) and \(\beta\) are model parameters. \(Z_{t}\) is an l x 1 vector of instruments, and \(\nu_{t}\) is a random error with mean zero and \(E(\nu^{2}) = \sigma^{2}_{\nu}\).
The endogeneity problem arises from the correlation of \(P_{t}\) and \(\epsilon_{t}\) through \(E(\epsilon\nu) = \sigma_{\epsilon\nu}\). latentIV considers \(Z_{t}^{'}\) to be a latent, discrete, exogenous variable with an unknown number of groups m, and \(\pi\) is a vector of group means.
It is assumed that Z is independent of the error terms \(\epsilon\) and \(\nu\) and that it has at least two groups with different means.
The structural and random errors are considered normally distributed with mean zero and variance-covariance matrix \(\Sigma\):
$$\Sigma = \left(
\begin{array}{cc}
\sigma_{\epsilon}^{2} & \sigma_{\epsilon\nu}\\
\sigma_{\epsilon\nu} & \sigma_{\nu}^{2}
\end{array}\right)$$
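As a reading aid, the symbol definitions above are consistent with a two-equation system of roughly the following form. This is a sketch inferred from those definitions, not a quotation of the package documentation; in particular, treating \(\alpha\) as the intercept and \(\beta\) as the coefficient of \(P_{t}\) is an assumption:
$$Y_{t} = \alpha + \beta P_{t} + \epsilon_{t}$$
$$P_{t} = \pi^{'} Z_{t} + \nu_{t}$$
Under this reading, \(P_{t}\) is endogenous because \(\nu_{t}\) enters \(P_{t}\) and is correlated with \(\epsilon_{t}\) whenever \(\sigma_{\epsilon\nu} \neq 0\).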
Identification of the model rests on the assumed non-normality of \(P_{t}\), the discreteness of the unobserved instruments, and the existence of at least two groups with different means. The method has been programmed such that the latent variable has two groups. Ebbes et al. (2005) show in a Monte Carlo experiment that even if the true number of categories of the instrument is larger than two, latentIV estimates are approximately consistent. In addition, overfitting the number of groups/categories reduces the degrees of freedom and leads to efficiency loss. When provided by the user, the initial parameter values for the two group means have to be different, otherwise the model is not identified. For a model with additional explanatory variables a Bayesian approach is needed, since identification issues appear in a frequentist approach. The optimization algorithm used is BFGS.

# load data
data(dataLatentIV)
# function call without any initial parameter values
l <- latentIV(y ~ P, data = dataLatentIV)
summary(l)
# function call with initial parameter values given by the user
l1 <- latentIV(y ~ P, param = c(2.9, -0.85, 0, 0.1, 1, 1, 1, 0.5), data = dataLatentIV)
summary(l1)
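To make the assumptions in the details more concrete, the following base R sketch simulates data in which the endogenous regressor is driven by a two-group latent instrument and the two errors are correlated. All names and numeric values here (sim_data, pi_means, prob_g1, the true coefficients 2 and -1, the covariance 0.5) are hypothetical choices made for illustration and are not taken from the package or its bundled data.

```r
# Hypothetical simulation of the setting described in the details:
# a discrete latent instrument with two groups and correlated errors.
set.seed(123)
n <- 2000

prob_g1  <- 0.5        # assumed probability of belonging to latent group 1
pi_means <- c(0, 3)    # assumed group means; they must differ for identification

# Latent group membership (never observed by the estimator)
group <- ifelse(runif(n) < prob_g1, 1L, 2L)

# Bivariate normal errors with covariance matrix Sigma
sigma  <- matrix(c(1.0, 0.5,
                   0.5, 1.0), nrow = 2, byrow = TRUE)
errors <- matrix(rnorm(2 * n), ncol = 2) %*% chol(sigma)
eps <- errors[, 1]     # structural error
nu  <- errors[, 2]     # error in the regressor equation

# Endogenous regressor and outcome (true intercept 2, true slope -1)
P <- pi_means[group] + nu
y <- 2 - 1 * P + eps

sim_data <- data.frame(y = y, P = P)

# OLS ignores the correlation between P and eps, so its slope is biased
ols_slope <- unname(coef(lm(y ~ P, data = sim_data))["P"])
print(ols_slope)
```

Because the covariance between the errors is positive in this sketch, the OLS slope should land above the true value of -1. A data frame built this way could then be passed to latentIV(y ~ P, data = sim_data) in the same manner as the examples above.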