copulaCorrection: Fitting Linear Models with Endogenous Regressors using Gaussian Copula

Description

Fits linear models with continuous or discrete endogenous regressors using Gaussian copulas, following the method presented in Park and Gupta (2012). This statistical technique addresses the endogeneity problem without requiring external instrumental variables. An important assumption of the model is that the endogenous variables are NOT normally distributed.
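
Because the method relies on the endogenous regressor being non-normal, it can be worth checking this before fitting. The following is a minimal sketch using `shapiro.test` from base R on simulated data; the variable `P_sim` and the 0.05 threshold are illustrative assumptions, not part of the package.

```r
# Simulated, clearly skewed regressor (hypothetical data, for illustration only)
set.seed(42)
P_sim <- rexp(500, rate = 1)

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed
sw <- shapiro.test(P_sim)
print(sw$p.value)

# A small p-value is evidence against normality, which is what the copula approach requires
non_normal <- sw$p.value < 0.05
print(non_normal)
```

A histogram or a normal Q-Q plot (`qqnorm`) of the endogenous regressor can complement the formal test.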

Usage

copulaCorrection(y, X, P, param, type, method, intercept, data)

Arguments

y

the vector or matrix containing the dependent variable.

X

the data frame or matrix containing the regressors of the model, both exogenous and endogenous. The last column(s) should contain the endogenous variable(s).

P

the matrix or vector containing the endogenous variables.

param

the vector of initial values for the parameters of the model to be supplied to the optimization algorithm. The parameters to be estimated are theta = {b,a,rho,sigma}, where b are the parameters of the exogenous variables, a is the parameter of the endogenous variable, rho is the parameter for the correlation between the error and the endogenous regressor, while sigma is the standard deviation of the structural error.

type

the type of the endogenous regressor(s). It can take two values, "continuous" or "discrete".

method

the method used for estimating the model. It can take two values, "1" or "2", where "1" is the maximum likelihood (ML) approach described in Park and Gupta (2012), and "2" is the equivalent OLS approach described in the same paper. "1" can be applied only when there is a single, continuous endogenous variable. With one discrete endogenous regressor, or with more than one continuous endogenous regressor, the second method is applied by default.

intercept

optional parameter. By default, the model is estimated with an intercept. If no intercept is desired, or if the regressor matrix X already contains a column of ones, intercept should be given the value "no".

data

data frame or matrix containing the variables of the model.

Value

Depending on the method and the type of the variables, it returns the optimal values of the parameters and their standard errors. When method 1 is used, the standard errors returned are obtained by bootstrapping over 10 samples. If more bootstrap samples are desired, the standard errors can be obtained using the boots function from the same package. The following components are returned and can be saved:

coefficients

the estimated coefficients.

standard errors

the standard errors of the estimated coefficients.

fitted.values

the fitted values.

residuals

the estimated residuals.

logLik

the estimated log likelihood value in the case of method 1.

AIC

Akaike Information Criterion in the case of method 1.

BIC

Bayesian Information Criterion in the case of method 1.
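
The bootstrapped standard errors mentioned above follow the usual resampling idea: refit the model on rows drawn with replacement and take the standard deviation of each coefficient across refits. The sketch below illustrates that idea with a plain `lm` fit on simulated data standing in for the copula estimator; it does not call the package's boots function, and the data, the names `dat`, `B`, `boot_coefs`, `boot_se`, and the choice of 200 resamples are assumptions made for illustration.

```r
# Hypothetical data: a skewed regressor and a linear outcome
set.seed(123)
n <- 200
x <- rexp(n)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(y = y, x = x)

B <- 200  # number of bootstrap samples (illustrative choice)

# For each resample: draw n rows with replacement, refit, keep the coefficients
boot_coefs <- t(replicate(B, {
  idx <- sample.int(n, size = n, replace = TRUE)
  coef(lm(y ~ x, data = dat[idx, ]))
}))

# Bootstrap standard error = standard deviation of each coefficient across resamples
boot_se <- apply(boot_coefs, 2, sd)
print(boot_se)
```

With only 10 resamples the estimated standard errors can be noisy, which is why a larger number of bootstrap samples may be preferable.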

Details

The maximum likelihood estimation is performed with the "BFGS" algorithm. When there are two endogenous regressors, no initial parameter values are needed, since the method applied by default is the augmented OLS, which can also be requested explicitly with method = "2".
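
The augmented OLS idea from Park and Gupta (2012) can be sketched in a few lines of base R: the endogenous regressor is mapped through its empirical distribution function and the inverse normal CDF, and the resulting copula term is added as an extra regressor. This is a simplified illustration on simulated data, not the package's implementation; the data-generating values, the variable names, and the `rank/(n + 1)` rescaling (used to keep `qnorm` finite) are assumptions of the sketch.

```r
set.seed(1)
n <- 1000

# Simulate a non-normal endogenous regressor P that is linked to the
# structural error through a Gaussian copula (hypothetical values throughout)
z      <- matrix(rnorm(2 * n), ncol = 2)
rho    <- 0.5
e_star <- z[, 1]
p_star <- rho * z[, 1] + sqrt(1 - rho^2) * z[, 2]
P      <- qexp(pnorm(p_star))   # exponential marginal, so P is not normal
eps    <- e_star                # structural error with standard deviation 1
y      <- 1 + 2 * P + eps       # true coefficient of P is 2

# Copula term: inverse normal CDF of the empirical CDF of P.
# rank/(n + 1) stays strictly inside (0, 1), so qnorm() never returns Inf.
P_star <- qnorm(rank(P) / (n + 1))

naive     <- lm(y ~ P)           # ignores the endogeneity of P
augmented <- lm(y ~ P + P_star)  # OLS with the copula term added

print(coef(naive))
print(coef(augmented))
```

In this simulation the coefficient of `P` from the augmented regression should lie closer to the true value of 2 than the naive estimate, and the coefficient of `P_star` absorbs the dependence between the regressor and the error.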

References

Park, S. and Gupta, S. (2012), 'Handling Endogenous Regressors by Joint Estimation Using Copulas', Marketing Science, 31(4), 567-586.

Examples


#load dataset dataCopC1, where P is endogenous, continuous and not normally distributed

data(dataCopC1)
y <- dataCopC1[,1]
X <- dataCopC1[,2:5]
P <- dataCopC1[,5]
## Not run: ------------------------------------
# c1 <- copulaCorrection(y, X, P, type = "continuous", method = "1", intercept=FALSE)
# summary(c1)
## ---------------------------------------------

# an alternative model can be obtained using method = "2".
c12 <- copulaCorrection(y, X, P, type = "continuous", method = "2", intercept=FALSE)
summary(c12)

# with 2 endogenous regressors no initial parameters are needed; the default is the augmented OLS.
data(dataCopC2)
y <- dataCopC2[,1]
X <- dataCopC2[,2:6]
P <- dataCopC2[,5:6]
c2 <- copulaCorrection(y, X, P, type = "continuous", method = "2", intercept=FALSE)
summary(c2)

# load dataset with 1 discrete endogenous variable.
# having more than 1 discrete endogenous regressor is also possible
data(dataCopDis)
y <- dataCopDis[,1]
X <- dataCopDis[,2:5]
P <- dataCopDis[,5]
c3 <- copulaCorrection(y, X, P, type = "discrete", intercept=FALSE, data = dataCopDis)
summary(c3)
