counterfactual: Estimation and Inference Methods for Counterfactual Analysis

Description

Implements the estimation and inference methods for counterfactual analysis described in Chernozhukov, Fernandez-Val and Melly (2013). Counterfactual reports point estimates, pointwise confidence bands, and simultaneous confidence bands for function-valued quantile effects (QE). It also reports p-values for functional hypotheses such as no effect, constant effect and stochastic dominance. The uniform confidence bands and p-values are obtained by inverting Kolmogorov-Smirnov (KS) and Cramer-von-Misses-Smirnov (CMS) statistics. The distribution of these statistics is approximated by empirical or weighted bootstrap. We recommend the use of weighted bootstrap when the covariates X include discrete components with small cell sizes.

Usage

counterfactual(formula, data, weights, na.action = na.exclude, 
  group, treatment =FALSE, decomposition = FALSE, counterfactual_var,
  transformation = FALSE, quantiles = c(1:9)/10,  
  method = "qr", trimming = 0.005, nreg = 100,
  scale_variable, counterfactual_scale_variable, censoring = 0, 
  right = FALSE, nsteps = 3, firstc = 0.1, secondc = 0.05,  
  noboot = FALSE, weightedboot = FALSE, seed = 8, robust = FALSE,  
  reps = 100, alpha = 0.05, first = 0.1, last = 0.9, cons_test = 0, 
  printdeco = TRUE, sepcore = FALSE, ncore = 1)

Arguments

formula

a formula object, with the response Y on the left of a ~ operator, and the covariate terms X, separated by + operators, on the right.

data

a data.frame in which to interpret the variables named in the formula, or in the weights argument. If this is missing, then the variables in the formula should be on the search list.

weights

vector of observation weights.

na.action

a function to filter missing data. The default (with na.fail) is to create an error if any missing values are found. A possible alternative is na.omit, which deletes observations that contain one or more missing values.

quantiles

quantile indexes of interest for the QE. It should be a vector of values between 0 and 1 with default c(1:9)/10.

group

name of a binary variable defining the reference population (value 0) and counterfactual population (value 1).

treatment

logical: if TRUE, then computes the structure or treatment effect (only useful when group is specified); if FALSE, then computes the composition effect.

decomposition

logical: if TRUE, then computes the structure effect, composition effect and total effect; if FALSE, then computes the structure effect (only useful when group is specified, and treatment=TRUE).

transformation

logical: if TRUE, then the counterfactual distribution of X is generated by transformation of the distribution of X in the reference population.

counterfactual_var

selects the values of X in the counterfactual population (only useful when group is not specified).

method

selects the model to be used to estimate the conditional distribution. The following methods have been implemented: qr (quantile regression, the default), loc (location shift), locsca (location scale shift), cqr (censored quantile regression), cox (duration regression), logit (logit distribution regression), probit (probit distribution regression), and lpm (linear probability model).

trimming

value between 0 and 0.5 specifying the amount of trimming to avoid tail estimation in qr method; default is 0.005.

nreg

sets the number of regressions estimated to approximate the conditional distribution; default is 100.

scale_variable

selects the components of X that affect the scale in the locsca method.

counterfactual_scale_variable

selects the counterfactual values of the components of X that affect the scale in the locsca method (only useful when counterfactual_var is specified).

censoring

variable specifying the censoring point for each observations (only useful when method=cqr).

right

logical: if TRUE, then indicates that the variable is right-censored; if TRUE, then indicates that the variable is left-censored (only useful when method=cqr).

nsteps

selects the number of steps performed in the cqr method; default and minimum is 3 (only useful when method=cqr).

firstc

selects the percentage of observations thrown out during the second step in the cqr method; default is 0.1 (only useful when method=cqr).

secondc

selects the percentage of observations thrown out during the third and further steps of the cqr method; default is 0.05 (only useful when method=cqr).

noboot

logical: if TRUE, then suppresses the bootstrap; if FALSE, the default, then runs the bootstrap.

weightedboot

logical: if TRUE, then implements weighted bootstrap with standard exponential weights; if FALSE, the default, then implements empirical bootstrap (only useful when noboot=FALSE).

seed

sets the seed for the random number generation (only useful when noboot=FALSE).

robust

logical: if TRUE, then uses the bootstrap interquartile range to estimate standard errors in the KS and CMS statistics; if FALSE, the default, then uses the bootstrap standard deviation to estimate standard errors in the KS and CMS statistics (only useful when noboot=FALSE).

reps

number of bootstrap replications; default is 100 (only useful when noboot=FALSE).

alpha

a real number between 0 and 1 reflecting the desired significance level for the confidence bands and hypotheses tests (only useful when noboot=FALSE).

first

sets the lowest quantile that is used for functional inference; default is 0.1 (only useful when noboot=FALSE).

last

sets the highes quantile that is used for functional inference; default is 0.9 (only useful when noboot=FALSE).

cons_test

adds tests of the null hypothesis that the QEs = cons_test at all the specified quantiles (only useful when noboot=FALSE).

printdeco

logical: if FALSE, then suppresses table of results.

sepcore

logical: if TRUE, then multiple cores are used for parallel computing.

ncore

number of cores used for parallel computing (only useful when sepcore=TRUE).

Value

Return a list of results

quantiles

quantile indexes of interest for the QE.

structure_effect

a vector with the estimated structure effects at the quantile indexes specified with quantiles. This vector is reported when group is specified and treatment=TRUE.

composition_effect

a vector with the estimated composition effects at the quantile indexes specified with quantiles. If group is specified, then this vector is reported when treatment=FALSE, or treatment=TRUE and decomposition=TRUE.

total_effect

a vector with the estimated total effects at the quantile indexes specified with quantiles. This vector is reported when group is specified, treatment=TRUE and decomposition=TRUE.

sample_quantile_ref0

a matrix with 4 columns. The columns contain the point estimates, standard errors, uniform lower end of confidence band, and uniform upper end of confidence band for the quantiles of Y in the observed distribution estimated using sample quantiles at the quantile indexes specified with quantiles. If group is specified, then this matrix is reported when treatment=FALSE, or treatment=TRUE and decomposition=TRUE.

model_quantile_ref0

a matrix with 4 columns. The columns contain the point estimates, standard errors, uniform lower end of confidence band, and uniform upper end of confidence band for the quantiles of Y in the observed distribution estimated using the conditional model at the quantile indexes specified with quantiles. If group is specified, then this matrix is reported when treatment=FALSE, or treatment=TRUE and decomposition=TRUE.

model_quantile_counter

a matrix with 4 columns. The columns contain the point estimates, standard errors, uniform lower end of confidence band, and uniform upper end of confidence band for the quantiles of Y in the counterfactual distribution estimated using the conditional model at the quantile indexes specified with quantiles.

sample_quantile_ref1

a matrix with 4 columns. The columns contain the point estimates, standard errors, uniform lower end of confidence band, and uniform upper end of confidence band for the quantiles of Y in the observed distribution of the population defined by $group=1$ estimated using sample quantiles at the quantile indexes specified with quantiles. This matrix is reported when group is specified and treatment=TRUE.

model_quantile_ref1

a matrix with 4 columns. The columns contain the point estimates, standard errors, uniform lower end of confidence band, and uniform upper end of confidence band for the quantiles of Y in the observed distribution of the population defined by $group=1$ estimated using the conditional model at the quantile indexes specified with quantiles. This matrix is reported when group is specified and treatment=TRUE.

nreg

number of regressions estimated to approximate the conditional distribution.

resSE

a matrix with 6 columns. The columns contain the point estimates, standard errors, pointwise lower end of confidence band, pointwise upper end of confidence band, uniform lower end of confidence band, and uniform upper end of confidence band for the structure or treatment quantile effect at the quantile indexes specified with quantiles. This matrix is reported when group is specified and treatment=TRUE.

testSE

a matrix with 2 columns including the p-values based on the KS and CMS statistics for several functional hypotheses on the structure or treatment effect. The first row tests the null-hypothesis of correct specification of the conditional model. The second row tests the null hypothesis that the change in the distribution of the covariates has no effect. The following rows tests the null hypotheses of constant QE, positive QE, and negative QE. An additional row testing the null hypotheses of constant QE (but at a different level than 0) is added if the option cons_test is specified. This matrix is reported when group is specified and treatment=TRUE.

resCE

a matrix with 6 columns. The columns contain the point estimates, standard errors, pointwise lower end of confidence band, pointwise upper end of confidence band, uniform lower end of confidence band, and uniform upper end of confidence band for the composition quantile effect at the quantile indexes specified with quantiles. If group is specified, then this matrix is reported when treatment=FALSE, or treatment=TRUE and decomposition=TRUE.

testCE

a matrix with 2 columns including the p-values based on the KS and CMS statistics for several functional hypotheses on the composition effect. The first row tests the null-hypothesis of correct specification of the conditional model. The second row tests the null hypothesis that the change in the distribution of the covariates has no effect. The following rows tests the null hypotheses of constant QE, positive QE, and negative QE. An additional row testing the null hypotheses of constant QE (but at a different level than 0) is added if the option cons_test is specified. If group is specified, then this matrix is reported when treatment=FALSE, or treatment=TRUE and decomposition=TRUE.

resTE

a matrix with 6 columns. The columns contain the point estimates, standard errors, pointwise lower end of confidence band, pointwise upper end of confidence band, uniform lower end of confidence band, and uniform upper end of confidence band for the total quantile effect at the quantile indexes specified with quantiles. This matrix is reported when group is specified, treatment=TRUE and decomposition=TRUE

testTE

a matrix with 2 columns including the p-values based on the KS and CMS statistics for several functional hypotheses on the total effect. The first row tests the null-hypothesis of correct specification of the conditional model. The second row tests the null hypothesis that the change in the distribution of the covariates has no effect. The following rows tests the null hypotheses of constant QE, positive QE, and negative QE. An additional row testing the null hypotheses of constant QE (but at a different level than 0) is added if the option cons_test is specified. This matrix is reported when group is specified, treatment=TRUE and decomposition=TRUE.

Details

The populations to construct the observed and counterfactual distributions can be specified in two alternative ways. If the option group is specified and treatment=FALSE, then the observed distribution is estimated from the conditional and covariate distributions of group=0, and the counterfactual distribution is estimated from the conditional distribution of group=0 and the covariate distribution of group=1. If group is specified and treatment=TRUE, then the observed distribution is estimated from the conditional and covariate distributions of group=1, and the counterfactual distribution is estimated from the conditional distribution of group=0 and the covariate distribution of group=1. If group is specified, treatment=TRUE and decomposition=TRUE, then all the previous observed and counterfactual distributions are estimated. Alternatively, the option counterfactual_var can be specified. In this case, the variables specified in the right hand side of formula contain the covariate values used to estimate the observed distribution and the variables specified in counterfactual_var contain the covariate values to estimate the counterfactual distribution. Note that counterfactual_var must contain exactly the same number of variables as in the right hand side of formula and that the order matters. In addition, if counterfactual_var is a deterministic transformation of the covariates in the reference population, then transformation should be set to TRUE.

method:

qr is the default, selects the method based on the linear quantile regression estimator of Koenker and Bassett (1978).

loc selects the linear location shift method.

locsca selects the linear location-scale shift method. The logarithm of the variance of the residuals is assumed to be a linear function of the variables given in scale_variable.

cqr selects the method based on the censored linear quantile regression estimator of Chernozhukov and Hong (2002). The variable with the censoring values for each observation must be specified in censoring. By default, this estimator is a three-steps estimator. The number of steps can be increased by the option nsteps.

cox selects the methob based on the proportional hazard or duration regression estimator of Cox (1972).

logit selects the method based on the distribution regression estimator of Chernozhukov, Fernandez-Val and Melly (2013) with logit link function.

probit selects the method based on the distribution regression estimator of Chernozhukov, Fernandez-Val and Melly (2013) with probit link function.

lpm selects the method based on the distribution regression estimator of Chernozhukov, Fernandez-Val and Melly (2013) with linear link function.

We refer the user to Chen, Chernozhukov, Fernandez-Val and Melly (2016) for a more detailed description of the methods.

References

Chen, M., Chernozhukov, V., I. Fernandez-Val, and B. Melly (2016). Counterfactual Analysis in R: A Vignette.

Chernozhukov, V., I. Fernandez-Val, and B. Melly (2013). Inference on Counterfactual Distributions. Econometrica 81(6), 2205-2268.

Chernozhukov, V., and H. Hong (2002). Three-step Censored Quantile Regression and Extramarital Affairs.Journal of the American Statistical Association, 97, 872-881.

Cox, D. R. (1972). Regression Models and Life Tables. Journal of the Royal Statistical Society, Ser. B, 34, 187-220.

Koenker, R., and G. Bassett (1978). Regression Quantiles. Econometrica, 46(1), 33-50.

Examples

Run this code

# NOT RUN {
#Counterfactual distribution of X constructed by transformation of reference distribution
# }
# NOT RUN {
data(engel)
attach(engel)
counter_income <- mean(income)+0.75*(income-mean(income))
rqres <- counterfactual(foodexp~income, counterfactual_var=counter_income,
nreg=100, transformation=TRUE, sepcore = TRUE, ncore=2)
# }
# NOT RUN {
# Wage decomposition: counterfactual and reference populations correspond to different groups
data(nlsw88)
attach(nlsw88)
lwage <- log(wage)

# method: logit
logitres<-counterfactual(lwage~tenure+ttl_exp+grade, group=union, treatment=TRUE, 
decomposition=TRUE, method="logit", noboot=TRUE, sepcore = TRUE,ncore=2)
# }

Run the code above in your browser using DataLab