Implements the estimation and inference methods for counterfactual analysis described in Chernozhukov, Fernandez-Val and Melly (2013). Counterfactual
reports point estimates, pointwise confidence bands, and simultaneous confidence bands for function-valued quantile effects (QE). It also reports p-values for functional hypotheses such as no effect, constant effect and stochastic dominance. The uniform confidence bands and p-values are obtained by inverting Kolmogorov-Smirnov (KS) and Cramer-von-Mises-Smirnov (CMS) statistics. The distribution of these statistics is approximated by empirical or weighted bootstrap. We recommend the use of weighted bootstrap when the covariates X include discrete components with small cell sizes.
counterfactual(formula, data, weights, na.action = na.exclude,
group, treatment = FALSE, decomposition = FALSE, counterfactual_var,
transformation = FALSE, quantiles = c(1:9)/10,
method = "qr", trimming = 0.005, nreg = 100,
scale_variable, counterfactual_scale_variable, censoring = 0,
right = FALSE, nsteps = 3, firstc = 0.1, secondc = 0.05,
noboot = FALSE, weightedboot = FALSE, seed = 8, robust = FALSE,
reps = 100, alpha = 0.05, first = 0.1, last = 0.9, cons_test = 0,
printdeco = TRUE, sepcore = FALSE, ncore = 1)
formula
a formula object, with the response Y on the left of a ~ operator, and the covariate terms X, separated by + operators, on the right.
data
a data.frame in which to interpret the variables named in the formula, or in the weights argument. If this is missing, then the variables in the formula should be on the search list.
weights
vector of observation weights.
na.action
a function to filter missing data. The usage above sets na.exclude; na.fail creates an error if any missing values are found, and na.omit deletes observations that contain one or more missing values.
quantiles
quantile indexes of interest for the QE. It should be a vector of values between 0 and 1; the default is c(1:9)/10.
group
name of a binary variable defining the reference population (value 0) and counterfactual population (value 1).
treatment
logical: if TRUE, then computes the structure or treatment effect (only useful when group is specified); if FALSE, then computes the composition effect.
decomposition
logical: if TRUE, then computes the structure effect, composition effect and total effect; if FALSE, then computes the structure effect (only useful when group is specified and treatment=TRUE).
transformation
logical: if TRUE, then the counterfactual distribution of X is generated by transformation of the distribution of X in the reference population.
counterfactual_var
selects the values of X in the counterfactual population (only useful when group is not specified).
method
selects the model to be used to estimate the conditional distribution. The following methods have been implemented: qr (quantile regression, the default), loc (location shift), locsca (location scale shift), cqr (censored quantile regression), cox (duration regression), logit (logit distribution regression), probit (probit distribution regression), and lpm (linear probability model).
trimming
value between 0 and 0.5 specifying the amount of trimming to avoid tail estimation in the qr method; default is 0.005.
nreg
sets the number of regressions estimated to approximate the conditional distribution; default is 100.
scale_variable
selects the components of X that affect the scale in the locsca method.
counterfactual_scale_variable
selects the counterfactual values of the components of X that affect the scale in the locsca method (only useful when counterfactual_var is specified).
censoring
variable specifying the censoring point for each observation (only useful when method="cqr").
right
logical: if TRUE, then indicates that the variable is right-censored; if FALSE, then indicates that the variable is left-censored (only useful when method="cqr").
nsteps
selects the number of steps performed in the cqr method; default and minimum is 3 (only useful when method="cqr").
firstc
selects the fraction of observations thrown out during the second step in the cqr method; default is 0.1 (only useful when method="cqr").
secondc
selects the fraction of observations thrown out during the third and further steps of the cqr method; default is 0.05 (only useful when method="cqr").
noboot
logical: if TRUE, then suppresses the bootstrap; if FALSE, the default, then runs the bootstrap.
weightedboot
logical: if TRUE, then implements weighted bootstrap with standard exponential weights; if FALSE, the default, then implements empirical bootstrap (only useful when noboot=FALSE).
seed
sets the seed for the random number generation (only useful when noboot=FALSE).
robust
logical: if TRUE, then uses the bootstrap interquartile range to estimate standard errors in the KS and CMS statistics; if FALSE, the default, then uses the bootstrap standard deviation to estimate standard errors in the KS and CMS statistics (only useful when noboot=FALSE).
reps
number of bootstrap replications; default is 100 (only useful when noboot=FALSE).
alpha
a real number between 0 and 1 reflecting the desired significance level for the confidence bands and hypothesis tests (only useful when noboot=FALSE).
first
sets the lowest quantile that is used for functional inference; default is 0.1 (only useful when noboot=FALSE).
last
sets the highest quantile that is used for functional inference; default is 0.9 (only useful when noboot=FALSE).
cons_test
adds tests of the null hypothesis that the QEs equal cons_test at all the specified quantiles (only useful when noboot=FALSE).
printdeco
logical: if FALSE, then suppresses the table of results.
sepcore
logical: if TRUE, then multiple cores are used for parallel computing.
ncore
number of cores used for parallel computing (only useful when sepcore=TRUE).
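The bootstrap-related arguments (weightedboot, robust, reps, alpha) can be illustrated with a small base R sketch. It is not the package's internal code: the data are simulated, the helper wquantile is hypothetical, and the target is a plain sample quantile function rather than a counterfactual one. It only shows the idea of drawing standard exponential weights, computing standard errors from the bootstrap standard deviation or the rescaled interquartile range, and turning a maximal t-statistic into a uniform band.

```r
# Illustrative sketch in base R; not the package's internal code.
set.seed(8)
n     <- 500
y     <- rnorm(n)                  # simulated outcome (assumption)
taus  <- seq(0.1, 0.9, by = 0.1)   # quantile indexes
reps  <- 100                       # bootstrap replications
alpha <- 0.05                      # significance level

# Hypothetical helper: weighted sample quantiles, i.e. the smallest y whose
# cumulative normalized weight reaches tau
wquantile <- function(y, w, taus) {
  ord <- order(y)
  cw  <- cumsum(w[ord]) / sum(w)
  sapply(taus, function(tau) y[ord][which(cw >= tau)[1]])
}

est <- wquantile(y, rep(1, n), taus)

# Weighted bootstrap: redraw standard exponential weights instead of
# resampling observations (one row per replication)
boot <- t(replicate(reps, wquantile(y, rexp(n), taus)))

# Standard errors: bootstrap standard deviation, or the bootstrap
# interquartile range rescaled by the normal interquartile range
se_sd  <- apply(boot, 2, sd)
se_iqr <- apply(boot, 2, IQR) / (qnorm(0.75) - qnorm(0.25))

# KS-type critical value: (1 - alpha) quantile of the maximal |t|-statistic
tmax <- apply(abs(sweep(boot, 2, est)), 1, function(d) max(d / se_sd))
crit <- unname(quantile(tmax, 1 - alpha))

band <- cbind(estimate = est,
              se       = se_sd,
              pw_lower = est - qnorm(1 - alpha / 2) * se_sd,
              pw_upper = est + qnorm(1 - alpha / 2) * se_sd,
              un_lower = est - crit * se_sd,
              un_upper = est + crit * se_sd)
print(round(band, 3))
```

Replacing se_sd by se_iqr in the last two steps gives the variant that corresponds in spirit to robust=TRUE.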
Returns a list of results with the following components.
quantile indexes of interest for the QE.
a vector with the estimated structure effects at the quantile indexes specified with quantiles. This vector is reported when group is specified and treatment=TRUE.
a vector with the estimated composition effects at the quantile indexes specified with quantiles. If group is specified, then this vector is reported when treatment=FALSE, or treatment=TRUE and decomposition=TRUE.
a vector with the estimated total effects at the quantile indexes specified with quantiles. This vector is reported when group is specified, treatment=TRUE and decomposition=TRUE.
a matrix with 4 columns. The columns contain the point estimates, standard errors, uniform lower end of confidence band, and uniform upper end of confidence band for the quantiles of Y in the observed distribution estimated using sample quantiles at the quantile indexes specified with quantiles. If group is specified, then this matrix is reported when treatment=FALSE, or treatment=TRUE and decomposition=TRUE.
a matrix with 4 columns. The columns contain the point estimates, standard errors, uniform lower end of confidence band, and uniform upper end of confidence band for the quantiles of Y in the observed distribution estimated using the conditional model at the quantile indexes specified with quantiles. If group is specified, then this matrix is reported when treatment=FALSE, or treatment=TRUE and decomposition=TRUE.
a matrix with 4 columns. The columns contain the point estimates, standard errors, uniform lower end of confidence band, and uniform upper end of confidence band for the quantiles of Y in the counterfactual distribution estimated using the conditional model at the quantile indexes specified with quantiles.
a matrix with 4 columns. The columns contain the point estimates, standard errors, uniform lower end of confidence band, and uniform upper end of confidence band for the quantiles of Y in the observed distribution of the population defined by group=1 estimated using sample quantiles at the quantile indexes specified with quantiles. This matrix is reported when group is specified and treatment=TRUE.
a matrix with 4 columns. The columns contain the point estimates, standard errors, uniform lower end of confidence band, and uniform upper end of confidence band for the quantiles of Y in the observed distribution of the population defined by group=1 estimated using the conditional model at the quantile indexes specified with quantiles. This matrix is reported when group is specified and treatment=TRUE.
number of regressions estimated to approximate the conditional distribution.
a matrix with 6 columns. The columns contain the point estimates, standard errors, pointwise lower end of confidence band, pointwise upper end of confidence band, uniform lower end of confidence band, and uniform upper end of confidence band for the structure or treatment quantile effect at the quantile indexes specified with quantiles. This matrix is reported when group is specified and treatment=TRUE.
a matrix with 2 columns containing the p-values based on the KS and CMS statistics for several functional hypotheses on the structure or treatment effect. The first row tests the null hypothesis of correct specification of the conditional model. The second row tests the null hypothesis that the change in the distribution of the covariates has no effect. The following rows test the null hypotheses of constant QE, positive QE, and negative QE. An additional row testing the null hypothesis of constant QE (but at a different level than 0) is added if the option cons_test is specified. This matrix is reported when group is specified and treatment=TRUE.
a matrix with 6 columns. The columns contain the point estimates, standard errors, pointwise lower end of confidence band, pointwise upper end of confidence band, uniform lower end of confidence band, and uniform upper end of confidence band for the composition quantile effect at the quantile indexes specified with quantiles. If group is specified, then this matrix is reported when treatment=FALSE, or treatment=TRUE and decomposition=TRUE.
a matrix with 2 columns containing the p-values based on the KS and CMS statistics for several functional hypotheses on the composition effect. The first row tests the null hypothesis of correct specification of the conditional model. The second row tests the null hypothesis that the change in the distribution of the covariates has no effect. The following rows test the null hypotheses of constant QE, positive QE, and negative QE. An additional row testing the null hypothesis of constant QE (but at a different level than 0) is added if the option cons_test is specified. If group is specified, then this matrix is reported when treatment=FALSE, or treatment=TRUE and decomposition=TRUE.
a matrix with 6 columns. The columns contain the point estimates, standard errors, pointwise lower end of confidence band, pointwise upper end of confidence band, uniform lower end of confidence band, and uniform upper end of confidence band for the total quantile effect at the quantile indexes specified with quantiles. This matrix is reported when group is specified, treatment=TRUE and decomposition=TRUE.
a matrix with 2 columns containing the p-values based on the KS and CMS statistics for several functional hypotheses on the total effect. The first row tests the null hypothesis of correct specification of the conditional model. The second row tests the null hypothesis that the change in the distribution of the covariates has no effect. The following rows test the null hypotheses of constant QE, positive QE, and negative QE. An additional row testing the null hypothesis of constant QE (but at a different level than 0) is added if the option cons_test is specified. This matrix is reported when group is specified, treatment=TRUE and decomposition=TRUE.
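To make the KS and CMS p-values more concrete, here is a rough base R sketch under strong simplifying assumptions: two simulated samples stand in for the observed and counterfactual outcomes, the QE is a difference of sample quantiles, and an empirical bootstrap recentred at the estimate approximates the null distribution. The function names ks and cms and the exact form of the statistics are illustrative choices, not the package's definitions.

```r
# Illustrative sketch in base R; not the package's internal code.
set.seed(8)
n    <- 400
y0   <- rnorm(n)                   # simulated reference outcome (assumption)
y1   <- rnorm(n, mean = 0.3)       # simulated counterfactual outcome (assumption)
taus <- seq(0.1, 0.9, by = 0.1)
reps <- 200

qe_hat <- unname(quantile(y1, taus) - quantile(y0, taus))

# Empirical bootstrap draws of the whole QE function (one row per draw)
qe_boot <- t(replicate(reps, unname(
  quantile(sample(y1, replace = TRUE), taus) -
  quantile(sample(y0, replace = TRUE), taus))))
se <- apply(qe_boot, 2, sd)

# Studentized sup-type (KS) and mean-square-type (CMS) statistics
ks  <- function(d) max(abs(d) / se)
cms <- function(d) mean((d / se)^2)

# Recentre the draws at the estimate so that they mimic the null distribution
dev <- sweep(qe_boot, 2, qe_hat)

# H0: QE(tau) = 0 at every quantile index
p_no_effect <- c(KS  = mean(apply(dev, 1, ks)  > ks(qe_hat)),
                 CMS = mean(apply(dev, 1, cms) > cms(qe_hat)))

# H0: QE(tau) is constant, so deviations from the average QE are tested
dev_c <- dev - rowMeans(dev)
qe_c  <- qe_hat - mean(qe_hat)
p_constant <- c(KS  = mean(apply(dev_c, 1, ks)  > ks(qe_c)),
                CMS = mean(apply(dev_c, 1, cms) > cms(qe_c)))

print(rbind(no_effect = p_no_effect, constant = p_constant))
```

The result has one row per hypothesis and one column per statistic, which mirrors the 2-column layout of the test matrices described above.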
The populations to construct the observed and counterfactual distributions can be specified in two alternative ways. If the option group is specified and treatment=FALSE, then the observed distribution is estimated from the conditional and covariate distributions of group=0, and the counterfactual distribution is estimated from the conditional distribution of group=0 and the covariate distribution of group=1. If group is specified and treatment=TRUE, then the observed distribution is estimated from the conditional and covariate distributions of group=1, and the counterfactual distribution is estimated from the conditional distribution of group=0 and the covariate distribution of group=1. If group is specified, treatment=TRUE and decomposition=TRUE, then all the previous observed and counterfactual distributions are estimated.
Alternatively, the option counterfactual_var can be specified. In this case, the variables specified in the right-hand side of formula contain the covariate values used to estimate the observed distribution and the variables specified in counterfactual_var contain the covariate values to estimate the counterfactual distribution. Note that counterfactual_var must contain exactly the same number of variables as in the right-hand side of formula and that the order matters. In addition, if counterfactual_var is a deterministic transformation of the covariates in the reference population, then transformation should be set to TRUE.
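The way the group option combines conditional and covariate distributions can be sketched in base R. The example below is a simplified illustration on simulated data, not the package's implementation: it uses a logit distribution regression fitted with glm, the helpers cdf_from and quant_from are hypothetical, and the sign conventions for the effects (structure as observed group 1 minus counterfactual, composition as counterfactual minus observed group 0) are an assumption consistent with the description above.

```r
# Illustrative sketch in base R; not the package's internal code.
set.seed(8)
n     <- 2000
group <- rbinom(n, 1, 0.5)
x     <- rnorm(n, mean = 0.5 * group)            # covariate distribution differs by group
y     <- 1 + (1 + 0.5 * group) * x + rnorm(n)    # conditional distribution differs by group

d0 <- data.frame(y = y[group == 0], x = x[group == 0])
d1 <- data.frame(y = y[group == 1], x = x[group == 1])

taus  <- seq(0.1, 0.9, by = 0.1)
ygrid <- unname(quantile(y, seq(0.02, 0.98, length.out = 100)))  # thresholds

# Hypothetical helper: fit P(Y <= yy | X) by logit on fit_data for each
# threshold, then average the fitted probabilities over the covariates in covs
cdf_from <- function(fit_data, covs) {
  sapply(ygrid, function(yy) {
    fit <- glm(as.numeric(y <= yy) ~ x, family = binomial, data = fit_data)
    mean(predict(fit, newdata = covs, type = "response"))
  })
}

# Hypothetical helper: invert the (sorted, hence monotone) CDF on the grid
quant_from <- function(cdf, taus) {
  cdf <- sort(cdf)
  sapply(taus, function(tau) ygrid[which(cdf >= tau)[1]])
}

q_obs0    <- quant_from(cdf_from(d0, d0), taus)  # conditional of 0, covariates of 0
q_counter <- quant_from(cdf_from(d0, d1), taus)  # conditional of 0, covariates of 1
q_obs1    <- quant_from(cdf_from(d1, d1), taus)  # conditional of 1, covariates of 1

composition <- q_counter - q_obs0
structure   <- q_obs1 - q_counter
total       <- q_obs1 - q_obs0

print(round(cbind(taus, structure, composition, total), 3))
```

By construction the total effect equals the sum of the structure and composition effects at every quantile index.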
method:
qr is the default and selects the method based on the linear quantile regression estimator of Koenker and Bassett (1978).
loc selects the linear location shift method.
locsca selects the linear location-scale shift method. The logarithm of the variance of the residuals is assumed to be a linear function of the variables given in scale_variable.
cqr selects the method based on the censored linear quantile regression estimator of Chernozhukov and Hong (2002). The variable with the censoring values for each observation must be specified in censoring. By default, this estimator is a three-step estimator. The number of steps can be increased by the option nsteps.
cox selects the method based on the proportional hazard or duration regression estimator of Cox (1972).
logit selects the method based on the distribution regression estimator of Chernozhukov, Fernandez-Val and Melly (2013) with logit link function.
probit selects the method based on the distribution regression estimator of Chernozhukov, Fernandez-Val and Melly (2013) with probit link function.
lpm selects the method based on the distribution regression estimator of Chernozhukov, Fernandez-Val and Melly (2013) with linear link function.
We refer the user to Chen, Chernozhukov, Fernandez-Val and Melly (2016) for a more detailed description of the methods.
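As a rough illustration of the simplest of these models, the location shift idea can be sketched in base R. This is an assumption-laden toy version, not the code behind method="loc": the data are simulated, the counterfactual covariate x_cf is a hypothetical mean-preserving compression of x, and the marginal distributions are obtained by pairing every fitted location with every residual.

```r
# Illustrative sketch in base R; not the package's internal code.
set.seed(8)
n    <- 1000
x    <- rnorm(n, mean = 2)                  # simulated covariate (assumption)
y    <- 1 + 0.5 * x + rnorm(n)              # simulated outcome (assumption)
x_cf <- mean(x) + 0.75 * (x - mean(x))      # hypothetical counterfactual covariate
taus <- seq(0.1, 0.9, by = 0.1)

fit <- lm(y ~ x)
res <- residuals(fit)

# Under a location shift, F(y | x) = F_e(y - x'b): the conditional
# distribution only moves with the fitted location x'b
loc_obs <- fitted(fit)
loc_cf  <- predict(fit, newdata = data.frame(x = x_cf))

# Marginal distributions: combine every location with every residual
q_obs <- unname(quantile(outer(loc_obs, res, "+"), taus))
q_cf  <- unname(quantile(outer(loc_cf,  res, "+"), taus))

qe <- q_cf - q_obs   # quantile effects of compressing the covariate
print(round(cbind(taus, q_obs, q_cf, qe), 3))
```

Because the compression leaves the mean of x unchanged and shrinks its spread, the quantile effects in this toy setting are positive in the lower tail and negative in the upper tail.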
Chen, M., V. Chernozhukov, I. Fernandez-Val, and B. Melly (2016). Counterfactual Analysis in R: A Vignette.
Chernozhukov, V., I. Fernandez-Val, and B. Melly (2013). Inference on Counterfactual Distributions. Econometrica 81(6), 2205-2268.
Chernozhukov, V., and H. Hong (2002). Three-step Censored Quantile Regression and Extramarital Affairs. Journal of the American Statistical Association, 97, 872-881.
Cox, D. R. (1972). Regression Models and Life Tables. Journal of the Royal Statistical Society, Ser. B, 34, 187-220.
Koenker, R., and G. Bassett (1978). Regression Quantiles. Econometrica, 46(1), 33-50.
# NOT RUN {
# Counterfactual distribution of X constructed by transformation of the reference distribution
library(Counterfactual)
data(engel)  # Engel food expenditure data from the quantreg package
attach(engel)
counter_income <- mean(income)+0.75*(income-mean(income))
rqres <- counterfactual(foodexp~income, counterfactual_var=counter_income,
nreg=100, transformation=TRUE, sepcore = TRUE, ncore=2)
# }
# NOT RUN {
# Wage decomposition: counterfactual and reference populations correspond to different groups
data(nlsw88)
attach(nlsw88)
lwage <- log(wage)
# method: logit
logitres <- counterfactual(lwage~tenure+ttl_exp+grade, group=union, treatment=TRUE,
decomposition=TRUE, method="logit", noboot=TRUE, sepcore=TRUE, ncore=2)
# }