This function estimates the average treatment effect on the treated of a continuously distributed treatment in repeated cross-sections based on a Difference-in-Differences (DiD) approach using double machine learning to control for time-varying confounders in a data-driven manner. It supports estimation under various machine learning methods and uses k-fold cross-fitting.
didcontDML(
y,
d,
t,
dtreat,
dcontrol,
t0 = 0,
t1 = 1,
controls,
MLmethod = "lasso",
psmethod = 1,
trim = 0.1,
lognorm = FALSE,
bw = NULL,
bwfactor = 0.7,
cluster = NULL,
k = 3
)
A list with the following components:
ATET
: Estimate of the Average Treatment Effect on the Treated.
se
: Standard error of the ATET estimate.
trimmed
: Number of discarded (trimmed) observations.
pval
: P-value.
pscores
: Propensity scores (4 columns): under treatment in period t1, under treatment in period t0, under control in period t1, under control in period t0.
outcomes
: Conditional outcomes (3 columns): in treatment group in period t0, in control group in period t1, in control group in period t0.
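The `ATET` and `se` components can be combined into a confidence interval. The following is a minimal sketch that assumes a normal approximation. The `results` list is a hypothetical stand-in with placeholder numbers, not output from the function, and the p-value formula shown is an assumption about how `pval` relates to `ATET` and `se`.

```r
# Hypothetical stand-in for the list returned by didcontDML();
# the numbers are placeholders, not real estimates.
results <- list(ATET = 2.05, se = 0.15)

# 95% confidence interval under a normal approximation
z <- qnorm(0.975)
ci <- c(lower = results$ATET - z * results$se,
        upper = results$ATET + z * results$se)

# Two-sided p-value for H0: ATET = 0 under the same approximation
pval <- 2 * pnorm(-abs(results$ATET / results$se))

print(round(ci, 3))
print(pval)
```

With a real fit, the `pval` component returned by the function should be preferred over a value recomputed this way.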
y
: Outcome variable. Should not contain missing values.
d
: Treatment variable in the treatment period of interest. Should be continuous and not contain missing values.
t
: Time variable indicating outcome periods. Should not contain missing values.
dtreat
: Value of the treatment under treatment (in the treatment period of interest). This value would be 1 for binary treatments.
dcontrol
: Value of the treatment under control (in the treatment period of interest). This value would be 0 for binary treatments.
t0
: Value indicating the pre-treatment outcome period. Default is 0.
t1
: Value indicating the post-treatment outcome period in which the effect is evaluated. Default is 1.
controls
: Covariates and/or previous treatment history to be controlled for. Should not contain missing values.
MLmethod
: Machine learning method for estimating nuisance parameters using the SuperLearner package. Must be one of "lasso" (default), "randomforest", "xgboost", "svm", "ensemble", or "parametric".
psmethod
: Method for computing generalized propensity scores. Set to 1 for estimating conditional treatment densities using the treatment as dependent variable, or 2 for using the treatment kernel weights as dependent variable. Default is 1.
trim
: Trimming threshold (in percentage) for discarding observations with too much influence within any subgroup defined by the treatment group and time. Default is 0.1.
lognorm
: Logical indicating if log-normal transformation should be applied when estimating conditional treatment densities using the treatment as dependent variable. Default is FALSE.
bw
: Bandwidth for kernel density estimation. Default is NULL, implying that the bandwidth is calculated based on the rule-of-thumb.
bwfactor
: Factor by which the bandwidth is multiplied. Default is 0.7 (undersmoothing).
cluster
: Optional clustering variable for calculating standard errors.
k
: Number of folds in k-fold cross-fitting. Default is 3.
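To illustrate how a rule-of-thumb bandwidth, an undersmoothing factor, and kernel weights around a treatment value fit together, here is a minimal base R sketch. It assumes a Gaussian kernel and Silverman's rule as implemented in `stats::bw.nrd0`. The kernel and bandwidth rule the function actually uses may differ, and `kernel_weight` is a hypothetical helper written for this illustration.

```r
set.seed(1)
d <- rnorm(500, mean = 1)   # simulated continuous treatment
dtreat <- 1                 # treatment value of interest
bwfactor <- 0.7             # undersmoothing factor, matching the documented default

# Rule-of-thumb bandwidth (assumption: Silverman's rule), then undersmooth
bw <- bw.nrd0(d) * bwfactor

# Gaussian kernel weights (assumption): largest for observations with d near the target
kernel_weight <- function(d, target, bw) dnorm((d - target) / bw) / bw
w_treat <- kernel_weight(d, dtreat, bw)

# Observations within one bandwidth of dtreat receive more weight on average
near <- abs(d - dtreat) < bw
print(mean(w_treat[near]) > mean(w_treat[!near]))
```

A smaller `bwfactor` concentrates the weights more tightly around `dtreat`, which reduces smoothing bias at the cost of using fewer effective observations.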
This function estimates the Average Treatment Effect on the Treated (ATET) by Difference-in-Differences in repeated cross-sections while controlling for confounders using double machine learning. It supports different machine learning methods for estimating the nuisance parameters and uses k-fold cross-fitting, so that each observation's nuisance predictions come from models fitted on the other folds, which limits overfitting bias. The function handles both binary and continuous outcomes and provides options for trimming and for adjusting the bandwidth used in kernel density estimation.
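The idea behind k-fold cross-fitting can be sketched in base R as follows. This is an illustration only: a linear model fitted with `lm` stands in for the machine learner, the data are simulated, and the variable names are assumptions. It is not the function's internal code.

```r
set.seed(1)
n <- 300
k <- 3
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(y = y, x = x)

# Randomly assign each observation to one of k equally sized folds
fold <- sample(rep(seq_len(k), length.out = n))

# Out-of-fold predictions: each model is fitted without the fold it predicts
yhat <- numeric(n)
for (j in seq_len(k)) {
  test <- fold == j
  fit <- lm(y ~ x, data = dat[!test, ])             # train on the other folds
  yhat[test] <- predict(fit, newdata = dat[test, ]) # predict the held-out fold
}

# Every observation now has a prediction from a model that never saw it
print(cor(y, yhat))
```

Because no observation is predicted by a model trained on it, the nuisance predictions are not contaminated by overfitting to that observation, which is the property double machine learning relies on.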
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J. (2018): "Double/debiased machine learning for treatment and structural parameters", The Econometrics Journal, 21, C1-C68.
Haddad, M., Huber, M., Medina-Reyes, J., Zhang, L. (2024): "Difference-in-Differences under time-varying continuous treatments based on double machine learning"
if (FALSE) {
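  # Example with simulated data
  # (assumes didcontDML() is provided by the causalweight package)
  library(causalweight)
  set.seed(1)                          # make the simulation reproducible
  n <- 2000
  t <- rep(c(0, 1), each = n / 2)      # outcome period: 0 = pre, 1 = post
  x <- 0.5 * rnorm(n)                  # observed covariate
  u <- runif(n, 0, 2)                  # unobserved confounder with a time-constant effect on y
  d <- x + u + rnorm(n)                # continuous treatment
  y <- (2 * d + x) * t + u + rnorm(n)  # outcome; d affects y only in period 1
  # True effect of dtreat = 1 versus dcontrol = 0 is 2
  results <- didcontDML(y = y, d = d, t = t, dtreat = 1, dcontrol = 0,
                        controls = x, MLmethod = "lasso")
  cat("ATET: ", round(results$ATET, 3), ", Standard error: ", round(results$se, 3), "\n")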
}