This function estimates the average treatment effect on the treated of a continuously distributed treatment in panel data based on a Difference-in-Differences (DiD) approach using double machine learning to control for time-varying confounders in a data-driven manner. It supports estimation under various machine learning methods and uses k-fold cross-fitting.
didcontDMLpanel(
ydiff,
d,
t,
dtreat,
dcontrol,
t1 = 1,
controls,
MLmethod = "lasso",
psmethod = 1,
trim = 0.1,
lognorm = FALSE,
bw = NULL,
bwfactor = 0.7,
cluster = NULL,
k = 3
)
A list with the following components:
ATET
: Estimate of the Average Treatment Effect on the Treated.
se
: Standard error of the ATET estimate.
trimmed
: Number of discarded (trimmed) observations.
pval
: P-value.
pscores
: Propensity scores (2 columns): under treatment, under control.
outcomepred
: Conditional outcome predictions.
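The ATET and se components can be combined into a confidence interval. The sketch below assumes a normal approximation for the estimator and uses a hypothetical list `res` with made-up numbers in place of real output. It does not call didcontDMLpanel, and whether pval is computed in exactly this way is an assumption.

```r
# Hypothetical stand-in for the returned list; the numbers are illustrative only.
res <- list(ATET = 2.05, se = 0.12)

# 95% confidence interval under an assumed normal approximation.
z <- qnorm(0.975)
ci <- c(lower = res$ATET - z * res$se, upper = res$ATET + z * res$se)

# Two-sided p-value for H0: ATET = 0 under the same approximation.
pval <- 2 * pnorm(-abs(res$ATET / res$se))

print(round(ci, 3))
print(pval)
```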
The function takes the following arguments:
ydiff
: Outcome difference between a post-treatment and a pre-treatment period. Should not contain missing values.
d
: Treatment variable in the treatment period of interest. Should be continuous and not contain missing values.
t
: Time variable indicating outcome periods. Should not contain missing values.
dtreat
: Value of the treatment under treatment (in the treatment period of interest). This value would be 1 for binary treatments.
dcontrol
: Value of the treatment under control (in the treatment period of interest). This value would be 0 for binary treatments.
t1
: Value indicating the post-treatment outcome period in which the effect is evaluated, which is the later of the two periods used to generate the outcome difference in ydiff. For instance, if the pre-treatment outcome is measured in period 0 and the post-treatment outcome is measured in period 1 to generate ydiff, then t1 is equal to 1. Default is 1.
controls
: Covariates and/or previous treatment history to be controlled for. Should not contain missing values.
MLmethod
: Machine learning method for estimating nuisance parameters using the SuperLearner package. Must be one of "lasso" (default), "randomforest", "xgboost", "svm", "ensemble", or "parametric".
psmethod
: Method for computing generalized propensity scores. Set to 1 to estimate conditional treatment densities using the treatment as the dependent variable, or 2 to use the treatment kernel weights as the dependent variable. Default is 1.
trim
: Trimming threshold (in percentage) for discarding observations with too much influence within any subgroup defined by the treatment group and time. Default is 0.1.
lognorm
: Logical indicating whether a log-normal transformation should be applied when estimating conditional treatment densities using the treatment as the dependent variable. Default is FALSE.
bw
: Bandwidth for kernel density estimation. Default is NULL, in which case the bandwidth is calculated by a rule of thumb.
bwfactor
: Factor by which the bandwidth is multiplied. Default is 0.7 (undersmoothing).
cluster
: Optional clustering variable for calculating standard errors.
k
: Number of folds in k-fold cross-fitting. Default is 3.
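To illustrate how the bandwidth, the bandwidth factor, and the two treatment values might interact, the sketch below computes an undersmoothed rule-of-thumb bandwidth and Gaussian kernel weights around the treatment values 1 and 0. It rests on several assumptions made here for illustration: Silverman's rule via `stats::bw.nrd0`, a Gaussian kernel, and the hypothetical helper `kernel_weights`. The function may use a different rule or kernel internally.

```r
set.seed(1)
d <- rnorm(500, mean = 0.5)   # simulated continuous treatment

# Rule-of-thumb bandwidth (Silverman's rule, an assumption), then undersmooth.
bwfactor <- 0.7
h <- bwfactor * bw.nrd0(d)

# Hypothetical helper: Gaussian kernel weights measuring closeness of d to a target value.
kernel_weights <- function(d, target, h) dnorm((d - target) / h) / h

w_treat   <- kernel_weights(d, target = 1, h = h)  # around a treatment value of 1
w_control <- kernel_weights(d, target = 0, h = h)  # around a control value of 0

# Observations nearest to each target receive the largest weights.
d[which.max(w_treat)]
d[which.max(w_control)]
```

A smaller factor concentrates the weights on observations whose treatment lies very close to the target values, which reduces smoothing bias at the cost of higher variance.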
This function estimates the Average Treatment Effect on the Treated (ATET) of a continuous treatment by Difference-in-Differences in panel data, using double machine learning to control for confounders. Nuisance parameters can be estimated with several machine learning methods and are obtained by k-fold cross-fitting: each model is trained on k-1 folds and used for prediction in the held-out fold, which limits overfitting bias. The function handles binary and continuous outcomes and provides options for trimming and for adjusting the bandwidth used in kernel density estimation.
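The cross-fitting idea can be sketched as follows. This is a simplified illustration under stated assumptions: a linear model fitted with `lm` stands in for the machine learner, and only one nuisance function (the conditional mean of the outcome difference given a covariate) is estimated. The function's actual nuisance models and fold assignment may differ.

```r
set.seed(1)
n <- 300
k <- 3
x <- rnorm(n)
ydiff <- 1 + 0.5 * x + rnorm(n)
dat <- data.frame(ydiff = ydiff, x = x)

# Randomly assign each observation to one of k folds of equal size.
fold <- sample(rep(seq_len(k), length.out = n))

# Out-of-fold predictions: fit on k - 1 folds, predict on the held-out fold.
pred <- rep(NA_real_, n)
for (j in seq_len(k)) {
  test_idx <- fold == j
  fit <- lm(ydiff ~ x, data = dat[!test_idx, ])
  pred[test_idx] <- predict(fit, newdata = dat[test_idx, ])
}

# Every observation now has a prediction from a model that never saw it.
summary(pred)
```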
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J. (2018): "Double/debiased machine learning for treatment and structural parameters", The Econometrics Journal, 21, C1-C68.
Haddad, M., Huber, M., Medina-Reyes, J., Zhang, L. (2024): "Difference-in-Differences under time-varying continuous treatments based on double machine learning"
if (FALSE) {
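# Example with simulated data
# Assumes the function is available from the causalweight package
library(causalweight)
set.seed(1)                # for reproducibility
n=1000
x=0.5*rnorm(n)             # observed covariate
u=runif(n,0,2)             # unobserved, time-constant confounder
d=x+u+rnorm(n)             # continuous treatment
y0=u+rnorm(n)              # pre-treatment outcome
y1=2*d+x+u+rnorm(n)        # post-treatment outcome
t=rep(1,n)                 # all outcome differences refer to period 1
# true effect of moving d from 0 to 1 is 2
results=didcontDMLpanel(ydiff=y1-y0, d=d, t=t, dtreat=1, dcontrol=0, controls=x, MLmethod="lasso")
cat("ATET: ", round(results$ATET, 3), ", Standard error: ", round(results$se, 3), "\n")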
}
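In applied work the outcome difference usually has to be built from a long-format panel first. The following is a minimal sketch, assuming a hypothetical data frame `panel` with columns `id`, `period`, and `y`, and exactly two periods (0 and 1) per unit. The column names and values are made up for illustration.

```r
# Hypothetical long-format panel: 4 units observed in periods 0 and 1.
panel <- data.frame(
  id     = rep(1:4, each = 2),
  period = rep(c(0, 1), times = 4),
  y      = c(1.0, 1.5, 2.0, 2.2, 0.5, 1.5, 3.0, 2.5)
)

# Split by period and align units by id before differencing.
pre  <- panel[panel$period == 0, c("id", "y")]
post <- panel[panel$period == 1, c("id", "y")]
wide <- merge(pre, post, by = "id", suffixes = c("_pre", "_post"))

# Post-minus-pre outcome difference, one value per unit.
wide$ydiff <- wide$y_post - wide$y_pre

# All differences refer to post-treatment period 1, matching the default t1 = 1.
wide$t <- 1
wide
```

Merging by unit identifier before differencing guards against subtracting outcomes of different units when the rows are not sorted identically in both periods.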