didcontDMLpanel: Continuous Difference-in-Differences using Double Machine Learning for Panel Data

Description

This function estimates the average treatment effect on the treated of a continuously distributed treatment in panel data based on a Difference-in-Differences (DiD) approach using double machine learning to control for time-varying confounders in a data-driven manner. It supports estimation under various machine learning methods and uses k-fold cross-fitting.

Usage

didcontDMLpanel(
  ydiff,
  d,
  t,
  dtreat,
  dcontrol,
  t1 = 1,
  controls,
  MLmethod = "lasso",
  psmethod = 1,
  trim = 0.1,
  lognorm = FALSE,
  bw = NULL,
  bwfactor = 0.7,
  cluster = NULL,
  k = 3
)

Value

A list with the following components:

ATET: Estimate of the Average Treatment Effect on the Treated.

se: Standard error of the ATET estimate.

trimmed: Number of discarded (trimmed) observations.

pval: P-value of the ATET estimate.

pscores: Propensity scores (2 columns): under treatment, under control.

outcomepred: Conditional outcome predictions.
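
As a rough illustration of how the ATET and se components might be used together, the sketch below builds a 95% confidence interval and a two-sided p-value from a normal approximation. The numbers in results are hypothetical placeholders rather than real output, and whether pval is computed this way internally is an assumption, not something this page states.

```r
# Hypothetical values standing in for a fitted result; not real output.
results <- list(ATET = 2.05, se = 0.10)

# Two-sided 95% critical value of the standard normal distribution.
z <- qnorm(0.975)

# Confidence interval: estimate plus/minus critical value times standard error.
ci <- c(lower = results$ATET - z * results$se,
        upper = results$ATET + z * results$se)

# Two-sided p-value under a normal approximation (an assumption of this sketch).
pval_normal <- 2 * pnorm(-abs(results$ATET / results$se))

print(round(ci, 3))
```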

Arguments

ydiff: Difference in the outcome between a post-treatment and a pre-treatment period (post minus pre). Should not contain missing values.
d: Treatment variable in the treatment period of interest. Should be continuous and not contain missing values.
t: Time variable indicating outcome periods. Should not contain missing values.
dtreat: Value of the treatment under treatment (in the treatment period of interest). This value would be 1 for binary treatments.
dcontrol: Value of the treatment under control (in the treatment period of interest). This value would be 0 for binary treatments.
t1: Value indicating the post-treatment outcome period in which the effect is evaluated, which is the later of the two periods used to generate the outcome difference in ydiff. For instance, if the pre-treatment outcome is measured in period 0 and the post-treatment outcome is measured in period 1 to generate ydiff, then t1 is equal to 1. Default is 1.
controls: Covariates and/or previous treatment history to be controlled for. Should not contain missing values.
MLmethod: Machine learning method for estimating nuisance parameters using the SuperLearner package. Must be one of "lasso" (default), "randomforest", "xgboost", "svm", "ensemble", or "parametric".
psmethod: Method for computing generalized propensity scores. Set to 1 for estimating conditional treatment densities using the treatment as dependent variable, or 2 for using the treatment kernel weights as dependent variable. Default is 1.
trim: Trimming threshold (in percentage) for discarding observations with too much influence within any subgroup defined by the treatment group and time. Default is 0.1.
lognorm: Logical indicating if log-normal transformation should be applied when estimating conditional treatment densities using the treatment as dependent variable. Default is FALSE.
bw: Bandwidth for kernel density estimation. Default is NULL, implying that the bandwidth is calculated based on the rule-of-thumb.
bwfactor: Factor by which the bandwidth is multiplied. Default is 0.7 (undersmoothing).
cluster: Optional clustering variable for calculating standard errors.
k: Number of folds in k-fold cross-fitting. Default is 3.
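
To make the roles of bw, bwfactor and dtreat more concrete, the following sketch computes an undersmoothed bandwidth and kernel weights around a treatment value. It assumes Silverman's rule of thumb (base R's bw.nrd0) and a Gaussian kernel; this page does not say which rule of thumb or kernel the function uses internally, so treat both as illustrative choices.

```r
set.seed(1)
d <- rnorm(500, mean = 1)  # simulated continuous treatment
dtreat <- 1                # treatment value of interest
bwfactor <- 0.7            # default undersmoothing factor from the argument list

# Assumption: Silverman's rule of thumb as the baseline bandwidth,
# scaled down by bwfactor to undersmooth.
h <- bw.nrd0(d) * bwfactor

# Gaussian kernel weights: observations with d close to dtreat get the
# largest weight, observations far away get weights near zero.
w <- dnorm((d - dtreat) / h) / h

summary(w)
```

A smaller bwfactor concentrates the weights on fewer observations near dtreat, which lowers smoothing bias at the cost of higher variance.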

Details

This function estimates the Average Treatment Effect on the Treated (ATET) by Difference-in-Differences in panel data while controlling for confounders using double machine learning. The nuisance parameters (generalized propensity scores and conditional outcome predictions) can be estimated with any of the machine learning methods listed under MLmethod. They are estimated by k-fold cross-fitting, so that the predictions for each observation come from models trained on the other folds, which limits overfitting bias. The function also handles binary and continuous outcomes, and provides options for trimming and bandwidth adjustments in kernel density estimation.
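
The idea behind k-fold cross-fitting can be sketched in a few lines of base R. The example below is only a schematic: it uses lm as a stand-in for the machine learner and a single outcome regression as the nuisance model, and it is not the package's internal code.

```r
set.seed(1)
n <- 300
k <- 3
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

# Randomly assign each observation to one of k folds of equal size.
fold <- sample(rep(seq_len(k), length.out = n))

# Out-of-fold predictions: each observation is predicted by a model
# that was trained without it.
pred <- rep(NA_real_, n)
for (j in seq_len(k)) {
  held_out <- fold == j
  fit <- lm(y ~ x, data = dat[!held_out, ])              # train on the other folds
  pred[held_out] <- predict(fit, newdata = dat[held_out, ])  # predict the held-out fold
}

head(pred)
```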

References

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J. (2018): "Double/debiased machine learning for treatment and structural parameters", The Econometrics Journal, 21, C1-C68.

Haddad, M., Huber, M., Medina-Reyes, J., Zhang, L. (2024): "Difference-in-Differences under time-varying continuous treatments based on double machine learning".

Examples

if (FALSE) {
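# Example with simulated data
n <- 1000
x <- 0.5 * rnorm(n)             # observed covariate
u <- runif(n, 0, 2)             # time-invariant unobservable, differenced out of ydiff
d <- x + u + rnorm(n)           # continuous treatment
y0 <- u + rnorm(n)              # pre-treatment outcome
y1 <- 2 * d + x + u + rnorm(n)  # post-treatment outcome
t <- rep(1, n)                  # every outcome difference refers to period t1 = 1
# True effect of moving the treatment from dcontrol = 0 to dtreat = 1 is 2
results <- didcontDMLpanel(ydiff = y1 - y0, d = d, t = t, dtreat = 1, dcontrol = 0,
                           controls = x, MLmethod = "lasso")
cat("ATET: ", round(results$ATET, 3), ", Standard error: ", round(results$se, 3))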
}
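
The simulated example passes y1 - y0 directly. With real panel data stored in long format, the outcome difference has to be constructed first. The sketch below shows one way to do that in base R; the data frame panel and its columns id, period and y are hypothetical names chosen for illustration.

```r
# Hypothetical long-format panel: one row per unit and period.
panel <- data.frame(
  id     = rep(1:3, each = 2),
  period = rep(0:1, times = 3),
  y      = c(1.0, 1.5, 2.0, 2.2, 0.5, 1.5)
)

# Split into pre-treatment (period 0) and post-treatment (period 1) outcomes.
pre  <- panel[panel$period == 0, c("id", "y")]
post <- panel[panel$period == 1, c("id", "y")]

# Match pre and post outcomes by unit and take post minus pre.
wide  <- merge(pre, post, by = "id", suffixes = c("_pre", "_post"))
ydiff <- wide$y_post - wide$y_pre

# Time indicator: all differences end in period 1, matching t1 = 1.
t <- rep(1, nrow(wide))

print(ydiff)
```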
