grf (version 2.3.2)

generate_causal_data: Generate causal forest data

Description

The following DGPs are available for benchmarking purposes:

  • "simple": tau = max(X1, 0), e = 0.4 + 0.2 * 1(X1 > 0).

  • "aw1": equation (27) of https://arxiv.org/pdf/1510.04342.pdf

  • "aw2": equation (28) of https://arxiv.org/pdf/1510.04342.pdf

  • "aw3": confounding is from "aw1" and tau is from "aw2"

  • "aw3reverse": same as "aw3", but with heterogeneous treatment effects (HTEs) anticorrelated with the baseline

  • "ai1": "Setup 1" from section 6 of https://arxiv.org/pdf/1504.01132.pdf

  • "ai2": "Setup 2" from section 6 of https://arxiv.org/pdf/1504.01132.pdf

  • "kunzel": "Simulation 1" from A.1 in https://arxiv.org/pdf/1706.03461.pdf

  • "nw1": "Setup A" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf

  • "nw2": "Setup B" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf

  • "nw3": "Setup C" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf

  • "nw4": "Setup D" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf
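
As an illustration of how these specifications read, the "simple" DGP's treatment effect and propensity score can be sketched in base R. This is a minimal sketch, not the package's implementation: only the formulas for tau and e come from the list above, while the sample size and the standard normal draw for X1 are assumptions made for illustration.

```r
set.seed(1)

n  <- 200          # hypothetical sample size
X1 <- rnorm(n)     # assumption: first covariate drawn as standard normal

# tau = max(X1, 0): the effect is zero for X1 <= 0 and grows linearly above it
tau <- pmax(X1, 0)

# e = 0.4 + 0.2 * 1(X1 > 0): treatment probability is 0.4 or 0.6
e <- 0.4 + 0.2 * (X1 > 0)

# Compare average effect and propensity on each side of X1 = 0
aggregate(cbind(tau, e) ~ I(X1 > 0), data = data.frame(X1, tau, e), FUN = mean)
```

Units with X1 > 0 have both a larger treatment effect and a higher treatment probability, which is what makes the design confounded.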

Usage

generate_causal_data(
  n,
  p,
  sigma.m = 1,
  sigma.tau = 0.1,
  sigma.noise = 1,
  dgp = c("simple", "aw1", "aw2", "aw3", "aw3reverse", "ai1", "ai2", "kunzel", "nw1",
    "nw2", "nw3", "nw4")
)

Value

A list consisting of: X, Y, W, tau, m, e, dgp.

Arguments

n

The number of observations.

p

The number of covariates (note: the minimum varies by DGP).

sigma.m

The standard deviation of m, the conditional mean of Y given X (see Details). Default is 1.

sigma.tau

The standard deviation of the treatment effect. Default is 0.1.

sigma.noise

The noise scale of Y: the conditional variance V is rescaled so that its mean equals sigma.noise^2 (see Details). Default is 1.

dgp

The data-generating process (DGP) to use; one of the names listed in the Description. Default is "simple".

Details

Each DGP is parameterized by X: observables, m: conditional mean of Y, tau: treatment effect, e: propensity scores, V: conditional variance of Y.

The following rescaled data is returned: m = m / sd(m) * sigma.m, tau = tau / sd(tau) * sigma.tau, V = V / mean(V) * sigma.noise^2, W = rbinom(n, 1, e), Y = m + (W - e) * tau + sqrt(V) * rnorm(n).
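
The rescaling step can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package source: the raw m, tau, e and V below are hypothetical placeholders, W is drawn as one Bernoulli trial per observation with probability e, and the noise term is read as standard normal noise scaled by sqrt(V).

```r
set.seed(42)

n           <- 500
sigma.m     <- 1
sigma.tau   <- 0.1
sigma.noise <- 1

# Hypothetical raw quantities (placeholders, not taken from any specific DGP)
X       <- matrix(rnorm(n * 3), n, 3)
m_raw   <- X[, 2] + X[, 3]            # raw conditional mean of Y
tau_raw <- pmax(X[, 1], 0)            # raw treatment effect
e       <- 0.4 + 0.2 * (X[, 1] > 0)   # propensity score
V_raw   <- rep(1, n)                  # raw conditional variance

# Rescale so that sd(m) = sigma.m, sd(tau) = sigma.tau, mean(V) = sigma.noise^2
m   <- m_raw / sd(m_raw) * sigma.m
tau <- tau_raw / sd(tau_raw) * sigma.tau
V   <- V_raw / mean(V_raw) * sigma.noise^2

# Treatment assignment and outcome
W <- rbinom(n, 1, e)                          # one Bernoulli draw per unit
Y <- m + (W - e) * tau + sqrt(V) * rnorm(n)   # assumption: noise scaled by sqrt(V)

c(sd_m = sd(m), sd_tau = sd(tau), mean_V = mean(V))
```

Because the effect enters as (W - e) * tau, the term has conditional mean zero given X, so m remains the conditional mean of Y regardless of the propensity score.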

Examples

library(grf)

# Generate simple benchmark data
data <- generate_causal_data(100, 5, dgp = "simple")
# Generate data from Wager and Athey (2018)
data <- generate_causal_data(100, 5, dgp = "aw1")
data2 <- generate_causal_data(100, 5, dgp = "aw2")