method_super: Propensity Score Weighting Using SuperLearner

Description

This page explains the details of estimating weights from SuperLearner-based propensity scores by setting method = "super" in the call to weightit or weightitMSM. This method can be used with binary, multinomial, and continuous treatments.

In general, this method relies on estimating propensity scores using the SuperLearner algorithm for stacking predictions and then converting those propensity scores into weights using a formula that depends on the desired estimand. For binary and multinomial treatments, one or more binary classification algorithms are used to estimate the propensity scores as the predicted probability of being in each treatment given the covariates. For continuous treatments, a regression algorithm is used to estimate generalized propensity scores as the conditional density of treatment given the covariates.

Binary Treatments

For binary treatments, this method estimates the propensity scores using SuperLearner in the SuperLearner package. The following estimands are allowed: ATE, ATT, ATC, ATO, and ATM. The weights for the ATE, ATT, and ATC are computed from the estimated propensity scores using the standard formulas, the weights for the ATO are computed as in Li & Li (2018), and the weights for the ATM (i.e., the average treatment effect in the equivalent sample "pair-matched" with calipers) are computed as in Yoshida et al. (2017). When include.obj = TRUE, the returned object is the SuperLearner fit.
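As an illustration of how propensity scores map to weights, the following is a minimal sketch in base R. The helper ps_to_weights is hypothetical and not part of WeightIt; it uses the formulas as they are commonly written in the literature and may differ in detail from the package's internal code.

```r
# Hypothetical helper (not a WeightIt function): propensity scores to weights.
# ps    : estimated P(treat = 1 | covariates)
# treat : 0/1 treatment indicator
ps_to_weights <- function(ps, treat, estimand = "ATE") {
  stopifnot(length(ps) == length(treat), all(treat %in% c(0, 1)))
  switch(estimand,
    ATE = treat / ps + (1 - treat) / (1 - ps),
    ATT = treat + (1 - treat) * ps / (1 - ps),
    ATC = treat * (1 - ps) / ps + (1 - treat),
    ATO = treat * (1 - ps) + (1 - treat) * ps,
    ATM = pmin(ps, 1 - ps) / (treat * ps + (1 - treat) * (1 - ps)),
    stop("unknown estimand: ", estimand)
  )
}

# Made-up propensity scores and treatment assignments
ps    <- c(0.2, 0.5, 0.8, 0.4)
treat <- c(0,   1,   1,   0)

ps_to_weights(ps, treat, "ATT")  # treated units get weight 1
ps_to_weights(ps, treat, "ATO")
```

Under the ATT, treated units keep a weight of 1 and control units are weighted by their odds of treatment, which is why control units with high propensity scores can receive large weights.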

Multinomial Treatments

For multinomial treatments, the propensity scores are estimated using several calls to SuperLearner, one for each treatment group, and the treatment probabilities are normalized to sum to 1. The following estimands are allowed: ATE, ATT, ATO, and ATM. The weights for each estimand are computed using the standard formulas or those mentioned above. When include.obj = TRUE, the returned object is the list of fit objects from the SuperLearner calls.
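The normalization step can be sketched in base R as follows. The matrix raw holds made-up one-versus-rest predictions (one column per treatment level), and the ATE weight is taken to be the inverse of the normalized probability of the treatment actually received; all object names are illustrative assumptions.

```r
# Hypothetical unnormalized predictions, one column per treatment level
raw <- matrix(c(0.5, 0.3, 0.4,
                0.2, 0.6, 0.3),
              nrow = 2, byrow = TRUE,
              dimnames = list(NULL, c("a", "b", "c")))
treat <- factor(c("b", "a"), levels = c("a", "b", "c"))

# Normalize each row so the treatment probabilities sum to 1
ps <- raw / rowSums(raw)

# ATE weight: inverse of the probability of the treatment actually received
idx   <- cbind(seq_along(treat), as.integer(treat))
w_ate <- 1 / ps[idx]
w_ate
```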

Continuous Treatments

For continuous treatments, the generalized propensity score is estimated using SuperLearner. In addition, kernel density estimation can be used instead of assuming a normal density for the numerator and denominator of the generalized propensity score by setting use.kernel = TRUE. Other arguments to density can be specified to refine the density estimation parameters. plot = TRUE can be specified to plot the density for the numerator and denominator, which can be helpful in diagnosing extreme weights. When include.obj = TRUE, the returned object is the SuperLearner fit from the denominator model.
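The numerator and denominator densities can be illustrated with a small simulated example. This sketch assumes normal densities and uses lm as a stand-in for the SuperLearner fit; the data and object names are made up, and the package's exact standardization may differ.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
a <- 0.5 * x + rnorm(n)   # simulated continuous treatment

# Denominator: density of treatment given the covariate
# (lm is a stand-in here for the SuperLearner fit)
fit      <- lm(a ~ x)
res      <- a - fitted(fit)
dens_den <- dnorm(a, mean = fitted(fit), sd = sd(res))

# Numerator: marginal density of treatment
dens_num <- dnorm(a, mean = mean(a), sd = sd(a))

# Weight: ratio of marginal to conditional density
w <- dens_num / dens_den
summary(w)
```

Units whose observed treatment is unlikely given their covariates have a small denominator and therefore a large weight, which is what the density plot helps diagnose.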

Longitudinal Treatments

For longitudinal treatments, the weights are the product of the weights estimated at each time point.
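As a small illustration with made-up numbers, the product across time points can be computed elementwise in base R; the list w_by_time is hypothetical.

```r
# Hypothetical weights for three units at two time points
w_by_time <- list(t1 = c(1.2, 0.8, 1.5),
                  t2 = c(0.9, 1.1, 2.0))

# Final weight for each unit: product of its time-specific weights
w_msm <- Reduce(`*`, w_by_time)
w_msm
```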

Sampling Weights

Sampling weights are supported through s.weights in all scenarios.

Missing Data

SuperLearner cannot handle missing data, so the covariates are preprocessed when they contain NAs. First, for each variable with missingness, a new missingness indicator variable is created that takes the value 1 if the original covariate is NA and 0 otherwise. The missingness indicators are added to the model formula as main effects. The missing values in the covariates are then replaced with 0s (this value is arbitrary and does not affect estimation). The weight estimation then proceeds with this new formula and set of covariates. The covariates output in the resulting weightit object will be the original covariates with the NAs.
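The preprocessing described above can be sketched in base R. The data frame and the "_NA" suffix used for the indicator names are assumptions made for illustration, not the names WeightIt necessarily uses.

```r
# Made-up covariates with missing values
covs <- data.frame(age     = c(25, NA, 40),
                   educ    = c(12, 16, NA),
                   married = c(1, 0, 1))

# Variables that contain at least one NA
has_na <- names(covs)[vapply(covs, anyNA, logical(1))]

for (v in has_na) {
  # Missingness indicator: 1 if the original value is NA, 0 otherwise
  covs[[paste0(v, "_NA")]] <- as.integer(is.na(covs[[v]]))
  # Replace the missing values with an arbitrary constant (0)
  covs[[v]][is.na(covs[[v]])] <- 0
}

covs
```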

Additional Arguments

A value for SL.library must be supplied. To see a list of available entries, use listWrappers in the SuperLearner package.

All arguments to SuperLearner can be passed through weightit or weightitMSM, with the following exceptions:

method in SuperLearner is replaced with the argument SL.method in weightit.

obsWeights is ignored because sampling weights are passed using s.weights.

The following additional arguments can be specified:

use.kernel: If TRUE, uses kernel density estimation through density to estimate the numerator and denominator densities for the weights with continuous treatments. If FALSE, assumes a normal distribution.
bw, adjust, kernel, n: If use.kernel = TRUE with continuous treatments, these arguments are passed to density. The defaults are the same as those in density except that n is 10 times the number of units in the sample.
plot: If use.kernel = TRUE with continuous treatments, whether to plot the estimated density.
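To show what the kernel option amounts to, the sketch below calls density with n set to 10 times the sample size and then reads off the estimated density at each observed value by linear interpolation. The interpolation step via approx and the simulated values are assumptions for illustration; the package may evaluate the density differently.

```r
set.seed(2)
a <- rnorm(100)   # made-up values, e.g. treatment values or residuals

# Kernel density estimate on a grid 10 times the sample size
d <- density(a, n = 10 * length(a))

# Estimated density at each observed value, by linear interpolation on the grid
dens_at_a <- approx(d$x, d$y, xout = a)$y

length(d$x)
summary(dens_at_a)
```

Arguments such as bw, adjust, and kernel could be added to the density call in the same way to change the smoothness of the estimate.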

References

Pirracchio, R., Petersen, M. L., & van der Laan, M. (2015). Improving Propensity Score Estimators’ Robustness to Model Misspecification Using Super Learner. American Journal of Epidemiology, 181(2), 108–119. doi:10.1093/aje/kwu253

Examples

# NOT RUN {
library("WeightIt")  # provides weightit()
library("cobalt")    # provides bal.tab() and the lalonde data
data("lalonde", package = "cobalt")

#Balancing covariates between treatment groups (binary)
(W1 <- weightit(treat ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "super", estimand = "ATT",
                SL.library = c("SL.glm", "SL.gam",
                               "SL.knn")))
summary(W1)
bal.tab(W1)

#Balancing covariates with respect to race (multinomial)
(W2 <- weightit(race ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "super", estimand = "ATE",
                SL.library = c("SL.glm", "SL.gam",
                               "SL.knn")))
summary(W2)
bal.tab(W2)

#Balancing covariates with respect to re75 (continuous)
(W3 <- weightit(re75 ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "super", use.kernel = TRUE,
                SL.library = c("SL.glm", "SL.gam",
                               "SL.ridge")))
summary(W3)
bal.tab(W3)
# }
