This page explains the details of estimating weights from generalized boosted model-based propensity scores by setting method = "gbm" in the call to weightit or weightitMSM. This method can be used with binary, multinomial, and continuous treatments.
In general, this method relies on estimating propensity scores using generalized boosted modeling and then converting those propensity scores into weights using a formula that depends on the desired estimand. The algorithm uses a balance-based or prediction-based criterion to choose the value of a tuning parameter (the number of trees). This method mimics the functionality of functions in the twang package, but has improved performance and more flexible options. See Note for more details.
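As a minimal sketch of the workflow described above, the call below requests ATE weights with method = "gbm". It assumes the WeightIt, gbm, and cobalt packages are installed and uses the lalonde dataset shipped with cobalt; the covariates chosen are illustrative only, and all tuning options are left at their defaults.

```r
# Assumes WeightIt, gbm, and cobalt are installed; lalonde comes from cobalt.
library(WeightIt)
data("lalonde", package = "cobalt")

# Estimate ATE weights from GBM-based propensity scores.
# The covariate set here is an illustrative choice, not a recommendation.
set.seed(1234)
W <- weightit(treat ~ age + educ + married + nodegree + re74 + re75,
              data = lalonde, method = "gbm", estimand = "ATE")

# Inspect the distribution of the estimated weights.
summary(W)
```

The estimated weights are stored in W$weights, one per row of the data.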
Binary Treatments
For binary treatments, this method estimates the propensity scores using gbm.fit and then optimizes balance using col_w_smd for standardized mean differences and col_w_ks for Kolmogorov-Smirnov statistics, both from cobalt. The following estimands are allowed: ATE, ATT, ATC, ATO, and ATM. The weights are computed from the estimated propensity scores using get_w_from_ps, which implements the standard formulas. Weights can also be computed using marginal mean weighting through stratification for the ATE, ATT, and ATC. See get_w_from_ps for details.
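To make the "standard formulas" concrete, here is a base R sketch of the usual propensity score weighting formulas for a binary treatment. The helper name ps_to_weights is hypothetical and this is not the implementation of get_w_from_ps; it only illustrates how the weight for each estimand depends on the propensity score and the treatment indicator.

```r
# Illustrative helper (hypothetical name), not the package's implementation.
# ps: estimated probability of treatment; treat: 0/1 treatment indicator.
ps_to_weights <- function(ps, treat, estimand = "ATE") {
  stopifnot(length(ps) == length(treat),
            all(treat %in% c(0, 1)),
            all(ps > 0 & ps < 1))
  switch(estimand,
    # Inverse probability of the treatment actually received
    ATE = treat / ps + (1 - treat) / (1 - ps),
    # Treated units get weight 1; controls get the odds of treatment
    ATT = treat + (1 - treat) * ps / (1 - ps),
    # Controls get weight 1; treated units get the inverse odds
    ATC = treat * (1 - ps) / ps + (1 - treat),
    # Overlap weights: probability of the opposite treatment
    ATO = treat * (1 - ps) + (1 - treat) * ps,
    # Matching weights: min(ps, 1 - ps) over probability of received treatment
    ATM = pmin(ps, 1 - ps) / (treat * ps + (1 - treat) * (1 - ps)),
    stop("Unsupported estimand: ", estimand)
  )
}

# Toy propensity scores and treatment assignments (made up for illustration)
ps    <- c(0.2, 0.5, 0.8, 0.4)
treat <- c(0,   1,   1,   0)

ps_to_weights(ps, treat, "ATE")
ps_to_weights(ps, treat, "ATT")
```

Note that under the ATT every treated unit keeps a weight of 1, so only the control group is reweighted.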
Multinomial Treatments
For multinomial treatments, this method estimates the propensity scores using gbm.fit with distribution = "multinomial" and then optimizes balance using col_w_smd for standardized mean differences and col_w_ks for Kolmogorov-Smirnov statistics, both from cobalt. The following estimands are allowed: ATE, ATT, ATC, ATO, and ATM. The weights are computed from the estimated propensity scores using get_w_from_ps, which implements the standard formulas. Weights can also be computed using marginal mean weighting through stratification for the ATE, ATT, and ATC. See get_w_from_ps for details. The balance that is optimized is that between each non-focal treatment and the focal treatment for the ATT and ATC, between each treatment and the overall unweighted sample for the ATE, and between each treatment and the overall weighted sample for other estimands.
Continuous Treatments
For continuous treatments, this method estimates the generalized propensity score using gbm.fit and then optimizes balance using col_w_corr for treatment-covariate correlations from cobalt.
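The balance target for continuous treatments is the correlation between the treatment and each covariate in the weighted sample. The base R sketch below computes a plain weighted Pearson correlation; the function name w_corr and the simulated data are assumptions for illustration, and col_w_corr in cobalt may standardize differently, so treat this as a conceptual aid rather than a reproduction of that function.

```r
# Weighted Pearson correlation between a treatment `a` and a covariate `x`.
# Hypothetical helper for illustration; not cobalt's col_w_corr.
w_corr <- function(a, x, w = rep(1, length(a))) {
  w  <- w / sum(w)                       # normalize weights to sum to 1
  ma <- sum(w * a)                       # weighted mean of treatment
  mx <- sum(w * x)                       # weighted mean of covariate
  cov_ax <- sum(w * (a - ma) * (x - mx)) # weighted covariance
  cov_ax / sqrt(sum(w * (a - ma)^2) * sum(w * (x - mx)^2))
}

set.seed(1)
x <- rnorm(200)
a <- 0.8 * x + rnorm(200)  # simulated treatment that depends on x
w <- runif(200, 0.5, 2)    # placeholder weights, not estimated ones

w_corr(a, x)     # correlation in the unweighted sample
w_corr(a, x, w)  # correlation in the weighted sample
```

Well-chosen weights would drive the weighted correlation toward zero for every covariate; the placeholder weights here are random and are not expected to do so.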
Longitudinal Treatments
For longitudinal treatments, the weights are the product of the weights estimated at each time point.
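The product rule above can be sketched in base R as follows. The matrix of per-time-point weights is made up for illustration; in practice each column would hold the weights estimated at one time point.

```r
# Hypothetical per-time-point weights for 4 units across 3 time points.
w_by_time <- cbind(t1 = c(1.2, 0.8, 1.0, 2.0),
                   t2 = c(0.9, 1.1, 1.5, 0.5),
                   t3 = c(1.0, 1.3, 0.7, 1.1))

# Final weight for each unit: product of its weights across time points.
w_final <- apply(w_by_time, 1, prod)
w_final
```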
Sampling Weights
Sampling weights are supported through s.weights in all scenarios.
Missing Data
In the presence of missing data, the following value(s) for missing are allowed:
"ind" (default)
First, for each variable with missingness, a new missingness indicator variable is created that takes the value 1 if the original covariate is NA and 0 otherwise. The missingness indicators are added to the model formula as main effects. The weight estimation then proceeds with this new formula and set of covariates using surrogate splitting as described below. The covariates output in the resulting weightit object will be the original covariates with the NAs.
"surr"
Surrogate splitting is used to process NAs. No missingness indicators are created. Nodes are split using only the non-missing values of each variable. To generate predicted values for each unit, a non-missing variable that operates similarly to the variable with missingness is used as a surrogate. Missing values are ignored when calculating balance statistics to choose the optimal tree.
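The indicator construction described under "ind" can be sketched in base R as follows. The helper name add_missing_indicators and the "_NA" suffix are assumptions made for illustration; the package's internal naming may differ. As in the description, the original covariates keep their NA values and the indicators are added alongside them.

```r
# Hypothetical helper: add a 0/1 missingness indicator for every
# covariate that contains at least one NA. Originals are left unchanged.
add_missing_indicators <- function(covs) {
  has_na <- names(covs)[vapply(covs, anyNA, logical(1))]
  for (v in has_na) {
    covs[[paste0(v, "_NA")]] <- as.integer(is.na(covs[[v]]))
  }
  covs
}

# Toy covariates (made up for illustration)
covs <- data.frame(age     = c(30, NA, 45, 52),
                   educ    = c(12, 16, NA, NA),
                   married = c(1, 0, 1, 1))

add_missing_indicators(covs)
```

Only age and educ gain indicators here, because married has no missing values.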