ps.cont: Generalized Propensity Score Estimation using GBM

Description

ps.cont calculates generalized propensity scores and corresponding weights using boosted linear regression as implemented in gbm. This function extends ps in twang to continuous treatments. The syntax and output are largely the same. The GBM parameter defaults are those found in Zhu, Coffman, & Ghosh (2015).

Note: ps.cont will be phased out when twang adds functionality for continuous treatments. All of its functionality, and more, is already available in weightit with method = "gbm" (see method_gbm).

Usage

ps.cont(formula, data,
        n.trees = 20000,
        interaction.depth = 4,
        shrinkage = 0.0005,
        bag.fraction = 1,
        print.level = 0,
        verbose = FALSE,
        stop.method,
        sampw = NULL,
        optimize = 1,
        use.kernel = FALSE,
        ...)
# S3 method for ps.cont
summary(object, ...)
# S3 method for ps.cont
plot(x, ...)
# S3 method for ps.cont
boxplot(x, ...)

Arguments

formula

A formula for the generalized propensity score model with the (continuous) treatment variable on the left side of the formula and the potential confounding variables on the right side.

data

The dataset in the form of a data frame, which should include treatment assignment as well as the covariates specified in formula.

n.trees

The number of GBM iterations passed on to gbm. More iterations generally yield a better final solution but take more time to compute.

interaction.depth

The interaction.depth passed on to gbm.

shrinkage

The shrinkage passed on to gbm.

bag.fraction

The bag.fraction passed on to gbm.

print.level

Currently ignored.

verbose

If TRUE, information will be printed to monitor the progress of the fitting.

stop.method

A method or methods of measuring and summarizing balance across pretreatment variables. Current options are p.max, p.mean, p.rms, s.max, s.mean, and s.rms. p refers to the Pearson correlation and s refers to the Spearman correlation. These are summarized across the pretreatment variables by the maximum (max), the mean (mean), or the square root of the mean of the squares (rms).

sampw

Optional sampling weights.

optimize

A numeric value, either 0, 1, or 2. If 0, balance will be checked for every tree, and the tree with the best balance will be the one used to generate the final weights. If 1, the default, balance will be checked for a subset of trees, and then the optimize function will be used to find the tree with the best balance within the chosen tree interval. If 2, the optimize function will be used to find the tree that yields the best balance. 0 takes the longest but is guaranteed to find the best balance among the trees. 2 is the quickest but will often choose a tree that has suboptimal balance, though not by much. 1 is a compromise between speed and comprehensiveness and is the algorithm implemented in twang.

use.kernel

Whether to use kernel density estimation as implemented in density to estimate the numerator of the weights. If TRUE, density will be used. If FALSE, the default, a normal density will be assumed and will be estimated using dnorm().

object, x

A ps.cont object.

…

For ps.cont, if use.kernel = TRUE, additional arguments to density, which is used to produce the density for the numerator of the weights. These include bw, adjust, kernel, and n. The default values are the defaults for density, except n, which is 10 times the number of units.

For summary.ps.cont, additional arguments affecting the summary produced.

Value

Returns an object of class ps and ps.cont, a list containing

gbm.obj

The returned gbm object.

treat

The treatment variable.

desc

A list containing balance tables for each method selected in stop.method. Includes a component for the unweighted analysis named "unw". Each desc component includes a list with the following components:

ess: The effective sample size
n: The number of subjects
max.p.cor: The largest absolute Pearson correlation across the covariates
mean.p.cor: The mean absolute Pearson correlation of the covariates
rmse.p.cor: The root mean squared Pearson correlation across the covariates
max.s.cor: The largest absolute Spearman correlation across the covariates
mean.s.cor: The mean absolute Spearman correlation of the covariates
rmse.s.cor: The root mean squared Spearman correlation across the covariates
bal.tab: a table summarizing the quality of the weights for yielding low treatment-covariate correlations. This table is best extracted using bal.table.
n.trees: The estimated optimal number of gbm iterations to optimize the loss function for the associated stop.method

ps

A data frame containing the estimated generalized propensity scores. Each column is associated with one of the methods selected in stop.method.

w

A data frame containing the propensity score weights. Each column is associated with one of the methods selected in stop.method. If sampling weights are given, they are incorporated into the weights.

estimand

NULL

datestamp

Records the date of the analysis.

parameters

Saves the ps.cont call.

alerts

NULL

iters

A sequence of iterations used in the GBM fits used by plot.ps.cont.

balance

The balance summary for each tree examined, with a column for each stop.method. If optimize = 0, this will contain balance summaries for all trees. If optimize = 1, this will contain balance summaries for the subset of trees corresponding to iters. If optimize = 2, this will be NULL.

n.trees

Maximum number of trees considered in GBM fit.

data

Data as specified in the data argument.

The NULL entries exist so the output object is similar to that of ps in twang.

Details

ps.cont extends ps in twang to continuous treatments. It estimates weights from a series of trees and then outputs the weights that optimize a user-set criterion. The criterion employed involves the correlation between the treatment and each covariate. In a fully balanced sample, the treatment will have a correlation of 0 with covariates sufficient for removing confounding. Zhu, Coffman, & Ghosh (2015), who were the first to describe GBM for propensity score weighting with continuous treatments, recommend this procedure and provided R code to implement the methods they describe. ps.cont adapts their syntax to make it consistent with that of ps in twang. As in Zhu et al. (2015), when the Pearson correlation is requested, weighted biserial correlations will be computed for binary covariates.
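
The balance criterion described above can be sketched in base R. This is a minimal illustration, not the code used by ps.cont: w_cor() is a hypothetical helper, the data are simulated, and the package's exact formula (for example, how it standardizes or handles binary covariates) may differ.

```r
# Weighted Pearson correlation between one covariate (x) and the treatment (y).
# w_cor() is a hypothetical helper, not a function from WeightIt or twang.
w_cor <- function(x, y, w) {
  w <- w / sum(w)                      # normalize weights to sum to 1
  mx <- sum(w * x)                     # weighted means
  my <- sum(w * y)
  cov_xy <- sum(w * (x - mx) * (y - my))
  cov_xy / sqrt(sum(w * (x - mx)^2) * sum(w * (y - my)^2))
}

set.seed(1)
n <- 200
covs <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.4))
treat <- 0.5 * covs$x1 + covs$x2 + rnorm(n)
w <- rep(1, n)  # unit weights, i.e. the unweighted case

# One treatment-covariate correlation per covariate
cors <- vapply(covs, w_cor, numeric(1), y = treat, w = w)

# Summaries analogous to the p.max, p.mean, and p.rms stop methods
balance <- c(
  p.max  = max(abs(cors)),
  p.mean = mean(abs(cors)),
  p.rms  = sqrt(mean(cors^2))
)
print(round(balance, 3))
```

Replacing the unit weights with estimated weights would give the weighted version of each summary. Applying rank() to the treatment and covariates first is one simple way to approximate a Spearman analogue.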

The weights are estimated as the marginal density of the treatment divided by the conditional density of the treatment given the covariates for each unit. For the marginal density, a normal density is assumed by default; a kernel density estimator can be used instead, via the density function, by setting use.kernel = TRUE. For the conditional density, a Gaussian density is assumed. Note that when the treatment has outlying values, extreme weights can be produced, so it is important to examine the weights and trim them if necessary.
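
The weight construction can be illustrated with a short base R sketch. Several pieces here are assumptions made to keep the example small: a linear model (lm) stands in for the GBM fit, the residual standard deviation is used as the conditional scale, and the 99th-percentile cap is only one possible trimming rule. None of these is claimed to match what ps.cont does internally.

```r
set.seed(2)
n <- 300
x <- rnorm(n)
treat <- 1 + 0.8 * x + rnorm(n)

# Stand-in for the GBM fit: a linear model supplies the conditional mean.
# (ps.cont uses gbm; lm is an assumption made only for this sketch.)
fit <- lm(treat ~ x)
cond_mean <- fitted(fit)
cond_sd <- sd(treat - cond_mean)

# Denominator: Gaussian conditional density of the treatment given covariates
dens_cond <- dnorm(treat, mean = cond_mean, sd = cond_sd)

# Numerator, option A: normal marginal density (analogous to use.kernel = FALSE)
dens_marg_norm <- dnorm(treat, mean = mean(treat), sd = sd(treat))

# Numerator, option B: kernel density estimate, interpolated at each
# unit's observed treatment value (analogous to use.kernel = TRUE)
d <- density(treat, n = 10 * n)
dens_marg_kern <- approx(d$x, d$y, xout = treat)$y

w_norm <- dens_marg_norm / dens_cond
w_kern <- dens_marg_kern / dens_cond

# Inspect the weights, then cap them at the 99th percentile (illustrative rule)
summary(w_norm)
cap <- unname(quantile(w_norm, 0.99))
w_trim <- pmin(w_norm, cap)
```

Comparing summary(w_norm) with summary(w_kern) gives a quick sense of how much the choice of numerator matters for a given dataset.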

It is recommended to use as many trees as possible, though this requires more computation time, especially with optimize set to 0. There is little difference between using Pearson and Spearman correlations or between using the raw correlations and the Z-transformed correlations. Typically the only gbm-related options that should be changed are the interaction depth and number of trees.
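
The three search strategies selected by the optimize argument can be compared on a toy problem. In this sketch, balance_at() is a hypothetical, smooth stand-in for the balance statistic as a function of the number of trees; in practice the curve comes from weights computed at each iteration and need not be smooth or unimodal, which is why a direct search can settle on a slightly suboptimal tree.

```r
# Hypothetical balance curve (lower is better), minimized at 6200 trees.
balance_at <- function(n_trees) {
  0.02 + ((n_trees - 6200) / 20000)^2
}

n.trees <- 20000

# Analogue of optimize = 0: evaluate balance at every tree
all_trees <- seq_len(n.trees)
best_exhaustive <- all_trees[which.min(balance_at(all_trees))]

# Analogue of optimize = 1: evaluate a coarse grid of trees, then refine
# with stats::optimize() inside the interval around the best grid point
iters <- round(seq(1, n.trees, length.out = 25))
i <- which.min(balance_at(iters))
lower <- iters[max(i - 1, 1)]
upper <- iters[min(i + 1, length(iters))]
refined <- optimize(balance_at, interval = c(lower, upper))
best_refined <- round(refined$minimum)   # tree counts are integers

# Analogue of optimize = 2: stats::optimize() over the full range
direct <- optimize(balance_at, interval = c(1, n.trees))
best_direct <- round(direct$minimum)

c(exhaustive = best_exhaustive, refined = best_refined, direct = best_direct)
```

On this smooth curve all three strategies should agree closely; the exhaustive search costs 20000 evaluations, while the other two need far fewer.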

Missing data is not allowed in the covariates because of the ambiguity in computing correlations with missing values.

summary.ps.cont compresses the information in the desc component of the ps.cont object into a short summary table describing the size of the dataset and the quality of the generalized propensity score weights, in a similar way to summary.ps.
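
As a rough guide to reading the effective sample size (ess) in that summary, the sketch below uses the common Kish formula, (sum of weights)^2 / (sum of squared weights). It is assumed, not confirmed by this page, that ps.cont computes ess this way; ess_kish() is a hypothetical helper.

```r
# Kish effective sample size; ess_kish() is a hypothetical helper.
ess_kish <- function(w) sum(w)^2 / sum(w^2)

equal_w <- rep(1, 100)
skewed_w <- c(rep(1, 99), 50)   # one unit with a very large weight

ess_kish(equal_w)    # equal weights: ESS equals the sample size, 100
ess_kish(skewed_w)   # one dominant weight: ESS drops to about 8.5
```

A large gap between n and ess suggests that a few units carry most of the weight, which is a cue to inspect and possibly trim the weights.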

plot.ps.cont and boxplot.ps.cont function almost identically to plot.ps and boxplot.ps. See the help pages there for more information. Note that, of the options plot.ps accepts for the plots argument, only 1, 2, and 6 are available. When optimize = 2, option 1 is not available.

References

Zhu, Y., Coffman, D. L., & Ghosh, D. (2015). A Boosting Algorithm for Estimating Generalized Propensity Scores with Continuous Treatments. Journal of Causal Inference, 3(1). 10.1515/jci-2014-0022

Examples

# Not run: these examples take a long time
library("cobalt")
data("lalonde", package = "cobalt")

# Balancing covariates with respect to re75
psc.out <- ps.cont(re75 ~ age + educ + married +
                     nodegree + race + re74, data = lalonde,
                   stop.method = c("p.mean", "p.max"),
                   optimize = 2)
summary(psc.out)
twang::bal.table(psc.out) # twang's bal.table
