ps.cont calculates generalized propensity scores and corresponding weights using boosted linear regression as implemented in gbm. This function extends ps in twang to continuous treatments. The syntax and output are largely the same. The GBM parameter defaults are those found in Zhu, Coffman, & Ghosh (2015).
Note: ps.cont will be phased out when twang adds functionality for continuous treatments.
ps.cont(formula, data,
        n.trees = 20000,
        interaction.depth = 4,
        shrinkage = 0.0005,
        bag.fraction = 1,
        print.level = 0,
        verbose = FALSE,
        stop.method,
        sampw = NULL,
        optimize = 1,
        use.kernel = FALSE,
        ...)
# S3 method for ps.cont
summary(object, ...)
# S3 method for ps.cont
plot(x, ...)
# S3 method for ps.cont
boxplot(x, ...)
A formula for the generalized propensity score model with the continuous treatment variable on the left side of the formula and the potential confounding variables on the right side.
The dataset in the form of a data frame, which should include the treatment variable as well as the covariates specified in formula.
The number of GBM iterations passed on to gbm. More iterations generally yield a better final solution but take more time.
The interaction.depth passed on to gbm.
The shrinkage passed on to gbm.
The bag.fraction passed on to gbm.
Currently ignored.
If TRUE, information will be printed to monitor the progress of the fitting.
A method or methods of measuring and summarizing balance across pretreatment variables. Current options are p.max, p.mean, p.rms, s.max, s.mean, and s.rms. p refers to the Pearson correlation and s refers to the Spearman correlation. These are summarized across the pretreatment variables by the maximum (max), the mean (mean), or the square root of the mean of the squares (rms).
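As a rough illustration of what the six stop.method names summarize, the following sketch computes the absolute treatment-covariate correlations on simulated data and reduces them by max, mean, and rms. The data and the helper names (abs_cor, summarize_balance) are hypothetical, and the correlations here are unweighted, whereas ps.cont evaluates them under the candidate weights.

```r
# Toy data: a continuous treatment and three covariates (all hypothetical).
set.seed(1)
n <- 200
covs <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rbinom(n, 1, 0.5))
treat <- 0.5 * covs$x1 - 0.3 * covs$x3 + rnorm(n)

# Absolute treatment-covariate correlation, one value per covariate.
abs_cor <- function(method) {
  sapply(covs, function(x) abs(cor(treat, x, method = method)))
}
p <- abs_cor("pearson")   # basis for p.max, p.mean, p.rms
s <- abs_cor("spearman")  # basis for s.max, s.mean, s.rms

# The three summaries named by the suffixes max, mean, and rms.
summarize_balance <- function(r) {
  c(max = max(r), mean = mean(r), rms = sqrt(mean(r^2)))
}

balance <- rbind(p = summarize_balance(p), s = summarize_balance(s))
print(round(balance, 3))
```

For any set of absolute correlations the three summaries are ordered max >= rms >= mean, so p.max and s.max are the strictest criteria.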
Optional sampling weights.
A numeric value, either 0, 1, or 2. If 0, balance will be checked for every tree, and the tree with the best balance will be the one used to generate the final weights. If 1, the default, balance will be checked for a subset of trees, and then optimize will be used to find the tree with the best balance within the tree interval chosen. If 2, optimize will be used to find the tree that yields the best balance. 0 takes the longest but is guaranteed to find the best balance among the trees. 2 is the quickest but will often choose a tree that has suboptimal balance, though not by much. 1 is a compromise between speed and comprehensiveness and is the algorithm implemented in twang.
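The three search strategies can be sketched on a made-up balance curve. This is only an illustration of the idea, not the internal code of ps.cont: the imbalance function below is hypothetical (in practice each evaluation means recomputing weights and correlations at a given number of trees), the grid size of 25 is an arbitrary assumption, and the searches use the base R function optimize.

```r
# Hypothetical balance curve: imbalance as a function of the number of trees.
n_trees <- 2000
imbalance <- function(k) 0.05 + ((k - 1234) / 4000)^2 + 0.002 * sin(k / 15)

# Like optimize = 0: evaluate every tree and take the best one.
all_vals <- imbalance(seq_len(n_trees))
best_exhaustive <- which.min(all_vals)

# Like optimize = 2: a single one-dimensional search over the whole range.
opt2 <- optimize(imbalance, interval = c(1, n_trees))
best_direct <- round(opt2$minimum)

# Like optimize = 1: coarse grid first, then search the bracketing interval.
grid <- unique(round(seq(1, n_trees, length.out = 25)))
g <- which.min(imbalance(grid))
lower <- grid[max(g - 1, 1)]
upper <- grid[min(g + 1, length(grid))]
opt1 <- optimize(imbalance, interval = c(lower, upper))
best_grid <- round(opt1$minimum)

results <- c(exhaustive = best_exhaustive,
             grid_then_search = best_grid,
             direct = best_direct)
print(results)
print(imbalance(results))
```

Because the toy curve has small local wiggles, the direct search can settle on a nearby local minimum, which mirrors the trade-off described for option 2.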
A ps.cont object.
For ps.cont, if use.kernel = TRUE, additional arguments to density, which is used to produce the density for the numerator of the weights. These include bw, adjust, kernel, and n. The default values are the defaults for density, except n, which is 10 times the number of units.
For summary.ps.cont, additional arguments affecting the summary produced.
Returns an object of class ps and ps.cont, a list containing
The returned gbm object.
The treatment variable.
a list containing balance tables for each method selected in stop.method. Includes a component for the unweighted analysis named “unw”. Each desc component includes a list with the following components:
The effective sample size
The number of subjects
The largest absolute Pearson correlation across the covariates
The mean absolute Pearson correlation of the covariates
The root mean squared Pearson correlation across the covariates
The largest absolute Spearman correlation across the covariates
The mean absolute Spearman correlation of the covariates
The root mean squared Spearman correlation across the covariates
a table summarizing the quality of the weights for yielding low treatment-covariate correlations. This table is best extracted using bal.table.
The estimated optimal number of gbm iterations to optimize the loss function for each associated stop.method.
a data frame containing the estimated generalized propensity scores. Each column is associated with one of the methods selected in stop.method.
a data frame containing the propensity score weights. Each column is associated with one of the methods selected in stop.method. If sampling weights are given then these are incorporated into the weights.
NULL
Records the date of the analysis.
Saves the ps.cont call.
NULL
A sequence of iterations used in the GBM fits used by plot.ps.cont.
The balance summary for each tree examined, with a column for each stop.method. If optimize = 0, this will contain balance summaries for all trees. If optimize = 1, this will contain balance summaries for the subset of trees corresponding to iters. If optimize = 2, this will be NULL.
Maximum number of trees considered in GBM fit.
Data as specified in the data argument.
The NULL entries exist so the output object is similar to that of ps in twang.
ps.cont extends ps in twang to continuous treatments. It estimates weights from a series of trees and then outputs the weights that optimize a user-set criterion. The criterion employed involves the correlation between the treatment and each covariate. In a fully balanced sample, the treatment will have a correlation of 0 with covariates sufficient for removing confounding. Zhu, Coffman, & Ghosh (2015), who were the first to describe GBM for propensity score weighting with continuous treatments, recommend this procedure and provided R code to implement the methods they describe. ps.cont adapts their syntax to make it consistent with that of ps in twang. As in Zhu et al. (2015), when the Pearson correlation is requested, weighted biserial correlations will be computed for binary covariates.
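To make the balance criterion concrete, here is a minimal sketch of a weighted Pearson correlation between a treatment and one covariate, using the base R function cov.wt. The data, the weights, and the helper weighted_cor are hypothetical; this shows the general idea of a weighted correlation and is not claimed to match the exact computation inside ps.cont (for example, it does not implement the biserial correlation used for binary covariates).

```r
set.seed(3)
n <- 300
x <- rnorm(n)                 # a covariate
treat <- 0.6 * x + rnorm(n)   # continuous treatment related to x
w <- runif(n, 0.5, 2)         # hypothetical weights

# Weighted Pearson correlation via stats::cov.wt.
# Weights are normalized to sum to 1 before being passed in.
weighted_cor <- function(a, b, w) {
  cov.wt(cbind(a, b), wt = w / sum(w), cor = TRUE)$cor[1, 2]
}

print(c(unweighted = cor(treat, x),
        weighted   = weighted_cor(treat, x, w)))
```

With equal weights the helper reduces to the ordinary Pearson correlation, which is a convenient sanity check. Weights that balance the sample well should pull the weighted value toward 0.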
The weights are estimated as the marginal density of the treatment divided by the conditional density of the treatment on the covariates for each unit. For the marginal density, a kernel density estimator can be implemented using the density function. For the conditional density, a Gaussian density is assumed. Note that when the treatment has outlying values, extreme weights can be produced, so it is important to examine the weights and trim them if necessary.
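The weight formula described above can be sketched in base R. Several choices here are assumptions made for illustration: lm stands in for the gbm fit, the residual standard deviation is used as the Gaussian scale, the kernel estimate is read off with approx, the 99th percentile is an arbitrary trimming point, and the effective sample size uses the common (sum of weights)^2 / (sum of squared weights) formula.

```r
set.seed(2)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- 1 + 0.8 * x1 - 0.5 * x2 + rnorm(n)

# Conditional density (denominator): Gaussian around the fitted mean.
# lm() is only a stand-in; ps.cont obtains fitted values from gbm.
fit <- lm(treat ~ x1 + x2)
dens_cond <- dnorm(treat, mean = fitted(fit), sd = sd(residuals(fit)))

# Marginal density (numerator), Gaussian version.
dens_marg_norm <- dnorm(treat, mean = mean(treat), sd = sd(treat))

# Marginal density, kernel version: evaluate density() at each unit's value.
kd <- density(treat, n = 10 * n)
dens_marg_kern <- approx(kd$x, kd$y, xout = treat)$y

w_norm <- dens_marg_norm / dens_cond
w_kern <- dens_marg_kern / dens_cond

# Examine the weights, then trim at the 99th percentile (arbitrary choice).
print(summary(w_norm))
cap <- quantile(w_norm, 0.99)
w_trim <- pmin(w_norm, cap)

# Effective sample size before and after trimming.
ess <- function(w) sum(w)^2 / sum(w^2)
print(c(untrimmed = ess(w_norm), trimmed = ess(w_trim)))
```

Units whose treatment value is far from what the covariates predict receive a small conditional density and hence a large weight, which is why outlying treatment values call for inspection and possible trimming.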
It is recommended to use as many trees as possible, though this requires more computation time, especially with optimize set to 0. There is little difference between using Pearson and Spearman correlations or between using the raw correlations and the Z-transformed correlations. Typically the only gbm-related options that should be changed are the interaction depth and number of trees.
Missing data is not allowed in the covariates because of the ambiguity in computing correlations with missing values.
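Since missing covariate values are not allowed, it can help to check for them before fitting. The sketch below uses a small hypothetical data frame and shows one simple option, complete-case deletion; imputation or another strategy may be more appropriate depending on the analysis.

```r
# Hypothetical covariate data with a few missing values.
covs <- data.frame(age     = c(23, 35, NA, 41),
                   educ    = c(12, 16, 10, NA),
                   married = c(0, 1, 1, 0))

# Count missing values per covariate and report the affected ones.
na_counts <- colSums(is.na(covs))
print(na_counts[na_counts > 0])

# One option: keep only rows with complete covariate information.
complete <- covs[complete.cases(covs), , drop = FALSE]
print(nrow(complete))
```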
summary.ps.cont compresses the information in the desc component of the ps.cont object into a short summary table describing the size of the dataset and the quality of the generalized propensity score weights, in a similar way to summary.ps.
plot.ps.cont and boxplot.ps.cont function almost identically to plot.ps and boxplot.ps. See the help pages there for more information. Note that for plot.ps.cont, only options 1, 2, and 6 are available for the plots argument. When optimize = 2, option 1 is not available.
Zhu, Y., Coffman, D. L., & Ghosh, D. (2015). A Boosting Algorithm for Estimating Generalized Propensity Scores with Continuous Treatments. Journal of Causal Inference, 3(1). 10.1515/jci-2014-0022
weightit for its implementation using weightit syntax.
ps and mnps for GBM with binary and multinomial treatments.
gbm for the underlying machinery and explanation of the parameters.
# NOT RUN {
# Examples take a long time
library("cobalt")
data("lalonde", package = "cobalt")
# Balancing covariates with respect to re75
psc.out <- ps.cont(re75 ~ age + educ + married +
                     nodegree + race + re74, data = lalonde,
                   stop.method = c("p.mean", "p.max"),
                   optimize = 2)
summary(psc.out)
twang::bal.table(psc.out) # twang's bal.table
# }