Trim the original data and propensity estimate according to symmetric propensity score trimming rules.
PStrim_SW(
data,
ps.formula = NULL,
zname = NULL,
ps.estimate = NULL,
svywtname = NULL,
delta = 0,
optimal = FALSE,
out.estimate = NULL,
method = "glm",
ps.control = list()
)
PStrim_SW returns a list of the following values:
data
a data frame of trimmed data.
trim_sum
a table summarizing the number of cases by treatment group before and after trimming.
ps.estimate
a data frame of propensity score estimates after trimming.
delta
an optional output of the trimming threshold for symmetric trimming.
lambda
an optional output of the trimming threshold for optimal trimming with multiple treatment groups.
out.estimate
a data frame of estimated potential outcomes after trimming.
data
an optional data frame containing the variables required by ps.formula.
ps.formula
an object of class formula (or one that can be coerced to that class): a symbolic description of the propensity score model to be fitted. Additional details of model specification are given under "Details". This argument is optional if ps.estimate is not NULL.
zname
an optional character string specifying the name of the treatment variable in data. Unless ps.formula is specified, zname is required.
ps.estimate
an optional matrix or data frame containing estimated (generalized) propensity scores for each observation. Typically, this is an N by J matrix, where N is the number of observations and J is the total number of treatment levels. Preferably, the column names of this matrix should match the names of the treatment levels. If column names are missing or do not match, they are assigned according to the alphabetical order of the treatment levels. A vector of propensity score estimates is also allowed in ps.estimate, in which case a binary treatment is implied and the input is regarded as the propensity to receive the last treatment category in alphabetical order, unless otherwise stated by trtgrp.
svywtname
an optional character string specifying the name of the survey weight variable in data. Default is NULL. Only required if survey.indicator is TRUE. If survey.indicator is TRUE and svywtname is not provided, a default survey weight of 1 will be applied to all samples.
delta
trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming.
optimal
a logical argument indicating whether optimal trimming should be used. Default is FALSE.
out.estimate
an optional matrix or data frame containing estimated potential outcomes for each observation. Typically, this is an N by J matrix, where N is the number of observations and J is the total number of treatment levels. Preferably, the column names of this matrix should match the names of the treatment levels. If column names are missing or do not match, they are assigned according to the alphabetical order of the treatment levels, using the same mechanism as in ps.estimate.
method
a character string specifying the method for estimating propensity scores. "glm" is the default; "gbm" and "SuperLearner" are also allowed.
ps.control
a list specifying additional options when method is set to "gbm" or "SuperLearner".
A typical form for ps.formula is treatment ~ terms, where treatment is the treatment variable (identical to the variable name used to specify zname) and terms is a series of terms that specifies a linear predictor for treatment. ps.formula specifies a model for estimating the propensity scores when ps.estimate is NULL.
"glm" is the default method for propensity score estimation. Logistic regression is used for binary treatments, and multinomial logistic regression is used for treatments with more than two categories. The alternative method option "gbm" serves as an interface to the gbm() function from the gbm package. Additional arguments to gbm() can be supplied through the ps.control=list() argument in SumStat(). Please refer to the user manual of the gbm package for all allowed arguments. Currently, the model for binary or multinomial treatment is chosen automatically based on the number of treatment categories.
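As an illustration of passing gbm options through ps.control, the fragment below builds such a list. It is a minimal sketch: n.trees, interaction.depth, and shrinkage are arguments of gbm(), but the values are arbitrary placeholders, and the commented call simply reuses the objects from the examples on this page.

```r
# Hypothetical tuning values chosen for illustration only;
# n.trees, interaction.depth, and shrinkage are arguments of gbm::gbm().
ps.control <- list(n.trees = 500, interaction.depth = 2, shrinkage = 0.01)

# Sketch of how the list might be supplied (not run here):
# PStrim_SW(data = psdata_bin_prospective_fp, ps.formula = ps.formula,
#           method = "gbm", ps.control = ps.control, delta = 0.05)
```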
"SuperLearner" is also allowed in the method argument to call the SuperLearner() function in the SuperLearner package. Currently, the SuperLearner method only supports binary treatments, with the default method set to "SL.glm". The estimation approach defaults to "method.NNLS". The prediction algorithm and other tuning parameters can also be passed through ps.control=list(). Please refer to the user manual of the SuperLearner package for all allowed specifications.
When comparing two treatments, ps.estimate can be either a vector or a two-column matrix of estimated propensity scores. If a vector is supplied, it is assumed to contain the propensity scores to receive the treatment, and the treatment group corresponds to the last group in alphabetical order, unless otherwise specified by trtgrp.
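The vector convention can be made explicit by expanding the vector into the equivalent two-column matrix. The sketch below uses base R only; the treatment labels and scores are made-up values, and the helper objects (z, e_vec, ps_mat) are hypothetical names, not part of the package.

```r
# Hypothetical binary treatment and propensity scores for the last level
# in alphabetical order ("treated").
z <- c("control", "treated", "treated", "control", "treated")
e_vec <- c(0.20, 0.70, 0.55, 0.35, 0.90)

# Treatment levels in alphabetical order; the vector refers to the last one.
lev <- sort(unique(z))

# Two-column form: first column is 1 - e, second column is e.
ps_mat <- cbind(1 - e_vec, e_vec)
colnames(ps_mat) <- lev
```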
When comparing multiple (J>=3) treatments, ps.estimate needs to be specified as an N by J matrix, where N indicates the number of observations and J indicates the total number of treatments. This matrix specifies the estimated generalized propensity scores to receive each of the J treatments. The same mechanism applies to out.estimate, except that the input for out.estimate must be an N by J matrix, where each row contains the estimated potential outcomes (one per treatment) for one observation.
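Before supplying a generalized propensity score matrix, it can help to check its shape and column names. The following base R sketch uses a hypothetical helper, check_gps(), and made-up data; the row-sum check relies only on the fact that generalized propensity scores across all J treatments sum to one.

```r
# Hypothetical three-level treatment with generalized propensity scores.
z <- factor(c("a", "b", "c", "a", "c"))
gps <- matrix(c(0.6, 0.3, 0.1,
                0.2, 0.5, 0.3,
                0.1, 0.2, 0.7,
                0.5, 0.4, 0.1,
                0.3, 0.3, 0.4),
              ncol = 3, byrow = TRUE,
              dimnames = list(NULL, levels(z)))

# Hypothetical helper: TRUE if gps is N by J, has one column per
# treatment level, and each row sums to one.
check_gps <- function(gps, z) {
  z <- factor(z)
  nrow(gps) == length(z) &&
    ncol(gps) == nlevels(z) &&
    setequal(colnames(gps), levels(z)) &&
    all(abs(rowSums(gps) - 1) < 1e-8)
}

ok <- check_gps(gps, z)
```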
With binary treatments, delta defines the symmetric propensity score trimming rule following Crump et al. (2009). With multiple treatments, delta defines the symmetric multinomial trimming rule introduced in Yoshida et al. (2019).
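The two symmetric rules can be sketched in a few lines of base R. This is an illustration of the rules as described in the cited papers, not the package's internal code: keep_binary() and keep_multi() are hypothetical helpers, and whether the boundary is treated as inclusive is an assumption here.

```r
# Binary rule (Crump et al., 2009): keep units with delta <= e <= 1 - delta.
keep_binary <- function(e, delta) e >= delta & e <= 1 - delta

# Multinomial rule (Yoshida et al., 2019): keep units whose smallest
# generalized propensity score is at least delta.
keep_multi <- function(gps, delta) apply(gps, 1, min) >= delta

# Made-up scores for illustration.
e <- c(0.02, 0.50, 0.97, 0.30)
kept_bin <- keep_binary(e, delta = 0.1)

gps <- rbind(c(0.02, 0.48, 0.50),
             c(0.30, 0.30, 0.40))
kept_multi <- keep_multi(gps, delta = 0.1)
```

With J treatments, the smallest score in a row can never exceed 1/J, which is why delta should be no larger than 1 / number of treatment groups: a larger value would trim every observation.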
With binary treatments and optimal equal to TRUE, the trimming function implements the optimal symmetric trimming rule in Crump et al. (2009). The optimal trimming threshold delta is then returned.
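For intuition, the optimal threshold of Crump et al. (2009) can be approximated from a vector of estimated propensity scores. The sketch below is an unweighted illustration under stated assumptions: optimal_delta() is a hypothetical helper, survey weights are ignored, all scores lie strictly between 0 and 1, and the package's own computation may differ. Writing g = 1 / (e(1 - e)), no trimming is needed when max(g) <= 2 * mean(g); otherwise the threshold solves 1 / (delta(1 - delta)) = 2 * E[g | g <= 1 / (delta(1 - delta))].

```r
# Hypothetical helper: empirical version of the Crump et al. (2009)
# optimal symmetric threshold, ignoring survey weights.
optimal_delta <- function(e) {
  g <- 1 / (e * (1 - e))
  # No trimming is required when the largest g is at most twice its mean.
  if (max(g) <= 2 * mean(g)) return(0)
  # h(gamma) = 2 * E[g | g <= gamma] - gamma changes sign on [min(g), max(g)].
  h <- function(gamma) 2 * mean(g[g <= gamma]) - gamma
  gamma <- uniroot(h, lower = min(g), upper = max(g), tol = 1e-8)$root
  # Convert gamma = 1 / (delta * (1 - delta)) back to delta in [0, 0.5).
  0.5 - sqrt(0.25 - 1 / gamma)
}

# Made-up scores: one extreme value among otherwise balanced scores.
d_trim <- optimal_delta(c(0.01, rep(0.5, 20)))
# Well-overlapping scores need no trimming.
d_none <- optimal_delta(c(0.4, 0.5, 0.6))
```

Because the empirical conditional mean is a step function of gamma, the root found by uniroot() is only an approximation of the population threshold.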
With multiple treatments and optimal equal to TRUE, the trimming function implements the optimal trimming rule in Yang et al. (2016). The optimal cutoff lambda, which defines the acceptable upper bound for the sum of inverse generalized propensity scores, is returned. See Yang et al. (2016) and Li and Li (2019) for details.
The argument zname is required when ps.estimate is not NULL.
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Yoshida, K., Solomon, D.H., Haneuse, S., Kim, S.C., Patorno, E., Tedeschi, S.K., Lyu, H., Franklin, J.M., Stürmer, T., Hernández-Díaz, S. and Glynn, R.J. (2019). Multinomial extension of propensity score trimming methods: A simulation study. American Journal of Epidemiology, 188(3), 609-616.
Yang, S., Imbens, G. W., Cui, Z., Faries, D. E., Kadziola, Z. (2016). Propensity score matching and subclassification in observational studies with multi-level treatments. Biometrics, 72(4), 1055-1065.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
# Define the propensity score model
ps.formula <- trt ~ cov1 + cov2 + cov3 + cov4 + cov5 + cov6

## Example 1: Apply symmetric trimming with delta = 0.05
trim_result <- PStrim_SW(data = psdata_bin_prospective_fp,
                         ps.formula = ps.formula,
                         svywtname = "survey_weight",
                         delta = 0.05)
# Display the trimming summary and view the trimmed data
print(trim_result)

## Example 2: Apply optimal trimming (delta is ignored when optimal = TRUE)
trim_result_opt <- PStrim_SW(data = psdata_bin_prospective_fp,
                             ps.formula = ps.formula,
                             svywtname = "survey_weight",
                             optimal = TRUE)
# Display the optimal trimming summary, including the computed trimming threshold
print(trim_result_opt)