weightit() allows for the easy generation of balancing weights using a variety of available methods for binary, continuous, and multi-category treatments. Many of these methods exist in other packages, which weightit() calls; these packages must be installed to use the desired method.
weightit(
formula,
data = NULL,
method = "glm",
estimand = "ATE",
stabilize = FALSE,
focal = NULL,
by = NULL,
s.weights = NULL,
ps = NULL,
moments = NULL,
int = FALSE,
subclass = NULL,
missing = NULL,
verbose = FALSE,
include.obj = FALSE,
keep.mparts = TRUE,
...
)
A weightit
object with the following elements:
The estimated weights, one for each unit.
The values of the treatment variable.
The covariates used in the fitting. Only includes the raw covariates, which may have been altered in the fitting process.
The estimand requested.
The weight estimation method specified.
The estimated or provided propensity scores. Estimated propensity scores are returned for binary treatments and only when method is "glm", "gbm", "cbps", "ipt", "super", or "bart". The propensity score corresponds to the predicted probability of being treated; see section estimand and focal in Details for how the treated group is determined.
The provided sampling weights.
The focal treatment level if the ATT or ATC was requested.
A data.frame
containing the by
variable when specified.
When include.obj = TRUE
, the fit object.
Additional information about the fitting. See the individual methods pages for what is included.
When keep.mparts
is TRUE
(the default) and the chosen method is
compatible with M-estimation, the components related to M-estimation for use
in glm_weightit()
are stored in the "Mparts"
attribute. When by
is
specified, keep.mparts
is set to FALSE
.
a formula with a treatment variable on the left hand side and
the covariates to be balanced on the right hand side. See glm()
for more
details. Interactions and functions of covariates are allowed.
an optional data set in the form of a data frame that contains
the variables in formula
.
a string of length 1 containing the name of the method that
will be used to estimate weights. See Details below for allowable options.
The default is "glm"
for propensity score weighting using a generalized
linear model to estimate the propensity score.
the desired estimand. For binary and multi-category treatments, can be "ATE", "ATT", "ATC", and, for some methods, "ATO", "ATM", or "ATOS". The default for both is "ATE". This argument is ignored for continuous treatments. See the individual pages for each method for more information on which estimands are allowed with each method and what literature to read to interpret these estimands.
whether or not and how to stabilize the weights. If TRUE
,
each unit's weight will be multiplied by a standardization factor, which is
the unconditional probability (or density) of each unit's observed
treatment value. If a formula, a generalized linear model will be fit with
the included predictors, and the inverse of the corresponding weight will
be used as the standardization factor. Can only be used with continuous
treatments or when estimand = "ATE"
. Default is FALSE
for no
standardization. See also the num.formula
argument at weightitMSM()
.
For continuous treatments, weights are already stabilized, so setting
stabilize = TRUE
will be ignored with a warning (supplying a formula
still works).
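As a rough illustration of the standardization factor described above, the sketch below computes stabilized ATE weights by hand for a binary treatment using base R only. The simulated data and the names `x`, `a`, `ps`, `w_raw`, and `w_stab` are invented for this example, and the calculation is not necessarily how weightit() computes the weights internally.

```r
set.seed(123)
n <- 500
x <- rnorm(n)                          # a single simulated covariate
a <- rbinom(n, 1, plogis(0.5 * x))     # simulated binary treatment (0/1)

# Propensity score from a logistic regression
ps <- fitted(glm(a ~ x, family = binomial))

# Unstabilized ATE weights: inverse probability of the observed treatment
w_raw <- ifelse(a == 1, 1 / ps, 1 / (1 - ps))

# Standardization factor: unconditional probability of each unit's
# observed treatment value
p_treated <- mean(a)
stab_factor <- ifelse(a == 1, p_treated, 1 - p_treated)

# Stabilized weights
w_stab <- w_raw * stab_factor

# Compare the two sets of weights
summary(w_raw)
summary(w_stab)
```

Because the factor is constant within each treatment group, stabilization rescales the weights group by group without changing their relative sizes within a group.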
when estimand
is set to "ATT"
or "ATC"
, which group to
consider the "treated" or "control" group. This group will not be weighted,
and the other groups will be weighted to resemble the focal group. If
specified, estimand
will automatically be set to "ATT"
(with a warning
if estimand
is not "ATT"
or "ATC"
). See section estimand
and
focal
in Details below.
a string containing the name of the variable in data for which weighting is to be done within categories or a one-sided formula with the stratifying variable on the right-hand side. For example, if by = "gender" or by = ~gender, a separate propensity score model or optimization will occur within each level of the variable "gender". Only one by variable is allowed; to stratify by multiple variables simultaneously, create a new variable that is a full cross of those variables using interaction().
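A minimal sketch of building such a crossed variable with base R follows. The data frame `d` and its columns `gender`, `region`, and `gender_region` are hypothetical and exist only for illustration.

```r
# Hypothetical data with two stratifying variables
d <- data.frame(
  gender = c("f", "m", "f", "m", "f", "m"),
  region = c("north", "north", "south", "south", "north", "south")
)

# Full cross of the two variables; drop = TRUE removes combinations
# that do not occur in the data
d$gender_region <- interaction(d$gender, d$region, drop = TRUE)

levels(d$gender_region)
table(d$gender_region)

# The crossed variable could then be supplied as, e.g., by = ~gender_region
```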
A vector of sampling weights or the name of a variable in
data
that contains sampling weights. These can also be matching weights
if weighting is to be used on matched data. See the individual pages for
each method for information on whether sampling weights can be supplied.
A vector of propensity scores or the name of a variable in data
containing propensity scores. If not NULL
, method
is ignored unless it
is a user-supplied function, and the propensity scores will be used to
create weights. formula
must include the treatment variable in data
,
but the listed covariates will play no role in the weight estimation. Using
ps
is similar to calling get_w_from_ps()
directly, but produces a full
weightit
object rather than just producing weights.
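The sketch below illustrates one way externally estimated propensity scores might be turned into weights. The data are simulated, the names `d`, `x`, `a`, and `w_att` are invented for this example, and the hand calculation uses the usual ATT weighting formula (1 for treated units, the odds of treatment for controls) rather than anything taken from the package internals. The call to weightit() runs only if WeightIt is installed.

```r
set.seed(42)
n <- 300
d <- data.frame(x = rnorm(n))
d$a <- rbinom(n, 1, plogis(-0.3 + 0.8 * d$x))   # simulated binary treatment

# Propensity scores estimated outside of weightit()
d$ps <- fitted(glm(a ~ x, data = d, family = binomial))

# ATT weights computed by hand: 1 for treated, odds of treatment for controls
w_att <- ifelse(d$a == 1, 1, d$ps / (1 - d$ps))

# If WeightIt is installed, pass the same scores through the `ps` argument
# and compare the resulting weights with the hand calculation
if (requireNamespace("WeightIt", quietly = TRUE)) {
  W <- WeightIt::weightit(a ~ x, data = d, ps = d$ps, estimand = "ATT")
  print(all.equal(unname(W$weights), unname(w_att)))
}
```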
numeric
; for some methods, the greatest power of each
covariate to be balanced. For example, if moments = 3
, for each
non-categorical covariate, the covariate, its square, and its cube will be
balanced. This argument is ignored for other methods; to balance powers of
the covariates, appropriate functions must be entered in formula
. See the
individual pages for each method for information on whether they accept
moments
.
logical
; for some methods, whether first-order interactions of
the covariates are to be balanced. This argument is ignored for other
methods; to balance interactions between the variables, appropriate
functions must be entered in formula
. See the individual pages for each
method for information on whether they accept int
.
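For methods that ignore moments and int, powers and interactions have to be written into the formula itself. The sketch below uses base R's model.matrix() to show which columns such terms expand to. The data frame `d` is hypothetical, and weightit() may process the formula differently from model.matrix(), so treat this only as a way to inspect the terms.

```r
# Hypothetical covariates, for illustration only
d <- data.frame(
  age     = c(23, 35, 41, 52, 29),
  educ    = c(12, 16, 10, 14, 12),
  married = c(0, 1, 1, 0, 1)
)

# Right-hand side one might use in the formula argument:
# a squared term via I() and a first-order interaction via `:`
rhs <- ~ age + I(age^2) + educ + married + educ:married

# Inspect the columns these terms expand to
X <- model.matrix(rhs, data = d)
colnames(X)
```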
numeric
; the number of subclasses to use for computing
weights using marginal mean weighting with subclasses (MMWS). If NULL
,
standard inverse probability weights (and their extensions) will be
computed; if a number greater than 1, subclasses will be formed and weights
will be computed based on subclass membership. Attempting to set a
non-NULL
value for methods that don't compute a propensity score will
result in an error; see each method's help page for information on whether
MMWS weights are compatible with the method. See get_w_from_ps()
for
details and references.
character
; how missing data should be handled. The options
and defaults depend on the method
used. Ignored if no missing data is
present. Note that multiple imputation outperforms all
missingness methods available in weightit()
and should probably
be used instead. Consider the MatchThem package for the use of
weightit()
with multiply imputed data.
logical
; whether to print additional information output by
the fitting function.
logical
; whether to include in the output any fit
objects created in the process of estimating the weights. For example, with
method = "glm"
, the glm
objects containing the propensity score model
will be included. See the individual pages for each method for information
on what object will be included if TRUE
.
logical
; whether to include in the output components
necessary to estimate standard errors that account for estimation of the
weights in glm_weightit()
. Default is TRUE
if such parts are present.
See the individual pages for each method for whether these components are
produced. Set to FALSE
to keep the output object smaller, e.g., if
standard errors will not be computed using glm_weightit()
.
other arguments for functions called by weightit()
that control
aspects of fitting that are not covered by the above arguments. See
Details.
The primary purpose of weightit()
is as a dispatcher to functions
that perform the estimation of balancing weights using the requested
method
. Below are the methods allowed and links to pages containing more
information about them, including additional arguments and outputs (e.g.,
when include.obj = TRUE
), how missing values are treated, which estimands
are allowed, and whether sampling weights are allowed.
"glm" | Propensity score weighting using generalized linear models |
"gbm" | Propensity score weighting using generalized boosted modeling |
"cbps" | Covariate Balancing Propensity Score weighting |
"npcbps" | Non-parametric Covariate Balancing Propensity Score weighting |
"ebal" | Entropy balancing |
"ipt" | Inverse probability tilting |
"optweight" | Optimization-based weighting |
"super" | Propensity score weighting using SuperLearner |
"bart" | Propensity score weighting using Bayesian additive regression trees (BART) |
"energy" | Energy balancing |
method
can also be supplied as a user-defined function; see method_user
for instructions and examples. Setting method = NULL
computes unit weights.
estimand and focal
For binary and multi-category treatments, the
argument to estimand
determines what distribution the weighted sample
should resemble. When set to "ATE"
, this requests that each group resemble
the full sample. When set to "ATO"
, "ATM"
, or "ATOS"
(for the methods
that allow them), this requests that each group resemble an "overlap" sample.
When set to "ATT"
or "ATC"
, this requests that each group resemble the
treated or control group, respectively (termed the "focal" group). Weights
are set to 1 for the focal group.
How does weightit()
decide which group is the treated and which group is
the control? For binary treatments, several heuristics are used. The first is
by checking whether a valid argument to focal
was supplied containing the
name of the focal group, which is the treated group when estimand = "ATT"
and the control group when estimand = "ATC"
. If focal
is not supplied,
guesses are made using the following criteria, evaluated in order:
If the treatment variable is logical, TRUE is considered treated and FALSE control.
If the treatment is numeric (or a string or factor whose values can be coerced to numeric), 0 is considered the control if it is one of the values; otherwise, the lower value is considered the control (with the other considered treated).
If exactly one of the treatment values is "t", "tr", "treat", "treated", or "exposed", it is considered the treated (and the other control).
If exactly one of the treatment values is "c", "co", "ctrl", "control", or "unexposed", it is considered the control (and the other treated).
If the treatment variable is a factor, the first level is considered control and the second treated.
The lowest value after sorting with sort() is considered control and the other treated.
To be safe, it is best to code your binary treatment variable as 0
for
control and 1
for treated. Otherwise, focal
should be supplied when
requesting the ATT or ATC. For multi-category treatments, focal
is required
when requesting the ATT or ATC; none of the heuristics above are used.
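To see why explicit coding is safer, consider the hypothetical treatment variable below, whose labels ("drug" and "placebo") match none of the name-based rules. Going by the rules as described above, the final sort() fallback would apply, which may not match the analyst's intent. This is a base R sketch only; `trt` and `trt01` are invented names.

```r
# Hypothetical treatment variable whose labels match none of the
# name-based rules
trt <- c("drug", "placebo", "drug", "placebo", "placebo")

# Under the final fallback rule, the lowest sorted value would be
# taken as the control group
sort(unique(trt))[1]

# Explicit recoding removes the ambiguity: 1 = treated ("drug"), 0 = control
trt01 <- as.integer(trt == "drug")
table(original = trt, recoded = trt01)
```

Alternatively, the original labels can be kept and the focal group named directly, e.g., estimand = "ATT" with focal = "drug".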
When using weightit(), please cite both the WeightIt package (using citation("WeightIt")) and the paper(s) in the references section of the method used.
weightitMSM()
for estimating weights with sequential (i.e.,
longitudinal) treatments for use in estimating marginal structural models
(MSMs).
weightit.fit()
, which is a lower-level dispatcher function that accepts a
matrix of covariates and a vector of treatment statuses rather than a formula
and data frame and performs minimal argument checking and processing. It may
be useful for speeding up simulation studies for which the correct arguments
are known. In general, weightit()
should be used.
summary.weightit()
for summarizing the weights
library("WeightIt")
library("cobalt")
data("lalonde", package = "cobalt")

# Balancing covariates between treatment groups (binary)
(W1 <- weightit(treat ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "glm", estimand = "ATT"))
summary(W1)
bal.tab(W1)

# Balancing covariates with respect to race (multi-category)
(W2 <- weightit(race ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "ebal", estimand = "ATE"))
summary(W2)
bal.tab(W2)

# Balancing covariates with respect to re75 (continuous)
(W3 <- weightit(re75 ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "cbps"))
summary(W3)
bal.tab(W3)