In matchit(), setting method = "genetic" performs genetic matching.
Genetic matching is a form of nearest neighbor matching in which distances are
computed as the generalized Mahalanobis distance, a generalization of the
Mahalanobis distance with a scaling factor for each covariate that represents
the importance of that covariate to the distance. A genetic algorithm selects
the scaling factors, choosing those that maximize a criterion related to
covariate balance. The criterion can be chosen, but by default it is the
smallest p-value in covariate balance tests among the covariates. This method
relies on and is a wrapper for Matching::GenMatch() and Matching::Match(),
which use rgenoud::genoud() to perform the optimization using the genetic
algorithm.
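The two ingredients described above can be sketched in base R. This is an illustrative sketch only, not the code used by Matching: the function names gen_mahalanobis() and min_balance_pvalue() are hypothetical, the distance follows one common formulation (scaling factors applied to the Cholesky-standardized covariate differences), and the criterion uses plain two-sample t-tests on simulated data.

```r
# Generalized Mahalanobis distance between two covariate vectors (sketch).
# S is the covariance matrix of the covariates; w holds one nonnegative
# scaling factor per covariate (the quantities a genetic algorithm would tune).
gen_mahalanobis <- function(x_i, x_j, S, w) {
  L <- t(chol(S))                  # lower-triangular factor with S = L %*% t(L)
  z <- forwardsolve(L, x_i - x_j)  # standardized, decorrelated difference
  sqrt(sum(w * z^2))
}

# Hypothetical balance criterion: the smallest t-test p-value across covariates.
# Larger values indicate better balance on the worst-balanced covariate.
min_balance_pvalue <- function(X, treat) {
  pvals <- apply(X, 2, function(x) t.test(x[treat == 1], x[treat == 0])$p.value)
  min(pvals)
}

# Simulated data (assumption: not the lalonde dataset)
set.seed(1)
X <- cbind(age  = rnorm(60, 30, 8),
           educ = rnorm(60, 12, 2),
           re74 = rnorm(60, 5, 3))
treat <- rep(c(1, 0), times = c(20, 40))
S <- cov(X)

# With all scaling factors equal to 1, the ordinary Mahalanobis distance results
d_plain  <- gen_mahalanobis(X[1, ], X[2, ], S, w = c(1, 1, 1))
d_ref    <- sqrt(mahalanobis(X[1, ], X[2, ], S))

# Upweighting the first component can only increase the distance
d_scaled <- gen_mahalanobis(X[1, ], X[2, ], S, w = c(10, 1, 1))

crit <- min_balance_pvalue(X, treat)
```

In this sketch, setting every scaling factor to 1 recovers the usual Mahalanobis distance, which is why the method is described as a generalization of it.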
This page details the allowable arguments with method = "genetic".
See matchit() for an explanation of what each argument means in a general
context and how it can be specified.
Below is how matchit() is used for genetic matching:
matchit(formula,
        data = NULL,
        method = "genetic",
        distance = "glm",
        link = "logit",
        distance.options = list(),
        estimand = "ATT",
        exact = NULL,
        mahvars = NULL,
        antiexact = NULL,
        discard = "none",
        reestimate = FALSE,
        s.weights = NULL,
        replace = FALSE,
        m.order = NULL,
        caliper = NULL,
        ratio = 1,
        verbose = FALSE,
        ...)
formula
a two-sided formula object containing the treatment and covariates to be used in creating the distance measure used in the matching. This formula will be supplied to the functions that estimate the distance measure and is used to determine the covariates whose balance is to be optimized.
data
a data frame containing the variables named in formula. If not found in
data, the variables will be sought in the environment.
method
set here to "genetic".
distance
the distance measure to be used. See distance for allowable options. When set
to a method of estimating propensity scores or a numeric vector of distance
values, the distance measure is included with the covariates in formula to be
supplied to the generalized Mahalanobis distance matrix unless mahvars is
specified. Otherwise, only the covariates in formula are supplied to the
generalized Mahalanobis distance matrix to have their scaling factors chosen.
distance cannot be supplied as a distance matrix. Supplying any method of
computing a distance matrix (e.g., "mahalanobis") has the same effect of
omitting the propensity score but does not otherwise affect how the distance
between units is computed.
link
when distance is specified as a method of estimating propensity scores, an
additional argument controlling the link function used in estimating the
distance measure. See distance for the links allowed with each option.
distance.options
a named list containing additional arguments supplied to the function that
estimates the distance measure as determined by the argument to distance.
estimand
a string containing the desired estimand. Allowable options include "ATT"
and "ATC". See Details.
exact
for which variables exact matching should take place.
mahvars
when a distance corresponds to a propensity score (e.g., for caliper matching
or to discard units for common support), which covariates should be supplied
to the generalized Mahalanobis distance matrix for matching. If unspecified,
all variables in formula will be supplied to the distance matrix. Use mahvars
to supply only a subset. Even if mahvars is specified, balance will be
optimized on all covariates in formula. See Details.
antiexact
for which variables anti-exact matching should take place. Anti-exact
matching is processed using the restrict argument to Matching::GenMatch()
and Matching::Match().
discard
a string containing a method for discarding units outside a region of common
support. Only allowed when distance corresponds to a propensity score.
reestimate
if discard is not "none", whether to re-estimate the propensity score in the
remaining sample prior to matching.
s.weights
the variable containing sampling weights to be incorporated into propensity
score models and balance statistics. These are also supplied to GenMatch()
for use in computing the balance t-test p-values in the process of matching.
replace
whether matching should be done with replacement.
m.order
the order in which the matching takes place. Allowable options include
"largest", where matching takes place in descending order of distance
measures; "smallest", where matching takes place in ascending order of
distance measures; "random", where matching takes place in a random order;
and "data", where matching takes place based on the order of units in the
data. When m.order = "random", results may differ across different runs of
the same code unless a seed is set with set.seed(). The default of NULL
corresponds to "largest" when a propensity score is estimated or supplied as
a vector and "data" otherwise.
caliper
the width(s) of the caliper(s) used for caliper matching. See Details and Examples.
std.caliper
logical; when calipers are specified, whether they are in standard deviation
units (TRUE) or raw units (FALSE).
ratio
how many control units should be matched to each treated unit for k:1 matching. Should be a single integer value.
verbose
logical; whether information about the matching process should be printed to
the console. When TRUE, output from GenMatch() with print.level = 2 will be
displayed. Default is FALSE for no printing other than warnings.
...
additional arguments passed to Matching::GenMatch(). Potentially useful
options include pop.size, max.generations, and fit.func. If pop.size is not
specified, a warning from Matching will be thrown reminding you to change it.
Note that the ties and CommonSupport arguments are set to FALSE and cannot be
changed. If distance.tolerance is not specified, it is set to 0, whereas the
default in Matching is 1e-5.
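The m.order options described in the arguments above can be illustrated with a small base R sketch. The helper order_units() and the toy propensity scores are hypothetical and only mimic the ordering rules as stated on this page; they are not the code matchit() runs internally.

```r
# Hypothetical helper: the order in which units would be processed
# for each m.order option, given a named vector of distance measures.
order_units <- function(distance,
                        m.order = c("largest", "smallest", "random", "data")) {
  m.order <- match.arg(m.order)
  ids <- names(distance)
  switch(m.order,
         largest  = ids[order(distance, decreasing = TRUE)],  # descending
         smallest = ids[order(distance)],                     # ascending
         random   = sample(ids),                              # random order
         data     = ids)                                      # as in the data
}

# Toy propensity scores for four units (assumed values)
ps <- c(a = 0.62, b = 0.15, c = 0.88, d = 0.40)

by_largest  <- order_units(ps, "largest")
by_smallest <- order_units(ps, "smallest")
by_data     <- order_units(ps, "data")

# A random order is reproducible only when the seed is set beforehand
set.seed(123)
r1 <- order_units(ps, "random")
set.seed(123)
r2 <- order_units(ps, "random")
```

The last four lines show why a seed matters with m.order = "random": the two calls agree only because set.seed() is called with the same value before each.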
All outputs described in matchit() are returned with method = "genetic".
When replace = TRUE, the subclass component is omitted. When
include.obj = TRUE in the call to matchit(), the output of the call to
Matching::GenMatch() will be included in the output.
In genetic matching, covariates play three roles: 1) as the variables on
which balance is optimized, 2) as the variables in the generalized
Mahalanobis distance between units, and 3) in estimating the propensity
score. Variables supplied to formula are always used for role (1), as the
variables on which balance is optimized. When distance corresponds to a
propensity score, the covariates are also used to estimate the propensity
score (unless it is supplied). When mahvars is specified, the named variables
will form the covariates that go into the distance matrix. Otherwise, the
variables in formula along with the propensity score will go into the
distance matrix. This leads to three ways to use distance and mahvars to
perform the matching:
1. When distance corresponds to a propensity score and mahvars is not
specified, the covariates in formula along with the propensity score are used
to form the generalized Mahalanobis distance matrix. This is the default and
most typical use of method = "genetic" in matchit().
2. When distance corresponds to a propensity score and mahvars is specified,
the covariates in mahvars are used to form the generalized Mahalanobis
distance matrix. The covariates in formula are used to estimate the
propensity score and have their balance optimized by the genetic algorithm.
The propensity score is not included in the generalized Mahalanobis distance
matrix.
3. When distance is a method of computing a distance matrix (e.g.,
"mahalanobis"), no propensity score is estimated, and the covariates in
formula are used to form the generalized Mahalanobis distance matrix. Which
specific method is supplied has no bearing on how the distance matrix is
computed; it simply serves as a signal to omit estimation of a propensity
score.
When a caliper is specified, any variables mentioned in caliper, possibly
including the propensity score, will be added to the matching variables used
to form the generalized Mahalanobis distance matrix. This is because Matching
doesn't allow for the separation of caliper variables and matching variables
in genetic matching.
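The rules above for which variables enter the generalized Mahalanobis distance matrix can be summarized in a short base R sketch. The helper matching_vars() and the label "distance" for the propensity score are hypothetical conveniences for illustration; the sketch encodes only the rules stated on this page and is not part of MatchIt.

```r
# Hypothetical helper: which variables go into the distance matrix.
#   formula_vars: covariates named in formula
#   ps_used:      TRUE if distance corresponds to a propensity score
#   mahvars:      covariates named in mahvars (NULL if unspecified)
#   caliper_vars: variables with a caliper (may include "distance")
matching_vars <- function(formula_vars, ps_used,
                          mahvars = NULL, caliper_vars = NULL) {
  if (!is.null(mahvars)) {
    vars <- mahvars                       # only the mahvars covariates
  } else if (ps_used) {
    vars <- c(formula_vars, "distance")   # covariates plus propensity score
  } else {
    vars <- formula_vars                  # covariates only, no propensity score
  }
  # Caliper variables are always added to the matching variables
  union(vars, caliper_vars)
}

covs <- c("age", "educ", "race", "nodegree", "married", "re74", "re75")

# Default: propensity score estimated, mahvars unspecified
v1 <- matching_vars(covs, ps_used = TRUE)

# mahvars specified: propensity score left out of the distance matrix
v2 <- matching_vars(covs, ps_used = TRUE,
                    mahvars = c("age", "educ", "re74", "re75"))

# distance = "mahalanobis": no propensity score at all
v3 <- matching_vars(covs, ps_used = FALSE)

# mahvars plus calipers on the propensity score and educ
v4 <- matching_vars(covs, ps_used = TRUE,
                    mahvars = c("age", "educ", "re74", "re75"),
                    caliper_vars = c("distance", "educ"))
```

Note how, in the last call, placing a caliper on the propensity score brings it back into the matching variables even though mahvars would otherwise exclude it.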
The estimand argument controls whether control units are selected to be
matched with treated units (estimand = "ATT") or treated units are selected
to be matched with control units (estimand = "ATC"). The "focal" group (e.g.,
the treated units for the ATT) is typically made to be the smaller treatment
group, and a warning will be thrown if it is not set that way unless
replace = TRUE. Setting estimand = "ATC" is equivalent to swapping all
treated and control labels for the treatment variable. When
estimand = "ATC", the default m.order is "smallest", and the match.matrix
component of the output will have the names of the control units as the
rownames and be filled with the names of the matched treated units (the
opposite of when estimand = "ATT"). Note that the argument supplied to
estimand doesn't necessarily correspond to the estimand actually targeted; it
is merely a switch to trigger which treatment group is considered "focal".
Also note that while GenMatch() and Match() support the ATE as an estimand,
matchit() only supports the ATT and ATC for genetic matching.
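As a rough illustration of the "focal" group logic described above, the following base R sketch picks the focal group from estimand and warns when that group is the larger one and matching is without replacement. The function focal_group(), its warning text, and the toy treatment vector are hypothetical; matchit() performs its own checks internally.

```r
# Hypothetical helper: which treatment level is "focal" for a given estimand.
# treat is a 0/1 vector (1 = treated, 0 = control).
focal_group <- function(treat, estimand = c("ATT", "ATC"), replace = FALSE) {
  estimand <- match.arg(estimand)
  focal <- if (estimand == "ATT") 1 else 0
  n_focal <- sum(treat == focal)
  n_other <- sum(treat != focal)
  # Without replacement, a larger focal group cannot be fully matched 1:1
  if (!replace && n_focal > n_other) {
    warning("The focal group is larger than the group it is matched to.")
  }
  focal
}

treat <- c(1, 1, 0, 0, 0, 0)  # 2 treated units, 4 control units

f_att <- focal_group(treat, "ATT")                  # treated units are focal
f_atc <- focal_group(treat, "ATC", replace = TRUE)  # control units are focal

# With estimand = "ATC" and replace = FALSE, the larger group is focal,
# so this sketch signals a warning
warned <- tryCatch({
  focal_group(treat, "ATC")
  FALSE
}, warning = function(w) TRUE)
```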
In a manuscript, be sure to cite the following papers if using matchit()
with method = "genetic":
Diamond, A., & Sekhon, J. S. (2013). Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. Review of Economics and Statistics, 95(3), 932–945. doi:10.1162/REST_a_00318
Sekhon, J. S. (2011). Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R. Journal of Statistical Software, 42(1), 1–52. doi:10.18637/jss.v042.i07
For example, a sentence might read:
Genetic matching was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R, which calls functions from the Matching package (Diamond & Sekhon, 2013; Sekhon, 2011).
matchit() for a detailed explanation of the inputs and outputs of a call to
matchit().
Matching::GenMatch() and Matching::Match(), which do the work.
library("MatchIt")

# Run only if the packages that perform the genetic matching are installed
if (all(sapply(c("Matching", "rgenoud"), requireNamespace, quietly = TRUE))) {
  data("lalonde")

  # 1:1 genetic matching with PS as a covariate
  m.out1 <- matchit(treat ~ age + educ + race + nodegree +
                      married + re74 + re75, data = lalonde,
                    method = "genetic",
                    pop.size = 10) # use a much larger pop.size in practice
  m.out1
  summary(m.out1)

  # 2:1 genetic matching with replacement without PS
  m.out2 <- matchit(treat ~ age + educ + race + nodegree +
                      married + re74 + re75, data = lalonde,
                    method = "genetic", replace = TRUE,
                    ratio = 2, distance = "mahalanobis",
                    pop.size = 10) # use a much larger pop.size in practice
  m.out2
  summary(m.out2, un = FALSE)

  # 1:1 genetic matching on just age, educ, re74, and re75
  # within calipers on PS and educ; other variables are
  # used to estimate PS
  m.out3 <- matchit(treat ~ age + educ + race + nodegree +
                      married + re74 + re75, data = lalonde,
                    method = "genetic",
                    mahvars = ~ age + educ + re74 + re75,
                    caliper = c(.05, educ = 2),
                    std.caliper = c(TRUE, FALSE),
                    pop.size = 10) # use a much larger pop.size in practice
  m.out3
  summary(m.out3, un = FALSE)
}