matchit is the main command of the package
MatchIt, which enables parametric models for causal inference to
work better by selecting well-matched subsets of the original treated
and control groups. MatchIt implements the suggestions of Ho, Imai,
King, and Stuart (2007) for improving parametric statistical models by
preprocessing data with nonparametric matching methods. MatchIt
implements a wide range of sophisticated matching methods, making it
possible to greatly reduce the dependence of causal inferences on
hard-to-justify, but commonly made, statistical modeling assumptions.
The software also easily fits into existing research practices since,
after preprocessing with MatchIt, researchers can use whatever
parametric model they would have used without MatchIt, but produce
inferences with substantially more robustness and less sensitivity to
modeling assumptions. Matched data sets created by MatchIt can be
entered easily in Zelig (http://gking.harvard.edu/zelig) for
subsequent parametric analyses. Full documentation is available online
at http://gking.harvard.edu/matchit, and help for specific
commands is available through help.matchit.
matchit(formula, data, method = "nearest", distance = "logit",
distance.options = list(), discard = "none",
reestimate = FALSE, ...)
The formula argument takes the usual R formula syntax,
treat ~ x1 + x2, where treat is a binary treatment indicator and x1
and x2 are the pre-treatment covariates. Both the treatment indicator
and the pre-treatment covariates must be contained in the same data
frame, which is specified as data (see below). All of the usual R
formula syntax works. For example, x1:x2 represents the first-order
interaction term between x1 and x2, and I(x1^2) represents the squared
term of x1. See help(formula) for details.
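As a small illustration of the formula syntax described above, the following base R sketch expands a formula with an interaction and a squared term into its design-matrix columns. The toy data frame and its values are made up for illustration; only the names treat, x1, and x2 come from the text above.

```r
# Toy data (hypothetical values); treat, x1, and x2 mirror the
# placeholder names used in the description of formula.
toy <- data.frame(
  treat = c(1, 0, 1, 0),
  x1    = c(1, 2, 3, 4),
  x2    = c(0, 1, 0, 1)
)

# Main effects, their first-order interaction, and the square of x1.
f <- treat ~ x1 + x2 + x1:x2 + I(x1^2)

# model.matrix() shows the columns that the right-hand side expands to.
design <- model.matrix(f, data = toy)
print(colnames(design))
print(design)
```

The x1:x2 column is the elementwise product of x1 and x2, and the I(x1^2) column is x1 squared; I() is needed because ^ has a special meaning inside a formula.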
The data argument specifies the data frame containing the
variables called in formula.
The method argument specifies a matching method. Currently,
"exact" (exact matching), "full" (full matching), "genetic" (genetic
matching), "nearest" (nearest neighbor matching), "optimal" (optimal
matching), and "subclass" (subclassification) are available. The
default is "nearest". Note that within each of these matching methods,
MatchIt offers a variety of options.
The distance argument specifies the method used to estimate the
distance measure. The default is logistic regression, "logit". A
variety of other methods are available.
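To make the idea of a logistic-regression distance measure concrete, here is a minimal base R sketch that estimates a propensity score with glm(). It is an illustration of the general technique only; that MatchIt computes its "logit" distance in exactly this way is an assumption, and the simulated data and coefficients are made up.

```r
# Simulated data (hypothetical); treatment probability depends on x1 and x2.
set.seed(1)
n     <- 200
x1    <- rnorm(n)
x2    <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.5 * x1 - 0.25 * x2))
toy   <- data.frame(treat, x1, x2)

# Logistic regression of the treatment indicator on the covariates.
fit <- glm(treat ~ x1 + x2, data = toy, family = binomial(link = "logit"))

# Fitted probabilities of treatment: one distance value per unit.
distance <- fitted(fit)
print(summary(distance))
```

Each unit receives one value strictly between 0 and 1, and units with similar values have similar estimated probabilities of treatment given the covariates.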
The distance.options argument is an optional list of arguments that are passed to the model for estimating the distance measure.
The discard argument specifies whether to discard units that
fall outside some measure of support of the distance score before
matching, so that they are not used at all in the matching
procedure. Note that discarding units may change the quantity of
interest being estimated. The options are: "none" (default), which
discards no units before matching; "both", which discards all units
(treated and control) that are outside the support of the distance
measure; "control", which discards only control units outside the
support of the distance measure of the treated units; and "treat",
which discards only treated units outside the support of the distance
measure of the control units.
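The four options can be sketched in base R as follows. The helper outside_support() is hypothetical and not part of MatchIt, and it assumes that "support" means the observed range (minimum to maximum) of the distance measure in the other group, which is only one possible definition.

```r
# Hypothetical helper, not a MatchIt function. Assumes the support of a
# group is the [min, max] range of its distance values.
outside_support <- function(distance, treat,
                            discard = c("none", "both", "control", "treat")) {
  discard <- match.arg(discard)
  rng_t <- range(distance[treat == 1])  # support of the treated units
  rng_c <- range(distance[treat == 0])  # support of the control units
  out_of <- function(d, rng) d < rng[1] | d > rng[2]
  switch(discard,
    none    = rep(FALSE, length(distance)),
    control = treat == 0 & out_of(distance, rng_t),
    treat   = treat == 1 & out_of(distance, rng_c),
    both    = out_of(distance, rng_t) | out_of(distance, rng_c)
  )
}

# Made-up distance values: three control units followed by three treated.
distance <- c(0.05, 0.30, 0.60, 0.40, 0.70, 0.95)
treat    <- c(0, 0, 0, 1, 1, 1)

for (opt in c("none", "both", "control", "treat")) {
  cat(opt, ":", outside_support(distance, treat, opt), "\n")
}
```

The function returns a logical vector with TRUE for each unit that would be discarded under the chosen option.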
The reestimate argument specifies whether the model for the
distance measure should be re-estimated after units are
discarded. The input must be a logical value. The default is
FALSE.
Additional arguments (...) are passed on to the matching methods.
The original matchit call.
The formula used to specify the model for estimating the distance measure.
The output of the model used to estimate the distance measure.
summary(m.out$model) will give the summary of the model, where m.out
is the output object from matchit.
match.matrix is an \(n_1\) by ratio matrix whose row names, which can
be obtained through row.names(match.matrix), are the names of the
treatment units, taken from the data frame specified in data. Each
column stores the name(s) of the control unit(s) matched to the
treatment unit of that row. For example, when the ratio input for
nearest neighbor or optimal matching is specified as 3, the three
columns of match.matrix represent the three control units matched to
one treatment unit. NA indicates that the treatment unit was not
matched.
A vector of length \(n\) that indicates whether each unit was
ineligible for matching due to common support restrictions. It equals
TRUE if unit \(i\) was discarded and FALSE otherwise.
A vector of length \(n\) with the estimated distance measure for each unit.
A vector of length \(n\) that provides the weights assigned to each
unit in the matching process. Unmatched units have weights equal to 0.
Matched treated units have weight 1. Each matched control unit has
weight proportional to the number of treatment units to which it was
matched, and the sum of the control weights is equal to the number of
uniquely matched control units.
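The weighting rule above can be worked through by hand on a small example. The sketch below builds a made-up match.matrix (ratio of 2, with one control reused and one treated unit unmatched) and derives weights from it in base R. It is one straightforward reading of the rule as stated, not MatchIt's own code, and all unit names are hypothetical.

```r
# Made-up match.matrix: rows are treated units, columns are matched controls.
match.matrix <- matrix(
  c("c1", "c2",
    "c2", "c3",
    NA,   NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("t1", "t2", "t3"), NULL)
)

# All units in the (hypothetical) data set; c4 is never used as a match.
units   <- c("t1", "t2", "t3", "c1", "c2", "c3", "c4")
weights <- setNames(numeric(length(units)), units)  # unmatched units stay at 0

# Matched treated units get weight 1.
matched_treated <- rownames(match.matrix)[rowSums(!is.na(match.matrix)) > 0]
weights[matched_treated] <- 1

# Control weights: proportional to the number of treated units matched to
# each control, rescaled to sum to the number of distinct matched controls.
counts <- table(match.matrix)  # NA entries are dropped by default
weights[names(counts)] <- as.numeric(counts) / sum(counts) * length(counts)

print(weights)
```

Here c2 is matched to two treated units and so receives twice the weight of c1 and c3, while t3 and c4 keep weight 0.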
The subclass index on an ordinal scale from 1 to the total number of
subclasses as specified in subclass (or the total number of subclasses
from full or exact matching). Unmatched units have NA.
The subclass cut-points that classify the distance measure.
The treatment indicator from data (the left-hand side of formula).
The covariates used for estimating the distance measure (the
right-hand side of formula).
A basic summary table of the matched data (e.g., the number of matched units).
The matching is done using the matchit(treat ~ X, ...) command, where
treat is the vector of treatment assignments and X are the covariates
to be used in the matching. There are a number of matching options,
detailed below. The full syntax is
matchit(formula, data=NULL, discard=0, exact=FALSE, replace=FALSE,
ratio=1, model="logit", reestimate=FALSE, nearest=TRUE, m.order=2,
caliper=0, calclosest=FALSE, mahvars=NULL, subclass=0, sub.by="treat",
counter=TRUE, full=FALSE, full.options=list(), …)
A summary of the results can be seen graphically using
plot(matchitobject), or numerically using summary(matchitobject).
print(matchitobject) also prints out the output.
Daniel Ho, Kosuke Imai, Gary King, and Elizabeth Stuart (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis 15(3): 199-236. http://gking.harvard.edu/files/abs/matchp-abs.shtml
Please use help.matchit to access the matchit reference
manual. The complete document is available online at
http://gking.harvard.edu/matchit.