In matchit(), setting method = "nearest" performs greedy nearest
neighbor matching. A distance is computed between each treated unit and each
control unit, and, one by one, each treated unit is assigned a control unit
as a match. The matching is "greedy" in the sense that there is no action
taken to optimize an overall criterion; each match is selected without
considering the other matches that may occur subsequently.
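The greedy behavior described above can be illustrated with a minimal base R sketch. The helper greedy_nn() and the toy scores below are hypothetical and are not part of MatchIt; the sketch assumes 1:1 matching without replacement on a scalar distance measure, with treated units processed in the order given.

```r
# Hypothetical helper (not MatchIt's implementation): greedy 1:1 nearest
# neighbor matching without replacement on a scalar distance measure.
greedy_nn <- function(ps_treated, ps_control) {
  available <- rep(TRUE, length(ps_control))
  matches <- rep(NA_integer_, length(ps_treated))
  for (i in seq_along(ps_treated)) {
    if (!any(available)) break               # no controls left to assign
    d <- abs(ps_control - ps_treated[i])     # distance to every control
    d[!available] <- Inf                     # exclude already-used controls
    j <- which.min(d)
    matches[i] <- j                          # take the closest one, no look-ahead
    available[j] <- FALSE
  }
  matches
}

# Toy distance measures (made up for illustration)
ps_treated <- c(0.50, 0.45)
ps_control <- c(0.46, 0.70)

m <- greedy_nn(ps_treated, ps_control)
m                                            # control index for each treated unit
total_greedy <- sum(abs(ps_treated - ps_control[m]))
total_swapped <- sum(abs(ps_treated - ps_control[rev(m)]))
c(greedy = total_greedy, swapped = total_swapped)
```

In this toy case the first treated unit takes the control that would have suited the second treated unit better, so swapping the two assignments gives a smaller total distance. This is the sense in which no overall criterion is optimized.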
This page details the allowable arguments with method = "nearest".
See matchit() for an explanation of what each argument means in a general
context and how it can be specified.
Below is how matchit() is used for nearest neighbor matching:
matchit(formula,
        data = NULL,
        method = "nearest",
        distance = "glm",
        link = "logit",
        distance.options = list(),
        estimand = "ATT",
        exact = NULL,
        mahvars = NULL,
        antiexact = NULL,
        discard = "none",
        reestimate = FALSE,
        s.weights = NULL,
        replace = FALSE,
        m.order = NULL,
        caliper = NULL,
        std.caliper = TRUE,
        ratio = 1,
        min.controls = NULL,
        max.controls = NULL,
        verbose = FALSE,
        ...)
formula
a two-sided formula object containing the treatment and covariates to be used in creating the distance measure used in the matching.
data
a data frame containing the variables named in formula. If not found in
data, the variables will be sought in the environment.
method
set here to "nearest".
distance
the distance measure to be used. See distance for allowable options. Can be
supplied as a distance matrix.
link
when distance is specified as a method of estimating propensity scores, an
additional argument controlling the link function used in estimating the
distance measure. See distance for allowable options with each option.
distance.options
a named list containing additional arguments supplied to the function that
estimates the distance measure as determined by the argument to distance.
estimand
a string containing the desired estimand. Allowable options include "ATT"
and "ATC". See Details.
exact
for which variables exact matching should take place.
mahvars
for which variables Mahalanobis distance matching should take place when
distance corresponds to a propensity score (e.g., for caliper matching or to
discard units for common support). If specified, the distance measure will
not be used in matching.
antiexact
for which variables anti-exact matching should take place.
discard
a string containing a method for discarding units outside a region of common
support. Only allowed when distance corresponds to a propensity score.
reestimate
if discard is not "none", whether to re-estimate the propensity score in the
remaining sample prior to matching.
s.weights
the variable containing sampling weights to be incorporated into propensity score models and balance statistics.
replace
whether matching should be done with replacement.
m.order
the order in which the matching takes place. Allowable options include
"largest", where matching takes place in descending order of distance
measures; "smallest", where matching takes place in ascending order of
distance measures; "closest", where matching takes place in order of the
distance between units; "random", where matching takes place in a random
order; and "data", where matching takes place based on the order of units in
the data. When m.order = "random", results may differ across different runs
of the same code unless a seed is set with set.seed(). The default of NULL
corresponds to "largest" when a propensity score is estimated or supplied as
a vector and "data" otherwise.
caliper
the width(s) of the caliper(s) used for caliper matching. See Details and Examples.
std.caliper
logical; when calipers are specified, whether they are in standard deviation
units (TRUE) or raw units (FALSE).
ratio
how many control units should be matched to each treated unit for k:1 matching. For variable ratio matching, see section "Variable Ratio Matching" in Details below.
min.controls, max.controls
for variable ratio matching, the minimum and maximum number of control units to be matched to each treated unit. See section "Variable Ratio Matching" in Details below.
verbose
logical; whether information about the matching process should be printed to
the console. When TRUE, a progress bar implemented using RcppProgress will be
displayed.
...
additional arguments that control the matching specification:
reuse.max
numeric; the maximum number of times each control can be used as a match.
Setting reuse.max = 1 corresponds to matching without replacement (i.e.,
replace = FALSE), and setting reuse.max = Inf corresponds to traditional
matching with replacement (i.e., replace = TRUE) with no limit on the number
of times each control unit can be matched. Other values restrict the number
of times each control can be matched when matching with replacement. replace
is ignored when reuse.max is specified.
unit.id
one or more variables containing a unit ID for each observation, i.e., in
case multiple observations correspond to the same unit. Once a control
observation has been matched, no other observation with the same unit ID can
be used as matches. This ensures each control unit is used only once even if
it has multiple observations associated with it. Omitting this argument is
the same as giving each observation a unique ID. Ignored when
replace = TRUE.
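As a hedged illustration of reuse.max, the sketch below caps the number of times each control unit can be reused and then counts reuse in the match.matrix component. It assumes an installed MatchIt version that supports reuse.max; the cap of 3 and the object names m.reuse and reuse_counts are chosen only for illustration.

```r
library("MatchIt")
data("lalonde")

# Allow each control unit to be used as a match at most 3 times
# (3 is an arbitrary illustrative cap).
m.reuse <- matchit(treat ~ age + educ + race + nodegree +
                     married + re74 + re75, data = lalonde,
                   method = "nearest", reuse.max = 3)

# match.matrix holds the names of matched controls; tabulating its entries
# counts how many times each control was used (NA entries are dropped).
reuse_counts <- table(m.reuse$match.matrix)
max(reuse_counts)
```

The largest count should not exceed the value given to reuse.max; setting reuse.max = 1 in the same call would correspond to matching without replacement.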
All outputs described in matchit() are returned with method = "nearest".
When replace = TRUE, the subclass component is omitted. include.obj is
ignored.
Mahalanobis distance matching can be done one of two ways:
If no propensity score needs to be estimated, distance should be set to
"mahalanobis", and Mahalanobis distance matching will occur using all the
variables in formula. Arguments to discard and mahvars will be ignored, and a
caliper can only be placed on named variables. For example, to perform simple
Mahalanobis distance matching, the following could be run:
matchit(treat ~ X1 + X2, method = "nearest",
        distance = "mahalanobis")
With this code, the Mahalanobis distance is computed using X1 and X2, and
matching occurs on this distance. The distance component of the matchit()
output will be empty.
If a propensity score needs to be estimated for any reason, e.g., for common
support with discard or for creating a caliper, distance should be whatever
method is used to estimate the propensity score or a vector of distance
measures. Use mahvars to specify the variables used to create the Mahalanobis
distance. For example, to perform Mahalanobis distance matching within a
propensity score caliper, the following could be run:
matchit(treat ~ X1 + X2 + X3, method = "nearest",
        distance = "glm", caliper = .25,
        mahvars = ~ X1 + X2)
With this code, X1, X2, and X3 are used to estimate the propensity score
(using the "glm" method, which by default is logistic regression), which is
used to create a matching caliper. The actual matching occurs on the
Mahalanobis distance computed only using X1 and X2, which are supplied to
mahvars. Units whose propensity score difference is larger than the caliper
will not be paired, and some treated units may therefore not receive a match.
The estimated propensity scores will be included in the distance component of
the matchit() output. See Examples.
The estimand argument controls whether control units are selected to be
matched with treated units (estimand = "ATT") or treated units are selected
to be matched with control units (estimand = "ATC"). The "focal" group (e.g.,
the treated units for the ATT) is typically made to be the smaller treatment
group, and a warning will be thrown if it is not set that way unless
replace = TRUE. Setting estimand = "ATC" is equivalent to swapping all
treated and control labels for the treatment variable. When
estimand = "ATC", the default m.order is "smallest", and the match.matrix
component of the output will have the names of the control units as the
rownames and be filled with the names of the matched treated units (opposite
to when estimand = "ATT"). Note that the argument supplied to estimand
doesn't necessarily correspond to the estimand actually targeted; it is
merely a switch to trigger which treatment group is considered "focal".
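The layout of match.matrix under estimand = "ATC" can be checked with a short sketch. It assumes MatchIt and its lalonde dataset are available; the reduced covariate set and the object names m.atc, control_names, and treated_names are chosen only for illustration. replace = TRUE is set because the control group is the larger group in these data, which would otherwise trigger the warning mentioned above.

```r
library("MatchIt")
data("lalonde")

# Controls are the focal group: treated units are selected as their matches
m.atc <- matchit(treat ~ age + educ + re74 + re75, data = lalonde,
                 method = "nearest", estimand = "ATC", replace = TRUE)

control_names <- rownames(lalonde)[lalonde$treat == 0]
treated_names <- rownames(lalonde)[lalonde$treat == 1]

# Rows are named after control units ...
rows_are_controls <- all(rownames(m.atc$match.matrix) %in% control_names)
# ... and the entries are names of treated units
matched <- as.vector(m.atc$match.matrix)
entries_are_treated <- all(matched[!is.na(matched)] %in% treated_names)

c(rows_are_controls, entries_are_treated)
```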
matchit() can perform variable ratio "extremal" matching as described by Ming
and Rosenbaum (2000). This method tends to result in better balance than
fixed ratio matching at the expense of some precision. When ratio > 1, rather
than requiring all treated units to receive ratio matches, each treated unit
is assigned a value that corresponds to the number of control units it will
be matched to. These values are controlled by the arguments min.controls and
max.controls, which correspond to \(\alpha\) and \(\beta\), respectively, in
Ming and Rosenbaum (2000), and trigger variable ratio matching to occur. Some
treated units will receive min.controls matches and others will receive
max.controls matches (and one unit may have an intermediate number of
matches); how many units are assigned each number of matches is determined by
the algorithm described in Ming and Rosenbaum (2000, p. 119). ratio controls
how many total control units will be matched: n1 * ratio control units will
be matched, where n1 is the number of treated units, yielding the same total
number of matched controls as fixed ratio matching does.
Variable ratio matching cannot be used with Mahalanobis distance matching or
when distance is supplied as a matrix. The calculation of the number of
control units each treated unit will be matched to occurs without
consideration of caliper or discard. ratio does not have to be an integer but
must be greater than 1 and less than n0/n1, where n0 and n1 are the number of
control and treated units, respectively. Setting ratio = n0/n1 performs a
crude form of full matching where all control units are matched. If
min.controls is not specified, it is set to 1 by default. min.controls must
be less than ratio, and max.controls must be greater than ratio. See Examples
below for an example of their use.
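The constraints on ratio, min.controls, and max.controls and the resulting total number of matched controls can be written out as a small base R sketch. The helper variable_ratio_total() and the group sizes used below are hypothetical and are not part of MatchIt; the sketch only restates the rules given above and does not implement the Ming and Rosenbaum allocation algorithm.

```r
# Hypothetical helper (not part of MatchIt): validates the stated constraints
# and returns the total number of control units that would be matched.
variable_ratio_total <- function(n1, n0, ratio,
                                 min.controls = 1, max.controls) {
  stopifnot(
    ratio > 1,                # variable ratio matching needs ratio > 1
    ratio <= n0 / n1,         # ratio = n0/n1 matches every control unit
    min.controls < ratio,
    max.controls > ratio
  )
  n1 * ratio                  # same total as fixed ratio matching
}

# Made-up group sizes: 100 treated units and 300 control units
total_matched <- variable_ratio_total(n1 = 100, n0 = 300, ratio = 2,
                                      min.controls = 1, max.controls = 5)
total_matched

# A ratio above n0/n1 violates the constraints and raises an error
too_large <- tryCatch(
  variable_ratio_total(n1 = 100, n0 = 300, ratio = 4, max.controls = 5),
  error = function(e) "error"
)
too_large
```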
m.order = "closest"
As of version 4.6.0, m.order can be set to "closest", which works regardless of how the distance measure is specified. This matches in order of the distance between units. The closest pair of units across all potential pairs of units will be matched first; the second closest pair of all potential pairs will be matched second, etc. This ensures that the best possible matches are given priority, and in that sense performs similarly to m.order = "smallest".
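The ordering described for m.order = "closest" can be sketched in base R as follows. The helper closest_order_match() and the toy values are hypothetical and do not reproduce MatchIt's implementation; the sketch assumes 1:1 matching without replacement on a scalar distance measure.

```r
# Hypothetical helper (not MatchIt's implementation): 1:1 matching without
# replacement in which the closest remaining pair is always matched next.
closest_order_match <- function(x_treated, x_control) {
  d <- abs(outer(x_treated, x_control, "-"))  # treated-by-control distances
  matches <- rep(NA_integer_, length(x_treated))
  n_pairs <- min(length(x_treated), length(x_control))
  for (k in seq_len(n_pairs)) {
    # Locate the closest pair among units that are still unmatched
    idx <- which(d == min(d), arr.ind = TRUE)[1, ]
    matches[idx["row"]] <- idx["col"]
    d[idx["row"], ] <- Inf                    # treated unit is now matched
    d[, idx["col"]] <- Inf                    # control unit is now used
  }
  matches
}

# Toy distance measures (made up for illustration)
x_treated <- c(0.50, 0.45)
x_control <- c(0.46, 0.70)

res <- closest_order_match(x_treated, x_control)
res                                           # control index for each treated unit
total_closest <- sum(abs(x_treated - x_control[res]))
total_closest
```

Here the second treated unit and the first control unit form the closest pair, so they are matched first, and the first treated unit is then matched to the remaining control.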
In a manuscript, you don't need to cite another package when using
method = "nearest" because the matching is performed completely within
MatchIt. For example, a sentence might read:
Nearest neighbor matching was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R.
matchit() for a detailed explanation of the inputs and outputs of a call to
matchit().
method_optimal() for optimal pair matching, which is similar to nearest
neighbor matching except that an overall distance criterion is minimized.
library("MatchIt")
data("lalonde")
# 1:1 greedy NN matching on the PS
m.out1 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  method = "nearest")
m.out1
summary(m.out1)
# 3:1 NN Mahalanobis distance matching with
# replacement within a PS caliper
m.out2 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  method = "nearest", replace = TRUE,
                  mahvars = ~ age + educ + re74 + re75,
                  ratio = 3, caliper = .02)
m.out2
summary(m.out2, un = FALSE)
# 1:1 NN Mahalanobis distance matching within calipers
# on re74 and re75 and exact matching on married and race
m.out3 <- matchit(treat ~ age + educ + re74 + re75, data = lalonde,
                  method = "nearest", distance = "mahalanobis",
                  exact = ~ married + race,
                  caliper = c(re74 = .2, re75 = .15))
m.out3
summary(m.out3, un = FALSE)
# 2:1 variable ratio NN matching on the PS
m.out4 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  method = "nearest", ratio = 2,
                  min.controls = 1, max.controls = 12)
m.out4
summary(m.out4, un = FALSE)
# Some units received 1 match and some received 12
table(table(m.out4$subclass[m.out4$treat == 0]))