matchit: MatchIt: Matching Software for Causal Inference

Description

MatchIt enables parametric models for causal inference to work better by selecting well-matched subsets of the original treated and control groups. It implements the suggestions of Ho, Imai, King, and Stuart (2004) for improving parametric statistical models by preprocessing data with nonparametric matching methods. The package offers a wide range of matching methods, which makes it possible to greatly reduce the dependence of causal inferences on hard-to-justify, but commonly made, statistical modeling assumptions. The software also fits easily into existing research practices: after preprocessing with MatchIt, researchers can use whatever parametric model they would have used without MatchIt, and obtain inferences that are more robust and less sensitive to modeling assumptions. Matched data sets created by MatchIt can be entered easily in Zelig (http://gking.harvard.edu/zelig) for subsequent parametric analyses. Full documentation is available online at http://gking.harvard.edu/matchit, and help for specific commands is available through help.matchit.

Usage

matchit(formula, data, model="logit", discard=0, reestimate=FALSE, nearest=TRUE,
        replace=FALSE, m.order=2, ratio=1, caliper=0, calclosest=FALSE,
        subclass=0, sub.by="treat", mahvars=NULL, exact=FALSE, counter=TRUE,
        full=FALSE, full.options=list(), ...)

Arguments

formula

(required). Takes the form T ~ X1 + X2, where T is a binary treatment indicator and X1 and X2 are the pre-treatment covariates. T, X1, and X2 must all be contained in data.

data

(required). Data frame containing the variables called in the formula. The data frame should not include variables named psclass, psweights, or pscore, as these names are expressly reserved in the output.

model

Method used to estimate the propensity score. May be "logit" (default), "probit", "nnet", "GAM", or "cart".

discard

Whether to discard units that fall outside some measure of support of the distance score. 0 (default)=keep all units. 1=keep all units with common support. 2=discard only control units outside the support of the distance measure of the treated units.
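The three discard settings can be illustrated with a minimal base R sketch. It assumes that "support" means the observed range (minimum to maximum) of the distance score in each group; MatchIt's own definition may differ. The helper keep_units is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of MatchIt): flag the units that would be
# kept under each `discard` setting, given a distance score and a 0/1
# treatment vector. ASSUMPTION: support = observed range of the score.
keep_units <- function(pscore, treat, discard = 0) {
  stopifnot(length(pscore) == length(treat), discard %in% 0:2)
  keep  <- rep(TRUE, length(pscore))      # discard = 0: keep every unit
  rng_t <- range(pscore[treat == 1])      # score range of treated units
  rng_c <- range(pscore[treat == 0])      # score range of control units
  if (discard == 1) {
    # Common support: the overlap of the treated and control ranges.
    lo <- max(rng_t[1], rng_c[1])
    hi <- min(rng_t[2], rng_c[2])
    keep <- pscore >= lo & pscore <= hi
  } else if (discard == 2) {
    # Drop only control units that fall outside the treated range.
    outside <- pscore < rng_t[1] | pscore > rng_t[2]
    keep <- !(treat == 0 & outside)
  }
  keep
}

pscore <- c(0.10, 0.30, 0.50, 0.70, 0.20, 0.40, 0.60, 0.90)
treat  <- c(0,    0,    0,    0,    1,    1,    1,    1)
keep1 <- keep_units(pscore, treat, discard = 1)  # drops units 1 and 8
keep2 <- keep_units(pscore, treat, discard = 2)  # drops control unit 1 only
```

With discard=1 both the lowest control and the highest treated unit are removed, whereas discard=2 never removes a treated unit.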

reestimate

Specifies whether to reestimate the propensity score model after discarding units (default=FALSE).

nearest

Whether to perform nearest-neighbor matching (default=TRUE).

replace

Whether to match with replacement (default=FALSE).

m.order

Order in which to match treated units with control units. 1=optimal (requires "optmatch" package), 2 (default)=from high to low, 3=from low to high, 4=random order.

ratio

The number of control units to be matched to each treated unit (default=1).

caliper

Standard deviations of the propensity score within which to draw control units (default=0).

calclosest

If caliper!=0, whether to take the nearest available match if no matches are available within caliper (default=FALSE).
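How nearest, replace, m.order, caliper, and calclosest interact can be seen in a rough base R sketch of greedy 1:1 nearest-neighbor matching. This is an illustration under stated assumptions, not MatchIt's implementation: it assumes that "from high to low" refers to the treated units' scores, and that the caliper is measured in standard deviations of the score pooled over all units. The function greedy_match is hypothetical.

```r
# Hypothetical sketch (not MatchIt's code): greedy 1:1 nearest-neighbor
# matching on a distance score, without replacement.
greedy_match <- function(pscore, treat, caliper = 0, calclosest = FALSE) {
  treated  <- which(treat == 1)
  controls <- which(treat == 0)
  # ASSUMPTION: m.order = 2 means treated units are processed from the
  # highest score to the lowest.
  treated <- treated[order(pscore[treated], decreasing = TRUE)]
  # ASSUMPTION: caliper width = caliper * SD of the score over all units;
  # caliper = 0 means no caliper at all.
  width <- if (caliper > 0) caliper * sd(pscore) else Inf
  matches <- setNames(rep(NA_integer_, length(treated)), treated)
  for (i in treated) {
    if (length(controls) == 0) break       # no controls left to assign
    d <- abs(pscore[controls] - pscore[i])
    j <- which.min(d)                      # closest remaining control
    if (d[j] <= width || calclosest) {
      matches[as.character(i)] <- controls[j]
      controls <- controls[-j]             # without replacement
    }
  }
  matches  # names: treated unit index; values: matched control index or NA
}

pscore <- c(0.12, 0.33, 0.52, 0.75, 0.20, 0.40, 0.60, 0.90)
treat  <- c(0,    0,    0,    0,    1,    1,    1,    1)
m_plain   <- greedy_match(pscore, treat)
m_caliper <- greedy_match(pscore, treat, caliper = 0.5)
m_closest <- greedy_match(pscore, treat, caliper = 0.5, calclosest = TRUE)
```

In this toy data the treated unit with score 0.90 has no control within half a standard deviation, so it stays unmatched (NA) under caliper=0.5 unless calclosest=TRUE.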

subclass

Either a scalar specifying the number of subclasses (default=0) or a vector of probabilities used to create quantiles based on sub.by.

sub.by

If subclass!=0, the criterion by which to subclassify. "treat" (default)=by the number of treated units, "control"=by the number of control units, "all"=by the total number of units.
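The subclass and sub.by options can be sketched in base R as quantile cuts of the propensity score. This is only an illustration: the function subclassify is hypothetical, and it assumes that cut points are quantiles of the score within the group named by sub.by and that the outermost subclasses are open-ended.

```r
# Hypothetical sketch (not MatchIt's code): assign units to subclasses
# using quantiles of the score in the group chosen by `sub.by`.
subclassify <- function(pscore, treat, subclass = 5, sub.by = "treat") {
  # Scores of the group that defines the quantiles.
  ref <- switch(sub.by,
                treat   = pscore[treat == 1],
                control = pscore[treat == 0],
                all     = pscore,
                stop("unknown sub.by"))
  # A scalar gives equally spaced probabilities; a vector is used as given.
  probs <- if (length(subclass) == 1) {
    seq(0, 1, length.out = subclass + 1)
  } else {
    subclass
  }
  probs <- sort(unique(c(0, probs, 1)))
  q.cut <- quantile(ref, probs = probs, names = FALSE)
  # ASSUMPTION: open-ended outer subclasses, so every unit gets a class.
  q.cut[1] <- -Inf
  q.cut[length(q.cut)] <- Inf
  list(psclass = cut(pscore, breaks = q.cut, labels = FALSE), q.cut = q.cut)
}

pscore <- c(0.12, 0.33, 0.52, 0.75, 0.20, 0.40, 0.60, 0.90)
treat  <- c(0,    0,    0,    0,    1,    1,    1,    1)
res <- subclassify(pscore, treat, subclass = 2, sub.by = "treat")
res$psclass  # subclass index for each unit
res$q.cut    # cut points, with open-ended outer bounds
```

Here the single interior cut point is the median of the treated units' scores, so each of the two subclasses holds the same number of treated units.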

mahvars

Variables on which to perform Mahalanobis matching within each caliper (default=NULL). Should be entered as a vector of names of variables in data.

exact

FALSE (default)=no exact matching. TRUE=exact matching on all variables in formula. A vector of variable names (that are in data) indicates separate variables on which to exact match, in combination with matching on the propensity score.

counter

Whether to display counter indicating the progress of the matching (default=TRUE).

full

Whether to do full matching (default=FALSE). Requires "optmatch" package.

full.options

Additional options for full matching.

...

Additional arguments to be passed to matchit, depending on the model to be used.

Value

call

The original matchit call.

formula

Formula used to specify the propensity score.

match.matrix

n1 by ratio data frame where the rows correspond to treated units and the columns store the names of the control units matched to each treated unit. NA indicates that the treated unit was not matched.

in.sample

Vector of length n showing whether each unit was eligible for matching due to common support restrictions with discard.

matched

Vector of length n showing whether each unit was matched.

psweights

Vector of length n giving the weight assigned to each unit in the matching process. Each weight is proportional to the number of times that unit was matched.

psclass

Subclass index on an ordinal scale from 1 to the number of subclasses. Unmatched units have subclass=0.

q.cut

Subclass cut points.

assign.model

Output of the assignment model.

data

The original data set, with psclass, psweights, and pscore (propensity scores) added.

treat

The treatment indicator from data.

covariates

Covariates used in the right-hand side of the assignment model.

Details

The matching is done using the matchit(treat ~ X, ...) command, where treat is the vector of treatment assignments and X are the covariates to be used in the matching. There are a number of matching options, detailed below. The full syntax is

matchit(formula, data, model="logit", discard=0, reestimate=FALSE, nearest=TRUE,
        replace=FALSE, m.order=2, ratio=1, caliper=0, calclosest=FALSE,
        subclass=0, sub.by="treat", mahvars=NULL, exact=FALSE, counter=TRUE,
        full=FALSE, full.options=list(), ...)

The results can be inspected graphically with plot(matchitobject) or numerically with summary(matchitobject). print(matchitobject) prints a brief description of the output.
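A typical session might look like the sketch below. It assumes that the MatchIt package is installed and that it ships a lalonde data set containing the variables treat, age, and educ; neither assumption comes from this help page. Only formula and data are supplied, so every other argument takes its default, and the code is skipped entirely when the package is not available.

```r
# Minimal usage sketch. ASSUMPTIONS: the MatchIt package is installed and
# provides a `lalonde` data set with columns treat, age, and educ.
m.out <- NULL
if (requireNamespace("MatchIt", quietly = TRUE)) {
  library(MatchIt)
  data("lalonde", package = "MatchIt")
  # Nearest-neighbor matching on the propensity score, using the defaults.
  m.out <- matchit(treat ~ age + educ, data = lalonde)
  print(m.out)           # brief description of the matching
  print(summary(m.out))  # numerical summary of the results
}
```

Calling plot(m.out) afterwards gives the graphical summary; it is left out here because it opens a graphics device.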

References