fullmatch: Optimal full matching

Description

Given two groups, such as a treatment and a control group, and a treatment-by-control discrepancy matrix indicating desirability and permissibility of potential matches, create optimal full matches of members of the groups. Optionally, incorporate restrictions on matched sets' ratios of treatment to control units.

Usage

fullmatch(distance, min.controls = 0, max.controls = Inf, 
omit.fraction = NULL, tol = 0.001, subclass.indices = NULL)

Arguments

distance

A matrix of nonnegative discrepancies, each indicating the permissibility and desirability of matching the unit corresponding to its row (a 'treatment') to the unit corresponding to its column (a 'control'); or, better, a list of such matrices, as produ

min.controls

The minimum ratio of controls to treatments that is to be permitted within a matched set: should be nonnegative and finite. If min.controls is not a whole number, the reciprocal of a whole number, or zero, then it is rounded down

max.controls

The maximum ratio of controls to treatments that is to be permitted within a matched set: should be positive and numeric. If max.controls is not a whole number, the reciprocal of a whole number, or Inf, then it is rounded <
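As a rough illustration of how the two ratio restrictions bound the composition of a matched set, the sketch below checks whether a set with a given number of treated and control units falls within min.controls and max.controls. The helper ratio_ok is hypothetical and not part of optmatch; it only restates the ratio definition given above and does not model the rounding that fullmatch applies to other values.

```r
# Hypothetical helper (not an optmatch function): TRUE when the
# controls-to-treatments ratio of a matched set lies within the limits.
ratio_ok <- function(n_treated, n_controls,
                     min.controls = 0, max.controls = Inf) {
  ratio <- n_controls / n_treated
  ratio >= min.controls & ratio <= max.controls
}

# With min.controls = 0.5 and max.controls = 4:
one_to_four   <- ratio_ok(1, 4, min.controls = 0.5, max.controls = 4)  # ratio 4
two_to_one    <- ratio_ok(2, 1, min.controls = 0.5, max.controls = 4)  # ratio 0.5
three_to_one  <- ratio_ok(3, 1, min.controls = 0.5, max.controls = 4)  # ratio 1/3
one_to_five   <- ratio_ok(1, 5, min.controls = 0.5, max.controls = 4)  # ratio 5

c(one_to_four, two_to_one, three_to_one, one_to_five)
```

Under these limits a set may pair one treated unit with up to four controls, or one control with up to two treated units; the last two compositions fall outside the limits.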

omit.fraction

Optionally, specify what fraction of controls or treated subjects are to be rejected. If omit.fraction is a positive fraction less than one, then fullmatch leaves up to that fraction of the control reservoir unmatched.

tol

Because of internal rounding, fullmatch may solve a slightly different matching problem than the one specified, in which case the match generated by fullmatch may not coincide with an optimal solution of the specified problem.

subclass.indices

An old argument included for back-compatibility; no longer needed.

Value

Primarily, a named vector of class c('optmatch', 'factor'). Elements of this vector correspond to members of the treatment and control groups in reference to which the matching problem was posed, and are named accordingly; the names are taken from the row and column names of distance. Each element of the vector is either NA, indicating unavailability of any suitable matches for that element, or the concatenation of: (i) a character abbreviation of the name of the subclass, if matching within subclasses, or the string 'm' if not; (ii) the string '.'; and (iii) a nonnegative integer or the string NA. In this last place, positive whole numbers indicate placement of the unit into a matched set and NA indicates that all or part of the matching problem given to fullmatch was found to be infeasible. The functions matched, unmatched, and matchfailed distinguish these scenarios.

Secondarily, fullmatch returns various data about the matching process and its result, stored as attributes of the named vector which is its primary output. In particular, the exceedances attribute gives upper bounds, not necessarily sharp, for the amount by which the sum of distances between matched units in the result of fullmatch exceeds the least possible sum of distances between matched units in a feasible solution to the matching problem given to fullmatch. (Such a bound is also printed by print.optmatch and summary.optmatch.)
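To make the label format concrete, the following base-R sketch decodes a hand-written character vector that imitates the labels described above. The vector is hypothetical, not real fullmatch output, and the decoding is only an illustration; in practice the package's own matched, unmatched, and matchfailed functions are the intended way to distinguish these cases.

```r
# Hypothetical labels in the documented format (not real fullmatch output).
labels <- c(A = "m.1", B = "m.2", C = "m.1", D = NA, E = "m.2", F = "m.NA")

# Part (i): everything before the final "." is the subclass abbreviation,
# or "m" when matching was not done within subclasses.
prefix <- sub("\\.[^.]*$", "", labels)

# Part (iii): everything after the final "." is the set number or "NA".
suffix <- sub("^.*\\.", "", labels)

# Classify each unit according to the three scenarios in the text.
status <- ifelse(is.na(labels), "no suitable match",
                 ifelse(suffix == "NA", "infeasible", "matched"))

# Sizes of the matched sets among units that were placed.
set_sizes <- table(labels[status == "matched"])

status
set_sizes
```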

Details

Finite entries in matrix slots of distance indicate permissible matches, with smaller discrepancies indicating more desirable matches. Matrix distance must have row and column names. Consider using mdist to generate the distances.

fullmatch tries to guess the order in which units would have been given in a data frame, and to order the factor that it returns accordingly. If the dimnames of distance, or the matrices it lists, are not simply row numbers of the data frame you're working with, then you should compare the names of fullmatch's output to your row names in order to be sure things are in the proper order. You can relieve yourself of these worries by using mdist (or makedist, pscore.dist, or mahal.dist, which [as of version 0.6] are dispatched as needed by mdist) to produce the distances, as it passes the ordering of units to fullmatch, which then uses it to order its outputs.

The value of tol can have a substantial effect on computation time; with smaller values, computation takes longer. Not every tolerance can be met, and how small a tolerance is too small varies with the machine and with the details of the problem. If fullmatch can't guarantee that the tolerance is as small as the given value of argument tol, then matching proceeds but a warning is issued.
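For readers who build a distance by hand rather than with mdist, the base-R sketch below shows one way to meet the requirements on distance stated in this section: nonnegative entries, row names for treated units, column names for controls, and non-finite entries for matches that should not be permitted. The covariate values, unit names, and the cutoff of 2.5 are all made up for illustration.

```r
# Hypothetical data: one covariate for 2 treated and 3 control units.
x_treated <- c(T1 = 1.0, T2 = 4.0)
x_control <- c(C1 = 0.5, C2 = 2.0, C3 = 6.0)

# Treatment-by-control matrix of absolute covariate differences.
dist_mat <- abs(outer(x_treated, x_control, "-"))

# Set row and column names explicitly, since distance must carry them.
dimnames(dist_mat) <- list(names(x_treated), names(x_control))

# Mark matches with a discrepancy above an illustrative cutoff as
# impermissible by making those entries infinite.
dist_mat[dist_mat > 2.5] <- Inf

dist_mat
```

A matrix built this way could then be passed as the distance argument. Because its dimnames are unit names rather than data frame row numbers, the names of the result should be compared with the data frame's row names before the two are combined.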

References

Hansen, B.B. and Klopfer, S.O. (2006), Optimal full matching and related designs via network flows, Journal of Computational and Graphical Statistics, 15, 609--627.

Hansen, B.B. (2004), Full Matching in an Observational Study of Coaching for the SAT, Journal of the American Statistical Association, 99, 609--618.

Rosenbaum, P. (1991), A Characterization of Optimal Designs for Observational Studies, Journal of the Royal Statistical Society, Series B, 53, 597--610.

Examples


library(optmatch)
data(nuclearplants)
### Full matching on a Mahalanobis distance
mhd  <- mdist(pr ~ t1 + t2, data = nuclearplants)
( fm1 <- fullmatch(mhd) )
summary(fm1)
### Full matching with restrictions
( fm2 <- fullmatch(mhd, min.controls = .5, max.controls = 4) )
summary(fm2)
### Full matching to half of available controls
( fm3 <- fullmatch(mhd, omit.fraction=.5) )
summary(fm3)
### Full matching within a propensity score caliper.
ppty <- glm(pr~.-(pr+cost), family=binomial(), data=nuclearplants)
### Note that units without counterparts within the
### caliper are automatically dropped.
( fm4 <- fullmatch(mhd+caliper(1, ppty)) )
summary(fm4)

### Propensity balance assessment. Requires RItools package.
library(RItools)
summary(fm4, ppty)

### Creating a data frame with the matched sets attached.
### mdist(), caliper() and the like cooperate with fullmatch()
### to make sure observations are in the proper order:
all.equal(names(fm4), row.names(nuclearplants))
### So our data frame including the matched sets is just
cbind(nuclearplants, matches=fm4)

### In contrast, if your matching distance is an ordinary matrix
### (as earlier versions of optmatch required), you'll
### have to align it by observation name with your data set. 
cbind(nuclearplants, matches = fm4[row.names(nuclearplants)])
