pairmatch: Optimal 1:1 and 1:k matching

Description

Given a treatment group, a larger reservoir of controls, and discrepancies between each treatment and control unit, pairmatch finds a pairing of treatment units to controls that minimizes the sum of discrepancies between matched units.
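To make the objective concrete, here is a brute-force sketch in base R. It enumerates every way of giving each treatment unit k distinct controls and keeps the assignment with the smallest total discrepancy. The helper brute_pairmatch is hypothetical and not part of optmatch, which instead solves the problem with network flows (Hansen and Klopfer, 2006). The sketch is practical only for a handful of units, and it assumes that forbidden pairings are coded as Inf.

```r
# Hypothetical helper, NOT part of optmatch: exhaustive search for the
# 1:k matching that minimizes the total discrepancy. Feasible only for
# very small problems. Inf entries are treated as forbidden pairings.
brute_pairmatch <- function(d, k = 1) {
  nt <- nrow(d)
  best_cost <- Inf
  best_choice <- NULL

  # Choose controls for treatment row i, given the columns already used.
  search <- function(i, used, choice, cost) {
    if (cost >= best_cost) return(invisible(NULL))  # cannot beat best so far
    if (i > nt) {                                   # every treated unit placed
      best_cost <<- cost
      best_choice <<- choice
      return(invisible(NULL))
    }
    free <- setdiff(seq_len(ncol(d)), used)
    if (length(free) < k) return(invisible(NULL))   # too few controls left
    for (idx in combn(length(free), k, simplify = FALSE)) {
      cols <- free[idx]
      choice[[i]] <- cols
      search(i + 1, c(used, cols), choice, cost + sum(d[i, cols]))
    }
    invisible(NULL)
  }

  search(1, integer(0), vector("list", nt), 0)
  if (is.null(best_choice)) return(NULL)            # no feasible matching
  names(best_choice) <- rownames(d)
  list(cost = best_cost,
       controls = lapply(best_choice, function(j) colnames(d)[j]))
}

d <- matrix(c(1, 4, 6,
              2, 3, 1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("t1", "t2"), c("c1", "c2", "c3")))
brute_pairmatch(d)         # t1 -> c1, t2 -> c3, total cost 2
brute_pairmatch(d, k = 2)  # NULL: 2 treated x 2 controls needs 4 controls
```

Note that a greedy rule would not always give the same answer. The optimum is defined by the total over all matched pairs, which is why pairmatch solves the problem globally.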

Usage

pairmatch(distance, controls = 1, tol = 0.001, remove.unmatchables = FALSE)

Arguments

distance

A matrix of nonnegative discrepancies, each indicating the permissibility and desirability of matching the unit corresponding to its row (a 'treatment') to the unit corresponding to its column (a 'control'); or a list of such matrices made using <

controls

The number of controls to be matched to each treatment.

tol

Tolerance -- see fullmatch for details.

remove.unmatchables

Should treatment group members for which there are no eligible controls be removed prior to matching?

Value

Primarily, a named vector of class c('optmatch', 'factor'). Because pairmatch is a wrapper to fullmatch, this is the output of fullmatch. Elements of this vector correspond to members of the treatment and control groups in reference to which the matching problem was posed, and are named accordingly; the names are taken from the row and column names of distance.

Each element of the vector is the concatenation of: (i) a character abbreviation of subclass.indices (an argument of fullmatch), if that argument was given, or the string 'm' if it was not; (ii) the string '.'; and (iii) a nonnegative integer or the string NA. In this last place, positive whole numbers indicate placement of the unit into a matched set, a number beginning with zero indicates a unit that was not matched, and NA indicates that all or part of the matching problem given to fullmatch was found to be infeasible.

Secondarily, fullmatch returns various data about the matching process and its result, stored as attributes of the named vector that is its primary output. In particular, the exceedances attribute gives upper bounds, not necessarily sharp, for the amount by which the sum of distances between matched units in the result of fullmatch exceeds the least possible sum of distances between matched units in a feasible solution to the matching problem given to fullmatch. (Such a bound is also printed by print.optmatch and by summary.optmatch.)
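The labeling rules above can be applied mechanically. The following base R sketch is a hypothetical helper, classify_match_labels, that is not part of optmatch. It classifies each label as matched, unmatched, or infeasible by looking at the part after the last '.'. The example labels are made up. As a precaution, the helper also treats a genuine NA value as infeasible, in case the result stores NA directly rather than a string ending in "NA".

```r
# Hypothetical helper, NOT part of optmatch: classify match labels of the
# form "<prefix>.<suffix>" using the rules described in the Value section.
classify_match_labels <- function(x) {
  x <- as.character(x)
  suffix <- sub("^.*\\.", "", x)        # text after the last "."
  ifelse(is.na(x) | suffix == "NA", "infeasible",
         ifelse(startsWith(suffix, "0"), "unmatched", "matched"))
}

labels <- c("m.1", "m.1", "m.2", "m.2", "m.0", "m.NA")  # made-up labels
classify_match_labels(labels)
# "matched" "matched" "matched" "matched" "unmatched" "infeasible"
```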

Details

This is a wrapper to fullmatch; see its documentation for more information.

fullmatch tries to guess the order in which units would have been given in a data frame, and to order the factor that it returns accordingly. If the dimnames of distance, or the matrices it lists, are not simply row numbers of the data frame you're working with, then you should compare the names of fullmatch's output to your row names in order to be sure things are in the proper order. You can relieve yourself of these worries by using mdist to produce the distances, as it passes the ordering of units to fullmatch, which then uses it to order its outputs.
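When the distances do not carry the data frame's ordering, the safe move is to align the match labels by name rather than by position. The sketch below uses a hypothetical helper, align_to_rows, that is not part of optmatch. It works on any named vector, and it warns when a row of the data frame has no label. The toy data frame and labels are made up for illustration.

```r
# Hypothetical helper, NOT part of optmatch: reorder a named vector of
# match labels so that it lines up with the rows of a data frame.
align_to_rows <- function(matches, df) {
  rn <- row.names(df)
  absent <- setdiff(rn, names(matches))
  if (length(absent) > 0)
    warning("No match label for rows: ", paste(absent, collapse = ", "))
  out <- matches[rn]      # index by name, not by position
  names(out) <- rn        # rows without a label come back as NA
  out
}

df <- data.frame(x = 1:3, row.names = c("a", "b", "c"))
m  <- c(c = "m.1", a = "m.1", b = "m.2")   # names in a different order
cbind(df, matches = align_to_rows(m, df))
```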

The value of tol can have a substantial effect on computation time; with smaller values, computation takes longer. Not every tolerance can be met, and how small a tolerance is too small varies with the machine and with the details of the problem. If fullmatch can't guarantee that the tolerance is as small as the given value of argument tol, then matching proceeds but a warning is issued.

If remove.unmatchables is FALSE and there are unmatchable treated units, then the matching as a whole will fail and no units will be matched. If TRUE, then such units will be removed and the function will attempt to match each of the remaining treatment units. (In this case matching can still fail if there is too much competition for certain controls; if you find yourself in that situation, consider full matching, which necessarily finds a match for everyone with an eligible match somewhere.)
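The removal step and the competition problem can both be illustrated on a plain distance matrix. The sketch below uses a hypothetical helper, drop_unmatchable, that is not part of optmatch. It assumes that ineligible treatment-control pairs are coded as Inf. It drops treated rows with no finite entry, then runs a simple counting check: whether there are enough usable controls overall for k controls per remaining treated unit. That check is necessary but not sufficient, so a matching can still be infeasible after it passes.

```r
# Hypothetical helper, NOT part of optmatch. Assumes ineligible
# treatment-control pairs are coded as Inf in the distance matrix.
drop_unmatchable <- function(d, k = 1) {
  eligible <- rowSums(is.finite(d)) > 0       # at least one finite entry
  if (any(!eligible))
    message("Removing unmatchable treated units: ",
            paste(rownames(d)[!eligible], collapse = ", "))
  kept <- d[eligible, , drop = FALSE]
  # Necessary (not sufficient) check: enough usable controls overall.
  usable <- sum(colSums(is.finite(kept)) > 0)
  if (nrow(kept) * k > usable)
    warning("Too few eligible controls; matching will be infeasible.")
  kept
}

d <- rbind(t1 = c(c1 = 1,   c2 = Inf, c3 = 5),
           t2 = c(c1 = Inf, c2 = Inf, c3 = Inf),   # no eligible control
           t3 = c(c1 = 2,   c2 = Inf, c3 = Inf))
drop_unmatchable(d)   # keeps t1 and t3

# Competition: both treated units are eligible only for c1, so the
# counting check warns even though neither row is all Inf.
d_comp <- rbind(t1 = c(c1 = 1, c2 = Inf),
                t3 = c(c1 = 2, c2 = Inf))
```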

References

Hansen, B.B. and Klopfer, S.O. (2006), Optimal full matching and related designs via network flows, Journal of Computational and Graphical Statistics, 15, 609--627.

Examples


library(optmatch)
data(nuclearplants)
### Pair matching on a Mahalanobis distance
mhd <- mdist(pr ~ t1 + t2, data = nuclearplants)
( pm1 <- pairmatch(mhd) )
summary(pm1)
### Pair matching within a propensity score caliper.
ppty <- glm(pr ~ . - (pr + cost), family = binomial(), data = nuclearplants)
( pm2 <- pairmatch(mhd + caliper(2, ppty)) )
summary(pm2)

### Propensity balance assessment. Requires the RItools package.
library(RItools)
summary(pm2, ppty)

### 1:2 matched triples
tm <- pairmatch(mhd, controls=2)
summary(tm)

### Creating a data frame with the matched sets attached.
### mdist(), caliper() and the like cooperate with pairmatch()
### to make sure observations are in the proper order:
all.equal(names(tm), row.names(nuclearplants))
### So our data frame including the matched sets is just
cbind(nuclearplants, matches=tm)

### In contrast, if your matching distance is an ordinary matrix
### (as earlier versions of optmatch required), you'll
### have to align it by observation name with your data set. 
cbind(nuclearplants, matches = tm[row.names(nuclearplants)])
