makedist: Assemble match distances from a data frame

Description

Helper function that builds the first argument to fullmatch(), reducing the memory requirements of fullmatch() and heading off certain user errors.

Usage

makedist(structure.fmla, data, fn = function(trtvar, dat, ...) {
    matrix(0, sum(trtvar), sum(!trtvar), dimnames = list(names(trtvar)[trtvar], names(trtvar)[!trtvar]))
}, ...)

Arguments

structure.fmla

A formula evaluated in the data frame data, with the treatment variable on the left-hand side. The right-hand side is either 1, if there is no stratification prior to matching, or the variables that define the pre-matching strata.

data

A data frame in which structure.fmla is evaluated and fn operates.

fn

A user-supplied function to compute distances. See Details and Examples.

...

Additional arguments passed to fn.
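
As a rough illustration of the default fn shown under Usage, the sketch below copies it into a standalone function and calls it directly. The names default_fn and the toy trtvar are hypothetical, and the sketch assumes trtvar is a named logical vector; it does not call makedist() itself.

```r
# The default `fn`, copied from the Usage section.
default_fn <- function(trtvar, dat, ...) {
  matrix(0, sum(trtvar), sum(!trtvar),
         dimnames = list(names(trtvar)[trtvar], names(trtvar)[!trtvar]))
}

# Hypothetical toy treatment indicator, named by unit (assumed logical).
trtvar <- c(a = TRUE, b = FALSE, c = TRUE, d = FALSE, e = FALSE)

# `dat` is ignored by the default fn, so NULL is enough here.
d <- default_fn(trtvar, dat = NULL)
d  # all-zero matrix: treated units in rows, controls in columns
```

With all distances equal to zero, the matrix carries only the structure of the problem: which units are treated, which are controls, and what they are called.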

Value

A list of matrices, one for each subclass defined by the interaction of the variables on the right-hand side of structure.fmla. Each element is a matrix of distances with one row per treated unit and one column per control, computed by the user-supplied function fn.
The list also has some attributes that are not of direct interest to the user, but are used by fullmatch().
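
The shape of the return value can be mimicked in base R. The sketch below is not the implementation of makedist(), and its result lacks the attributes that fullmatch() uses; it only shows what "one treated-by-control matrix per stratum" looks like. The data frame toy, its columns z, s and x, and the function absdiff are all hypothetical.

```r
# Hypothetical toy data: z = treatment, s = stratum, x = a covariate.
toy <- data.frame(
  z = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE),
  s = c("s1", "s1", "s1", "s2", "s2", "s2"),
  x = c(1.0, 2.5, 0.5, 3.0, 2.0, 4.5),
  row.names = paste0("u", 1:6)
)

# A distance function with the (trtvar, dat) signature described for fn.
absdiff <- function(trtvar, dat) {
  x <- dat[names(trtvar), "x"]   # pick rows by unit name
  names(x) <- names(trtvar)      # outer() turns these into dimnames
  abs(outer(x[trtvar], x[!trtvar], "-"))
}

# One matrix per stratum, roughly as for a formula like z ~ s.
dists <- lapply(split(rownames(toy), toy$s), function(ids) {
  trtvar <- setNames(toy[ids, "z"], ids)
  absdiff(trtvar, toy)
})
dists
```

Here dists$s1 has one row (u1) and two columns (u2, u3), while dists$s2 has two rows (u4, u6) and one column (u5).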

Details

fn should be a function whose first two arguments are trtvar, a treatment variable, and dat, a data frame. There may be additional arguments. If the function uses variables in dat, it should select rows of dat using the names of the input trtvar, particularly if the sample is being split into strata (i.e., structure.fmla has a non-trivial RHS). In that case fn is passed a trtvar containing observations for only a subset of the rows of dat, so it has to use trtvar to decide which rows of dat to operate on; it does this by lining up the names of the (shorter) vector trtvar with the row names of dat.
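
The sketch below illustrates why rows of dat should be selected by name. It calls a candidate fn directly, with a trtvar that covers only part of dat, as would happen within a stratum. The objects dat, trtvar and aligned_fn are hypothetical and makedist() itself is not called.

```r
# Hypothetical data frame with six units.
dat <- data.frame(x = c(10, 20, 30, 40, 50, 60),
                  row.names = paste0("u", 1:6))

# Treatment vector for one stratum only: shorter than nrow(dat).
trtvar <- c(u4 = TRUE, u5 = FALSE, u6 = FALSE)

# Select rows by name, then label the result with unit names.
aligned_fn <- function(trtvar, dat, var) {
  v <- dat[names(trtvar), var]
  d <- abs(outer(v[trtvar], v[!trtvar], "-"))
  dimnames(d) <- list(names(trtvar)[trtvar], names(trtvar)[!trtvar])
  d
}
aligned <- aligned_fn(trtvar, dat, "x")
aligned  # 1 x 2: distances from u4 to u5 and u6

# Indexing the full column by position recycles the short logical
# vector over all six rows and picks up units outside the stratum.
wrong <- abs(outer(dat$x[trtvar], dat$x[!trtvar], "-"))
dim(wrong)  # 2 x 4, not the 1 x 2 this stratum calls for
```

Assigning dimnames explicitly, as aligned_fn does, also keeps the unit labels attached to the distances regardless of how the intermediate vectors were built.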

Examples

data(nuclearplants)

##-- A distance function used in P. Rosenbaum's (2002) book
rankdiffs <- function(trtvar, dat, vars)
{
  dmt <- matrix(0, sum(trtvar), sum(!trtvar))
  for (vv in vars) {
    vvr <- rank(dat[names(trtvar), vv])
    dmt <- dmt + abs(outer(vvr[trtvar], vvr[!trtvar], "-"))
  }
  round(dmt)
}
##-- Gives a warning because this fn doesn't assign dimnames
(rdd1 <- makedist(pr~1, nuclearplants[nuclearplants$pt==0,], rankdiffs, c("cap","date")))
fullmatch(rdd1)
##-- fullmatch() knows its value should be ordered as the nuclearplants data set is
rdd1$m
##-- now fullmatch() doesn't know the proper order of units and has to guess
fullmatch(rdd1$m)
(rdd2 <- makedist(pr~pt, nuclearplants, rankdiffs, c("cap","date")))
fullmatch(rdd2)

##- Distance on a propensity score
scalardiffs <- function(trtvar, data, scalarname) {
  sclr <- data[names(trtvar), scalarname]
  names(sclr) <- names(trtvar)
  abs(outer(sclr[trtvar], sclr[!trtvar], '-'))
}
nuclearplants$pscore <- glm(pr~.-(pr+cost), family=binomial(),
                      data=nuclearplants)$linear.predictors
##-- Distance for propensity score matching w/o prior stratification
psd1 <- makedist(pr~1, nuclearplants, scalardiffs, "pscore")
fullmatch(psd1)
##-- Distance for propensity score matching within levels of "pt"
psd2 <- makedist(pr~pt, nuclearplants, scalardiffs, "pscore")
fullmatch(psd2)
