mahal.dist: Assemble Mahalanobis distances and prepare for matching

Description

Calculates squared Mahalanobis distances between treatment and control observations on given variables, assembling them into a discrepancy matrix (or matrices) from which pairmatch() or fullmatch() can determine optimal matches. (If vectors x and y encode two observations' values of the specified variables, and $\Sigma$ is the covariance matrix of those variables, then the squared Mahalanobis distance between them is $$D^2 = (x - y)'\Sigma^{-1}(x-y).$$)

Usage

mahal.dist(distance.fmla, data, structure.fmla = NULL, inverse.cov = NULL)

Arguments

distance.fmla

A formula with variables to be combined in the Mahalanobis distance on its right-hand side and the treatment variable on its left.

data

Data frame in which distance.fmla and (if specified) structure.fmla are to be evaluated.

structure.fmla

Optional formula argument specifying subclasses within which matches are to be performed. If omitted, no subclassification is done. If it is given, its left-hand side gives the treatment variable and its right-hand side gives the variables on which to stratify.

inverse.cov

The inverse covariance of the variables to be combined into the Mahalanobis distance (optional).

Value

Object of class optmatch.dlist, which is suitable to be given as the distance argument to fullmatch() or pairmatch().
Specifically, a list of matrices, one for each subclass defined by the interaction of variables appearing on the right-hand side of structure.fmla. Each of these is a (number of treatments) by (number of controls) matrix of Mahalanobis distances. The list also carries some metadata as attributes; these are not of direct interest to the user but are used by fullmatch() and pairmatch().

Details

Mahalanobis distance tracks the discrepancy between points on a number of given variables, after standardizing the variables and taking account of their covariances. It is best suited to variables whose joint distribution resembles a multivariate Normal.
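
As a rough illustration of the formula in the Description, the sketch below computes a squared Mahalanobis distance by hand in base R and checks it against stats::mahalanobis(). The small data matrix is made up for illustration, and the plain sample covariance from cov() is an assumption; it is not necessarily the covariance estimate that mahal.dist() uses internally.

```r
# Made-up covariate values for six observations (illustration only).
X <- cbind(date  = c(68.6, 67.3, 67.3, 68.0, 68.0, 67.9),
           cum.n = c(14, 10, 10, 11, 11, 13))

# Sample covariance of the variables (an assumed choice of Sigma).
S <- cov(X)

# Two observations to compare.
x <- X[1, ]
y <- X[2, ]

# D^2 = (x - y)' Sigma^{-1} (x - y), written out directly.
d2_manual <- drop(t(x - y) %*% solve(S) %*% (x - y))

# The same quantity from base R's mahalanobis().
d2_builtin <- mahalanobis(x, center = y, cov = S)

isTRUE(all.equal(d2_manual, d2_builtin, check.attributes = FALSE))
```

Multiplying by the inverse covariance is what rescales each variable and adjusts for correlation between them, so variables measured on large scales do not dominate the distance.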

The purpose of giving a structure.fmla argument is to speed up large problems. Variables appearing on its right-hand side will be interacted to create the subclasses. If structure.fmla is given then its LHS is used to define treatment and control groups (and one doesn't have to put anything on the LHS of distance.fmla).
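
To see why subclassification shrinks the problem, the base R sketch below interacts two stratifying variables and counts the treated-by-control pairs that would need a distance with and without subclasses. The data frame and its column names (z, pt, ne) are hypothetical, and the code only mimics the idea of interacting the right-hand-side variables; it is not the function's actual implementation.

```r
# Hypothetical data: treatment indicator z and two stratifying variables.
dat <- data.frame(z  = c(1, 0, 0, 1, 0, 1, 0, 0),
                  pt = c(0, 0, 0, 0, 1, 1, 1, 1),
                  ne = c(0, 0, 1, 1, 0, 0, 1, 1))

# One subclass per observed combination of the stratifying variables.
subclass <- interaction(dat$pt, dat$ne, drop = TRUE)

# Treated-by-control pairs within each subclass.
pairs_within <- sapply(split(dat$z, subclass),
                       function(z) sum(z == 1) * sum(z == 0))

# Treated-by-control pairs if no subclassification is done.
pairs_overall <- sum(dat$z == 1) * sum(dat$z == 0)

sum(pairs_within)
pairs_overall
```

Only pairs within the same subclass are candidates for matching, so the total number of distances to compute and search over can be far smaller than in the unstratified problem.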

The function attempts to calculate the inverse covariance itself, so ordinarily you shouldn't need to give it one. If you'll be calling the function repeatedly, however, you can save time by computing and storing the inverse covariance once and passing it as the inverse.cov argument, rather than having it recomputed on each call.
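
A minimal sketch of precomputing the inverse covariance in base R follows. The covariate values are made up, and using the plain sample covariance from cov() is an assumption: the estimate that mahal.dist() computes by default may differ, so a matrix built this way is not guaranteed to reproduce the default distances. The row and column names match the variable names, as in the inverse.cov example on this page.

```r
# Made-up covariates standing in for the variables in distance.fmla.
covs <- data.frame(date  = c(68.6, 67.3, 67.3, 68.0, 68.0, 67.9),
                   cum.n = c(14, 10, 10, 11, 11, 13))

# Compute the inverse covariance once and keep it for later calls.
icov <- solve(cov(covs))

# Sanity check: icov times the covariance should be the identity.
isTRUE(all.equal(icov %*% cov(covs), diag(2), check.attributes = FALSE))

# The stored matrix could then be reused across calls, for example
# (mydata and the treatment variable pr are placeholders):
#   mahal.dist(pr ~ date + cum.n, mydata, inverse.cov = icov)
```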

Examples

library(optmatch)
data(nuclearplants)
mhd1 <- mahal.dist(pr~date+cum.n, nuclearplants)
lapply(mhd1, round)
attributes(mhd1)
fullmatch(mhd1)
##- Mahalanobis within subclasses defined by levels of pt
mhd2 <- mahal.dist(~date+cum.n, nuclearplants, pr~pt)
lapply(mhd2, round)
fullmatch(mhd2)
##- Trick mahal.dist into returning absolute differences on a scalar.
mhd3 <- mahal.dist(pr~date, nuclearplants,
                   inverse.cov=matrix(1,1,1,dimnames=list("date", "date")))
mhd3[[1]]
##- Matching within calipers of 3 years
fullmatch(mhd1/(mhd3<3))