summary.matchit: View a balance summary of a `matchit` object

Description

Computes and prints balance statistics for matchit and matchit.subclass objects. Balance should be assessed to ensure the matching or subclassification was effective at eliminating treatment group imbalance and should be reported in the write-up of the results of the analysis.

Usage

# S3 method for matchit
summary(object, interactions = FALSE,
        addlvariables = NULL, standardize = TRUE,
        data = NULL, pair.dist = TRUE, un = TRUE,
        improvement = TRUE, ...)
# S3 method for matchit.subclass
summary(object, interactions = FALSE,
        addlvariables = NULL, standardize = TRUE,
        data = NULL, pair.dist = FALSE, subclass = FALSE,
        un = TRUE, improvement = TRUE, ...)
# S3 method for summary.matchit
print(x, digits = max(3, getOption("digits") - 3),
      ...)
# S3 method for summary.matchit.subclass
print(x, digits = max(3, getOption("digits") - 3),
      ...)

Arguments

object

a matchit object; the output of a call to matchit.

interactions

logical; whether to compute balance statistics for two-way interactions and squares of covariates. Default is FALSE.

addlvariables

additional variables for which balance statistics are to be computed along with the covariates in the matchit object. Can be entered in one of three ways: as a data frame of covariates with as many rows as there were units in the original matchit call, as a string containing the names of variables in data, or as a right-sided formula with the additional variables (and possibly their transformations) found in data, the environment, or the matchit object. Balance on squares and interactions of the additional variables will be included if interactions = TRUE.

standardize

logical; whether to compute standardized (TRUE) or unstandardized (FALSE) statistics. The standardized statistics are the standardized mean difference and the median, mean, and maximum of the difference in the (weighted) empirical cumulative distribution functions (ECDFs). The unstandardized statistics are the raw mean difference and the median, mean, and maximum of the quantile-quantile (QQ) difference. See Details below. Default is TRUE.

data

an optional data frame containing variables named in addlvariables if specified as a string or formula.

pair.dist

logical; whether to compute average absolute pair distances. For matching methods that don't include a match.matrix component in the output (i.e., exact matching, coarsened exact matching, full matching, and subclassification), computing pair differences can take a long time, especially for large datasets and with many covariates. For other methods (i.e., nearest neighbor, optimal, and genetic matching), computation is fairly quick. Default is FALSE for subclassification and TRUE otherwise.

un

logical; whether to compute balance statistics for the unmatched sample. Default TRUE; set to FALSE for more concise output.

improvement

logical; whether to compute the percent reduction in imbalance. Default TRUE; set to FALSE for more concise output.

subclass

after subclassification, whether to display balance for individual subclasses, and, if so, for which ones. Can be TRUE (display balance for all subclasses), FALSE (display balance only in aggregate), or the indices (e.g., 1:6) of the specific subclasses for which to display balance. When anything other than FALSE, aggregate balance statistics will not be displayed. Default is FALSE.

digits

the number of digits to round balance statistics to.

x

a summary.matchit or summary.matchit.subclass object; the output of a call to summary.

…

ignored.

Value

For matchit objects, a summary.matchit object, which is a list with the following components:

call

the original call to matchit

nn

a matrix of the sample sizes in the original (unmatched) and matched samples

sum.all

if un = TRUE, a matrix of balance statistics for each covariate in the original (unmatched) sample

sum.matched

a matrix of balance statistics for each covariate in the matched sample

reduction

if improvement = TRUE, a matrix of the percent reduction in imbalance for each covariate in the matched sample

For matchit.subclass objects, a summary.matchit.subclass object, which is a list with the following components:

call

the original call to matchit

sum.all

if un = TRUE, a matrix of balance statistics for each covariate in the original sample

sum.subclass

if subclass is not FALSE, a list of matrices of balance statistics for each subclass

sum.across

a matrix of balance statistics for each covariate computed using the subclassification weights

reduction

if improvement = TRUE, a matrix of the percent reduction in imbalance for each covariate in the matched sample

qn

a matrix of sample sizes within each subclass

nn

a matrix of the sample sizes in the original (unmatched) and matched samples

Details

summary computes a balance summary of a matchit object. This includes balance before and after matching or subclassification, as well as the percent improvement in balance. The variables for which balance statistics are computed are those included in the formula, exact, and mahvars arguments to matchit, as well as the distance measure if distance is not "mahalanobis". The X component of the matchit object is used to supply the covariates.

The standardized mean differences are computed both before and after matching or subclassification as the difference in treatment group means divided by a standardization factor computed in the unmatched (original) sample. The standardization factor depends on the argument supplied to estimand in matchit: for "ATT", it is the standard deviation in the treated group; for "ATC", it is the standard deviation in the control group; for "ATE", it is the square root of the average of the variances within each treatment group. The post-matching mean difference is computed with weighted means in the treatment groups using the matching or subclassification weights.
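As an illustration of the calculation described above, the following base R sketch computes a standardized mean difference for a single covariate. The helper smd and the toy data are hypothetical and are not part of MatchIt; sampling weights are ignored here for simplicity.

```r
# Hypothetical helper (not a MatchIt function): standardized mean
# difference for covariate x, binary treatment treat (1 = treated),
# and optional matching weights w.
smd <- function(x, treat, w = NULL, estimand = "ATT") {
  if (is.null(w)) w <- rep(1, length(x))
  # The standardization factor is always taken from the unmatched sample
  s <- switch(estimand,
    ATT = sd(x[treat == 1]),
    ATC = sd(x[treat == 0]),
    ATE = sqrt((var(x[treat == 1]) + var(x[treat == 0])) / 2)
  )
  # Weighted group means; with unit weights these are ordinary means
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  (m1 - m0) / s
}

# Toy data (assumed for illustration only)
x     <- c(2, 4, 6, 8, 1, 3, 5, 9)
treat <- c(1, 1, 1, 1, 0, 0, 0, 0)
w     <- c(1, 1, 1, 1, 0, 2, 1, 1)  # toy matching weights

smd_before <- smd(x, treat)      # unmatched sample
smd_after  <- smd(x, treat, w)   # matched sample, same denominator
```

Because the denominator is fixed at its pre-matching value, any change between smd_before and smd_after reflects only a change in the (weighted) group means.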

The variance ratio is computed as the ratio of the treatment group variances. Variance ratios are not computed for binary variables because their variance is a function solely of their mean. After matching, weighted variances are computed using the formula used in cov.wt. The percent reduction in bias is computed using the log of the variance ratios.
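A minimal sketch of the variance ratio follows, using cov.wt for the weighted variances as described above. The helper wvar and the toy data are hypothetical, and the percent reduction formula shown (based on the absolute log variance ratio) is an assumption about how the log is applied rather than a statement of the package's exact code.

```r
# Hypothetical helper: weighted variance of x using stats::cov.wt()
wvar <- function(x, w) {
  drop(cov.wt(matrix(x, ncol = 1), wt = w / sum(w))$cov)
}

# Toy data (assumed for illustration only)
x     <- c(2, 4, 6, 8, 1, 3, 5, 9)
treat <- c(1, 1, 1, 1, 0, 0, 0, 0)
w     <- c(1, 1, 1, 1, 0, 2, 1, 1)  # toy matching weights

# Variance ratio: treated variance over control variance
vr <- function(x, treat, w) {
  wvar(x[treat == 1], w[treat == 1]) / wvar(x[treat == 0], w[treat == 0])
}
vr_before <- vr(x, treat, rep(1, length(x)))
vr_after  <- vr(x, treat, w)

# Assumed form of the percent reduction: compare |log(ratio)| before
# and after, so that a ratio of 1 corresponds to zero imbalance
reduction <- 100 * (abs(log(vr_before)) - abs(log(vr_after))) /
  abs(log(vr_before))
```

Working on the log scale treats a ratio of 2 and a ratio of 1/2 as equally imbalanced.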

The eCDF difference statistics are computed by creating a (weighted) eCDF for each group and taking the difference between them for each covariate value. The eCDF is a function that outputs the (weighted) proportion of units with covariate values at or lower than the input value. The maximum eCDF difference is the same thing as the Kolmogorov-Smirnov statistic. The values are bounded at zero and one, with values closer to zero indicating good overlap between the covariate distributions in the treated and control groups. For binary variables, all eCDF differences are equal to the (weighted) difference in proportion and are computed that way.
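The eCDF statistics can be sketched in base R as follows. The helper wecdf is hypothetical, the data are simulated, and evaluating the difference at every unique observed value is an assumption made for this illustration. With uniform weights, the maximum difference can be checked against the statistic reported by ks.test.

```r
# Hypothetical helper: weighted eCDF of x evaluated at the points `at`,
# i.e., the weighted proportion of units with values at or below each point
wecdf <- function(x, w, at) {
  w <- w / sum(w)
  vapply(at, function(v) sum(w[x <= v]), numeric(1))
}

set.seed(1)
x1 <- rnorm(50, mean = 0.3)  # simulated treated covariate values
x0 <- rnorm(80)              # simulated control covariate values
w1 <- rep(1, length(x1))     # uniform weights (unmatched sample)
w0 <- rep(1, length(x0))

# Absolute eCDF difference at each observed covariate value
grid <- sort(unique(c(x1, x0)))
d <- abs(wecdf(x1, w1, grid) - wecdf(x0, w0, grid))

ecdf_mean <- mean(d)
ecdf_max  <- max(d)

# With uniform weights, the maximum is the Kolmogorov-Smirnov statistic
ks <- unname(ks.test(x1, x0)$statistic)
```

Replacing w1 and w0 with matching weights gives the post-matching versions of the same statistics.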

The QQ difference statistics are computed by creating two samples of the same size by interpolating the values of the larger one. The values are arranged in order for each sample. The QQ difference for each quantile is the difference between the observed covariate values at that quantile between the two groups. The difference is on the scale of the original covariate. Values close to zero indicate good overlap between the covariate distributions in the treated and control groups. A weighted interpolation is used for post-matching QQ differences. For binary variables, all QQ differences are equal to the (weighted) difference in proportion and are computed that way.
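The unweighted case can be sketched as below. The helper qq_diff is hypothetical; it shrinks the larger sample by linear interpolation in the same way stats::qqplot does, which is assumed here to be a reasonable stand-in for the interpolation described above. The weighted interpolation used after matching is not shown.

```r
# Hypothetical helper: unweighted QQ difference summaries for two samples
qq_diff <- function(x1, x0) {
  s1 <- sort(x1)
  s0 <- sort(x0)
  n <- min(length(s1), length(s0))
  # Interpolate the larger sample down to the size of the smaller one
  if (length(s1) > n) s1 <- approx(seq_along(s1), s1, n = n)$y
  if (length(s0) > n) s0 <- approx(seq_along(s0), s0, n = n)$y
  # Differences at matching quantiles, on the covariate's own scale
  d <- abs(s1 - s0)
  c(median = median(d), mean = mean(d), max = max(d))
}

# Toy samples of unequal size (assumed for illustration only)
res <- qq_diff(1:5, 1:9)
```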

The pair distance is the average of the absolute differences of a variable between pairs. For example, if a treated unit was paired with four control units, that set of units would contribute four absolute differences to the average. Within a subclass, each combination of treated and control unit forms a pair that contributes once to the average. The pair distance is described in Stuart and Green (2008) and is the value that is minimized when using optimal (full) matching. When standardize = TRUE, the standardized versions of the variables are used, where the standardization factor is as described above for the standardized mean differences. Pair distances are not computed in the unmatched sample (because there are no pairs). Because pair distance can take a while to compute, especially with large datasets or for many covariates, setting pair.dist = FALSE is one way to speed up summary.
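For methods that return a match.matrix, the pair distance for one covariate can be sketched as below. The helper pair_dist, the unit names, and the toy match matrix are hypothetical; the matrix is assumed to have one row per treated unit, with the names of its matched controls in the columns and NA where no further match exists.

```r
# Hypothetical helper: average absolute pair distance for one covariate.
# x is a named vector of covariate values; mm is a match.matrix-like
# character matrix; s is the standardization factor (1 = unstandardized).
pair_dist <- function(x, mm, s = 1) {
  treated_vals <- x[rownames(mm)]
  control_vals <- matrix(x[as.vector(mm)], nrow = nrow(mm))
  # Each treated-control pair contributes one absolute difference
  d <- abs(treated_vals - control_vals) / s
  mean(d, na.rm = TRUE)
}

# Toy data (assumed for illustration only)
x  <- c(t1 = 5, t2 = 7, c1 = 4, c2 = 6, c3 = 9, c4 = 7)
mm <- rbind(t1 = c("c1", "c2"),
            t2 = c("c3", NA))

pd_raw <- pair_dist(x, mm)                          # (1 + 1 + 2) / 3
pd_std <- pair_dist(x, mm, s = sd(x[c("t1", "t2")]))  # ATT-style scaling
```

Here t1 contributes two absolute differences and t2 contributes one, so the average is taken over three pairs.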

The effective sample size (ESS) is a measure of the size of a hypothetical unweighted sample with roughly the same precision as a weighted sample. When non-uniform matching weights are computed (e.g., as a result of full matching, matching with replacement, or subclassification), the ESS can be used to quantify the potential precision remaining in the matched sample. The ESS will always be less than or equal to the matched sample size, reflecting the loss in precision due to using the weights. With non-uniform weights, it is printed in the sample size table; otherwise, it is removed because it does not contain additional information above the matched sample size.
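The text above does not give a formula for the ESS. A common choice, assumed here for illustration, is Kish's approximation, the squared sum of the weights divided by the sum of the squared weights. The helper ess is hypothetical.

```r
# Hypothetical helper: Kish's effective sample size (assumed formula)
ess <- function(w) sum(w)^2 / sum(w^2)

ess_uniform    <- ess(rep(1, 10))             # uniform weights: ESS equals n
ess_nonuniform <- ess(c(1, 1, 2, 0.5, 0.5))   # 25 / 6.5, below n = 5
```

With uniform weights the ESS equals the number of units, which is why it adds no information to the sample size table in that case.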

After subclassification, the aggregate balance statistics are computed using the subclassification weights rather than averaging across subclasses.

All balance statistics (except pair differences) are computed incorporating the sampling weights supplied to matchit, if any. The unadjusted balance statistics include the sampling weights and the adjusted balance statistics use the matching weights multiplied by the sampling weights.

When printing, NA values are replaced with periods (.), and the pair distance column is omitted from the unmatched and percent balance improvement components of the output.

Examples

# NOT RUN {
library("MatchIt")  # provides matchit() and the lalonde dataset
data("lalonde")
m.out <- matchit(treat ~ age + educ + married +
                   race + re74, data = lalonde,
                 method = "nearest", exact = ~ married,
                 replace = TRUE)
summary(m.out, interactions = TRUE)

s.out <- matchit(treat ~ age + educ + married +
                   race + nodegree + re74 + re75,
                 data = lalonde, method = "subclass")
summary(s.out, addlvariables = ~log(age) + I(re74==0))
summary(s.out, subclass = TRUE)
# }
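The components listed under Value can also be extracted from the returned object rather than only printed. The sketch below assumes the MatchIt package is installed and skips silently otherwise; the object names are illustrative.

```r
# Runs only if MatchIt is available (an assumption about the environment)
if (requireNamespace("MatchIt", quietly = TRUE)) {
  library("MatchIt")
  data("lalonde", package = "MatchIt")

  m.out <- matchit(treat ~ age + educ + married + race + re74,
                   data = lalonde, method = "nearest")

  # Concise, faster summary: skip unmatched balance, percent
  # improvement, and pair distances
  s <- summary(m.out, un = FALSE, improvement = FALSE,
               pair.dist = FALSE)

  s$nn           # sample sizes before and after matching
  s$sum.matched  # balance statistics in the matched sample
}
```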