Learn R Programming

DOS (version 1.0.0)

dynarski: A Natural Experiment Concerning Subsidized College Education

Description

The data are from Susan Dynarski (2003)'s study of the effects of a subsidy for college education provided Social Security to children whose fathers had died. The subsidy was eliminated in 1982. As presented here, the data are only a portion of the data used in Dynarski (2003), specifically the period from 1979-1981 when the subsidy was available. The data set here is Table 13.1 of Design of Observational Studies (1st edition), where it was used with the limited goal of illustrating matching techniques.

Usage

data("dynarski")

Arguments

Format

A data frame with 2820 observations on the following 10 variables.

id

identification number

zb

treatment indicator, zb=1 if 1979-1981 with father deceased, or zb=0 if 1979-1981 with father alive

faminc

family income, in units of tens of thousands of dollars

incmiss

indicator for missing family income

black

1 if black, 0 otherwise

hisp

1 if hispanic, 0 otherwise

afqtpct

test score percentile, Armed Forces Qualifications Test

edmissm

indicator for missing education of mother

edm

education of mother, 1 is <high school, 2 is high school, 3 is some college, 4 is a BA degree or higher

female

1 if female, 0 if male

Details

The examples reproduce steps in Chapter 13 of Rosenbaum (2010). Portions of the examples require you to load Ben Hansen's optmatch package and accept its academic license; these portions of the examples do not run automatically. Hansen's optmatch package uses the Fortran code of Bertsekas and Tseng (1988).

References

Bertsekas, D. P. and Tseng, P. (1988). The relax codes for linear minimum cost network flow problems. Annals of Operations Research, 13(1), 125-190.

Dynarski, S. M. (2003). Does aid matter? Measuring the effect of student aid on college attendance and completion. American Economic Review, 93(1), 279-288.

Hansen, B. B. (2007). Flexible, optimal matching for observational studies. R News, 7, 18-24.

Hansen, B. B. and Klopfer, S. O. (2006). Optimal full matching and related designs via network flows. Journal of Computational and Graphical Statistics, 15(3), 609-627.

Rosenbaum, P. R. (2010). Design of Observational Studies. New York: Springer.

Examples

Run this code
# NOT RUN {
#
data(dynarski)
# Table 13.1 of Rosenbaum (2010)
head(dynarski)
# Table 13.2 of Rosenbaum (2010)
zb<-dynarski$zb
zbf<-factor(zb,levels=c(1,0),labels=c("Father deceased","Father not deceased"))
table(zbf)
Xb<-dynarski[,3:10]

# Estimate the propensity score, Rosenbaum (2010, Section 13.3)
p<-glm(zb~Xb$faminc+Xb$incmiss+Xb$black+Xb$hisp
    +Xb$afqtpct+Xb$edmissm+Xb$edm+Xb$female,
    family=binomial)$fitted.values
# Figure 13.1 in Rosenbaum (2010)
boxplot(p~zbf,ylab="Propensity score",main="1979-1981 Cohort")

# Read about missing covariate values in section 13.4 of Rosenbaum (2010)

# Robust Mahalanobis distance matrix, treated x control
dmat<-smahal(zb,Xb)
dim(dmat)
# Table 13.3 in Rosenbaum (2010)
round(dmat[1:5,1:5],2)

# Add a caliper on the propensity score using a penalty function
dmat<-addcaliper(dmat,zb,p,caliper=.2)
dim(dmat)
# Table 13.4 in Rosenbaum (2010)
round(dmat[1:5,1:5],2)
# }
# NOT RUN {
# YOU MUST LOAD the optmatch package and accept its license to continue.
# Note that the optmatch package has changed since 2010.  It now suggests
# that you indicate the data explicitly as data=dynarski.

# Creating a 1-to-10 match, as in section 13.6 of Rosenbaum (2010)
# This may take a few minutes.
m<-fullmatch(dmat,data=dynarski,min.controls = 10,max.controls = 10,omit.fraction = 1379/2689)
length(m)
sum(matched(m))
1441/11 # There are 131 matched sets, 1 treated, 10 controls

# Alternative, simpler code to do the same thing
m2<-pairmatch(dmat,controls=10,data=dynarski)
# Results are the same:
sum(m[matched(m)]!=m2[matched(m2)])

# Housekeeping
im<-as.integer(m)
dynarski<-cbind(dynarski,im)
dm<-dynarski[matched(m),]
dm<-dm[order(dm$im,1-dm$zb),]

# Table 13.5 in Rosenbaum (2010)
which(dm$id==10)
dm[188:198,]
which(dm$id==396)
dm[23:33,]
which(dm$id==3051)
dm[1068:1078,]
# In principle, there can be a tie, in which several different
# matched samples all minimize the total distance.  On my
# computer, this calculation reproduces Table 13.5, but were
# there a tie, optmatch should return one of the tied optimal
# matches, but not any particular one.
# }

Run the code above in your browser using DataLab