The object returned by GenMatch can be supplied to the Match function (via the
Weight.matrix option) to obtain causal estimates. GenMatch uses genoud to
perform the genetic search. Using the cluster option, one may use multiple
computers, CPUs or cores to perform parallel computations.

GenMatch(Tr, X, BalanceMatrix=X, estimand="ATT", M=1, weights=NULL,
pop.size = 100, max.generations=100,
wait.generations=4, hard.generation.limit=FALSE,
starting.values=rep(1,ncol(X)),
fit.func="pvals",
MemoryMatrix=TRUE,
exact=NULL, caliper=NULL, replace=TRUE, ties=TRUE,
CommonSupport=FALSE, nboots=0, ks=TRUE, verbose=FALSE,
distance.tolerance=1e-05,
tolerance=sqrt(.Machine$double.eps),
min.weight=0, max.weight=1000,
Domains=NULL, print.level=2,
project.path=NULL,
paired=TRUE, loss=1,
data.type.integer=FALSE,
restrict=NULL,
cluster=FALSE, balance=TRUE, ...)
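
The Domains argument in the usage above takes a matrix with one row per column of X and two columns (lower bound, upper bound) over which weights are searched. A minimal base-R sketch of building such a matrix follows. The toy covariate matrix, the bounds 0 and 1000 (chosen only to mirror the min.weight and max.weight defaults shown in the usage) and the tighter range for one covariate are illustrative assumptions, not recommendations.

```r
# Toy covariate matrix standing in for X (hypothetical data, not lalonde).
set.seed(1)
X <- cbind(age     = rnorm(20, mean = 30, sd = 5),
           educ    = rnorm(20, mean = 10, sd = 2),
           married = rbinom(20, size = 1, prob = 0.3))

# One row per column of X: column 1 is the lower bound, column 2 the upper bound.
lower <- rep(0, ncol(X))      # assumed lower bound for every weight
upper <- rep(1000, ncol(X))   # assumed upper bound for every weight
Domains <- cbind(lower, upper)
rownames(Domains) <- colnames(X)

# Illustrative choice: narrow the search range for a single covariate.
Domains["married", ] <- c(0, 10)

print(Domains)
```

A matrix built this way could then be passed to GenMatch through its Domains argument.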
... X, but it can in principle be a matrix which contains more or fewer
variables than X, or variables which are transformed in v...

... ties option.

... Y which provides observation-specific weights.

... genoud uses to solve the optimization problem. The theorems proving that
genetic algorithms find good solutions are asymptotic in popula...

... genoud will run when optimizing. This is a soft limit. The maximum
generation limit will be binding only if ...

... max.generations and hard.generation.limit.

... max.generations variable is a binding constraint. If hard.generation.limit
is FALSE, then the algorithm may exceed the max.generations count if the ob...

... X. This vector contains the starting weights each of the variables is
given. The starting.values vector is a way for the user to insert one
individ...

... GenMatch should optimize. The user may choose from the following or
provide a function:
pvals: maximize the p-values from (paired) t-tests and Kolmogorov-Smirnov
tests conducted for each column in ...

... X. If a logical vector is provided, a logical value should be provided...

... FALSE, the order of matches generally matters. Matches will be found in
the same order as the data are sorted. Thus, the match(es) for the first ...

... ties==TRUE. If, for example, one treated observation matches more than one
control observation, the matched dataset will include the multiple matched...

... caliper option is to be ...

... ks test. By default this option is set to zero so no bootstraps are done.
See ks.boot for additional details.

... cluster option is used.

... distance.tolerance are deemed to be equal to zero. This option can be used
to perform a type of optimal m...

... ncol(X) $\times 2$ matrix.
The first column is the lower bound, and the second column is the upper bound
for each variable over which genoud will search for weights.

... GenMatch will print details about the population at each generati...

... t.test should be used when determining balance.

... 1, implies "lexical" optimization: all of the balance statistics will be
sorted from the most discrepant to the least and weights will be picked which
minimize the maximum dis...

... TRUE, search will be done over integer weights. Note that before version
4.1, the default was to use integer weights.

... makeCluster commands in the snow package or a vector of machine names so
that GenMatch can set up t...

... genoud.

... BalanceMatrix, unless there are dichotomous variables in this matrix.
There is one p-value for each covariate in BalanceMatrix which is the result
of a paired t-test and another p-value for each non-dichotomous variable in
BalanceMatrix which is the result of a Kolmogorov-Smirnov test. Recall that
these p-values cannot be interpreted as hypothesis tests. They are simply
measures of balance.

... X.

... X. This object corresponds to the Weight.matrix in the Match function.

... index.treated, index.control and weights objects which are returned by
Match.

... X variables. This object has the same length as the number of covariates
in X.

Diamond, Alexis and Jasjeet S. Sekhon. 2005. "Genetic Matching for
Estimating Causal Effects: A General Multivariate Matching Method for
Achieving Balance in Observational Studies." Working Paper.

Sekhon, Jasjeet Singh and Walter R. Mebane, Jr. 1998. "Genetic
Optimization Using Derivatives: Theory and Application to Nonlinear
Models." Political Analysis, 7: 187-210.

Match, summary.Match, MatchBalance, genoud, balanceMV, balanceUV, qqstats,
ks.boot, GerberGreenImai, lalonde

library(Matching)
data(lalonde)
attach(lalonde)

# The covariates we want to match on
X <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)

# The covariates we want to obtain balance on
BalanceMat <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74,
                    I(re74*re75))

# Call GenMatch() to find the optimal weight to give each covariate in 'X'
# so that balance is achieved on the covariates in 'BalanceMat'. This is
# only an example, so we want GenMatch to be quick: the population size has
# been set to only 16 via the 'pop.size' option. This is *WAY* too small
# for actual problems.
# For details see http://sekhon.berkeley.edu/papers/MatchingJSS.pdf.
genout <- GenMatch(Tr=treat, X=X, BalanceMatrix=BalanceMat, estimand="ATE", M=1,
                   pop.size=16, max.generations=10, wait.generations=1)

# The outcome variable
Y <- re78/1000

# Now that GenMatch() has found the optimal weights, estimate the causal
# effect of interest using those weights
mout <- Match(Y=Y, Tr=treat, X=X, estimand="ATE", Weight.matrix=genout)
summary(mout)

# Determine whether balance has actually been obtained on the variables of interest
mb <- MatchBalance(treat ~ age + educ + black + hisp + married + nodegr + u74 + u75 +
                   re75 + re74 + I(re74*re75),
                   match.out=mout, nboots=500, ks=TRUE, mv=FALSE)

# For more examples see: http://sekhon.berkeley.edu/matching/R.
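
The "lexical" optimization mentioned for the loss option can be illustrated without running a genetic search. The sketch below assumes the balance statistics are p-values (as with the pvals fit function), so the most discrepant statistic is the smallest p-value. The helper lexical_better and the example p-values are hypothetical, and breaking ties on the next most discrepant statistic is the usual meaning of lexical ordering rather than something taken from GenMatch's source code.

```r
# Compare two sets of balance p-values "lexically": sort each from the most
# discrepant (smallest p-value) to the least, then compare element by element.
# Returns TRUE when the first set is strictly better than the second.
lexical_better <- function(p_a, p_b) {
  a <- sort(p_a)
  b <- sort(p_b)
  stopifnot(length(a) == length(b))
  differs <- which(a != b)
  if (length(differs) == 0) return(FALSE)  # identical after sorting
  first <- differs[1]                      # first position where they disagree
  a[first] > b[first]                      # larger p-value means better balance
}

# Hypothetical p-values produced by two candidate sets of weights.
p_candidate1 <- c(0.40, 0.05, 0.90)
p_candidate2 <- c(0.80, 0.04, 0.95)

result <- lexical_better(p_candidate1, p_candidate2)
print(result)
```

Here the first candidate is preferred because its worst p-value (0.05) exceeds the second candidate's worst p-value (0.04), even though the second candidate has larger p-values on the other two statistics.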