ecoML
is used to fit parametric models for ecological inference in
\(2 \times 2\) tables via Expectation Maximization (EM) algorithms. The
data is specified in proportions. In its most basic setting, the algorithm
assumes that the individual-level proportions (i.e., \(W_1\) and
\(W_2\)) are distributed bivariate normally (after logit transformations).
The function calculates point estimates of the parameters for models based
on different assumptions. The standard errors of the point estimates are
also computed via Supplemented EM algorithms. Moreover, ecoML
quantifies the amount of missing information associated with each parameter
and allows researchers to examine the impact of missing information on
parameter estimation in ecological inference. The models and algorithms are
described in Imai, Lu and Strauss (2008, 2011).
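The distributional assumption can be illustrated with a short simulation in base R. This is only a sketch: the parameter values are arbitrary, the row margin is drawn independently of \(W_1\) and \(W_2\) for simplicity, and the identity \(Y = X W_1 + (1 - X) W_2\) is the usual accounting identity for \(2 \times 2\) tables, stated here as an assumption because the description above does not spell it out.

```r
# Sketch: simulate data under the basic model (illustrative values only).
set.seed(1)
n <- 200                                   # number of 2 x 2 tables
mu <- c(0, 0.5)                            # means of logit(W1), logit(W2)
Sigma <- matrix(c(1, 0.3, 0.3, 1), 2, 2)   # covariance on the logit scale

# Draw bivariate normal values via the Cholesky factor of Sigma.
z <- matrix(rnorm(2 * n), n, 2) %*% chol(Sigma)
z <- sweep(z, 2, mu, "+")

# plogis() is the inverse logit, so W1 and W2 lie strictly in (0, 1).
W1 <- plogis(z[, 1])
W2 <- plogis(z[, 2])

# Observed margins, all expressed as proportions.
X <- runif(n)                    # row margin (assumed independent of W here)
Y <- X * W1 + (1 - X) * W2       # column margin via the accounting identity
sim <- data.frame(Y = Y, X = X)
```

Only `Y` and `X` would be observed in practice; `W1` and `W2` are the unobserved quantities that ecological inference tries to recover.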
ecoML(
formula,
data = parent.frame(),
N = NULL,
supplement = NULL,
theta.start = c(0, 0, 1, 1, 0),
fix.rho = FALSE,
context = FALSE,
sem = TRUE,
epsilon = 10^(-6),
maxit = 1000,
loglik = TRUE,
hyptest = FALSE,
verbose = FALSE
)
An object of class ecoML
containing the following elements:
The matched call.
The row margin, \(X\).
The column margin, \(Y\).
The size of each table, \(N\).
The assumption under which the model is estimated. If context = FALSE, the
CAR assumption is adopted and no contextual effect is modeled. If
context = TRUE, the NCAR assumption is adopted and the contextual effect is
modeled.
Whether the SEM algorithm is used to estimate the standard errors and observed information matrix for the parameter estimates.
Whether the correlation or the partial correlation between \(W_1\) and \(W_2\) is fixed in the estimation.
If fix.rho = TRUE, the value that \(corr(W_1, W_2)\) is fixed to.
The precision criterion for EM convergence. \(\sqrt{\epsilon}\) is the precision criterion for SEM convergence.
The ML estimates of \(E(W_1)\), \(E(W_2)\), \(var(W_1)\), \(var(W_2)\), and
\(cov(W_1,W_2)\). If context = TRUE, \(E(X)\), \(cov(W_1,X)\), and
\(cov(W_2,X)\) are also reported.
In-sample estimation of \(W_1\) and \(W_2\).
The sufficient statistics for theta.em.
Number of EM iterations before convergence is achieved.
Number of SEM iterations before convergence is achieved.
The log-likelihood of the model when convergence is achieved.
A vector saving the value of the log-likelihood function at each iteration of the EM algorithm.
A matrix saving the unweighted mean estimation of the logit-transformed individual-level proportions (i.e., \(W_1\) and \(W_2\)) at each iteration of the EM process.
A matrix saving the log of the variance estimates of the logit-transformed
individual-level proportions (i.e., \(W_1\) and \(W_2\)) at each iteration
of the EM process. Note that non-transformed variances are displayed on the
screen (when verbose = TRUE).
A matrix saving the Fisher transformation of the estimated correlations
between the logit-transformed individual-level proportions (i.e., \(W_1\)
and \(W_2\)) at each iteration of the EM process. Note that non-transformed
correlations are displayed on the screen (when verbose = TRUE).
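The stored iteration histories are on transformed scales, so they need to be back-transformed before they can be compared with the values printed to the screen. The sketch below uses made-up numbers rather than real ecoML output, and it assumes the standard Fisher z-transformation (the inverse hyperbolic tangent) is the one meant above.

```r
# Hypothetical stored values from one EM iteration (not real ecoML output).
log_var <- c(log(1.2), log(0.8))   # log of var(W1), var(W2)
fisher_rho <- atanh(0.35)          # Fisher transformation of the correlation

# Back-transform to the untransformed scales.
var_hat <- exp(log_var)
rho_hat <- tanh(fisher_rho)

# Fisher's z written out explicitly: 0.5 * log((1 + r) / (1 - r)).
z_manual <- 0.5 * log((1 + rho_hat) / (1 - rho_hat))
```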
Moreover, when sem = TRUE, ecoML also outputs the following values:
The matrix characterizing the rates of convergence of the EM algorithm. This information is also used to calculate the observed-data information matrix.
The (expected) complete-data information matrix estimated via the SEM
algorithm. When context = FALSE and fix.rho = TRUE, Icom is 4 by 4. When
context = FALSE and fix.rho = FALSE, Icom is 5 by 5. When context = TRUE,
Icom is 9 by 9.
The observed information matrix. The dimension of Iobs is the same as Icom.
The difference between Icom and Iobs. The dimension of Imiss is the same as
Icom.
The (symmetrized) variance-covariance matrix of the ML parameter estimates.
The dimension of Vobs is the same as Icom.
The (expected) complete-data variance-covariance matrix. Its dimension is
the same as that of Icom.
The estimated variance-covariance matrix of the ML parameter estimates. Its
dimension is the same as that of Icom.
The fraction of missing information associated with each parameter estimation.
The proportion of increased variance associated with each parameter estimation due to observed data.
The largest eigenvalue of Imiss.
The complete-data information matrix for the Fisher-transformed parameters.
The observed-data information matrix for the Fisher-transformed parameters.
The fractions of missing information associated with the Fisher-transformed parameters.
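The relationship among the information matrices listed above can be sketched with toy numbers. The matrices below are invented for illustration and are not ecoML output. The per-parameter fraction is computed here as a simple ratio of diagonal elements; that is one common summary and is an assumption of this sketch, not a statement of how ecoML computes it. The names `frac_missing` and `largest_eig` are likewise hypothetical.

```r
# Toy information matrices (made-up numbers, not ecoML output).
Icom <- matrix(c(10, 2,
                  2, 8), 2, 2, byrow = TRUE)   # complete-data information
Iobs <- matrix(c( 6, 1,
                  1, 5), 2, 2, byrow = TRUE)   # observed-data information

# Missing information is the difference between the two.
Imiss <- Icom - Iobs

# Share of the complete-data information that is missing, per parameter
# (diagonal ratio; an assumption of this sketch).
frac_missing <- diag(Imiss) / diag(Icom)

# Largest eigenvalue of Imiss (symmetric, so eigenvalues are real).
largest_eig <- max(eigen(Imiss, symmetric = TRUE)$values)
```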
A symbolic description of the model to be fit, specifying the
column and row margins of \(2 \times 2\) ecological tables. Y ~ X specifies
Y as the column margin (e.g., turnout) and X (e.g., percent
African-American) as the row margin. Details and specific examples
are given below.
An optional data frame in which to interpret the variables in formula. The
default is the environment in which ecoML is called.
An optional variable representing the size of the unit; e.g., the
total number of voters. N needs to be either a vector of the same length as
Y and X or a scalar.
An optional matrix of supplemental data. The matrix has
two columns, which contain additional individual-level data such as survey
data for \(W_1\) and \(W_2\), respectively. If NULL, no additional
individual-level data are included in the model. The default is NULL.
A numeric vector that specifies the starting values for
the mean, variance, and covariance. When context = FALSE, the elements of
theta.start correspond to (\(E(W_1)\), \(E(W_2)\), \(var(W_1)\),
\(var(W_2)\), \(corr(W_1,W_2)\)). When context = TRUE, the elements of
theta.start correspond to (\(E(W_1)\), \(E(W_2)\), \(var(W_1)\),
\(var(W_2)\), \(corr(W_1, X)\), \(corr(W_2, X)\), \(corr(W_1,W_2)\)).
Moreover, when fix.rho = TRUE, \(corr(W_1,W_2)\) is set to be the
correlation between \(W_1\) and \(W_2\) when context = FALSE, and the
partial correlation between \(W_1\) and \(W_2\) given \(X\) when
context = TRUE. The default is c(0, 0, 1, 1, 0).
Logical. If TRUE, the correlation (when context = FALSE) or the partial
correlation (when context = TRUE) between \(W_1\) and \(W_2\) is fixed
throughout the estimation. For details, see Imai, Lu and Strauss (2006). The
default is FALSE.
Logical. If TRUE, the contextual effect is also
modeled. In this case, the row margin (i.e., X) and the individual-level
rates (i.e., \(W_1\) and \(W_2\)) are assumed to be distributed
tri-variate normally (after logit transformations). See Imai, Lu and Strauss
(2006) for details. The default is FALSE.
Logical. If TRUE, the standard errors of the parameter estimates, as well as
the fraction of missing information, are estimated via the SEM algorithm.
The default is TRUE.
A positive number that specifies the convergence criterion
for the EM algorithm. The square root of epsilon is the convergence
criterion for the SEM algorithm. The default is 10^(-6).
A positive integer that specifies the maximum number of iterations
before the convergence criterion is met. The default is 1000.
Logical. If TRUE, the value of the log-likelihood
function at each iteration of EM is saved. The default is TRUE.
Logical. If TRUE, the model is estimated under the null
hypothesis that the means of \(W_1\) and \(W_2\) are the same. The default
is FALSE.
Logical. If TRUE, the progress of the EM and SEM
algorithms is printed to the screen. The default is FALSE.
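As a small illustration of the argument layout described above, the sketch below builds starting-value vectors of the two lengths and shows the SEM tolerance implied by a given epsilon. The values are arbitrary, and `corr_to_cov` is a hypothetical helper written for this example; it is not part of the eco package.

```r
# Starting values for context = FALSE:
# (E(W1), E(W2), var(W1), var(W2), corr(W1, W2))
theta_car <- c(0, 0, 4, 1, 0.5)

# Starting values for context = TRUE add the two correlations with X:
# (E(W1), E(W2), var(W1), var(W2), corr(W1, X), corr(W2, X), corr(W1, W2))
theta_ncar <- c(0, 0, 1, 1, 0.2, 0.2, 0)

# Hypothetical helper (not part of eco): covariance implied by a
# correlation and two variances, cov = corr * sqrt(var1 * var2).
corr_to_cov <- function(rho, v1, v2) rho * sqrt(v1 * v2)
cov_w1_w2 <- corr_to_cov(theta_car[5], theta_car[3], theta_car[4])

# EM tolerance and the looser SEM tolerance derived from it.
epsilon <- 10^(-6)
sem_tol <- sqrt(epsilon)
```

The conversion is useful because starting values are given as correlations, while the reported ML estimates include covariances.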
When sem is TRUE, ecoML computes the observed-data
information matrix for the parameters of interest based on the
Supplemented EM algorithm. The inverse of the observed-data information
matrix can be used to estimate the variance-covariance matrix for the
parameters estimated by the EM algorithm. In addition, it computes the
expected complete-data information matrix. Based on these two measures, one
can further calculate the fraction of missing information associated with
each parameter. See Imai, Lu and Strauss (2006) for more details about the
fraction of missing information.
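The step from an observed-data information matrix to standard errors can be sketched in base R as follows. The matrix is a toy example, not ecoML output, and averaging with the transpose is just one simple way to enforce symmetry; whether ecoML symmetrizes its result in exactly this way is an assumption of the sketch.

```r
# Toy observed-data information matrix (made-up, not ecoML output).
Iobs <- matrix(c(6, 1,
                 1, 5), 2, 2, byrow = TRUE)

# Invert to get the variance-covariance matrix of the estimates.
Vobs <- solve(Iobs)

# Numerical inversion can leave tiny asymmetries; averaging with the
# transpose enforces symmetry.
Vobs_sym <- (Vobs + t(Vobs)) / 2

# Standard errors are the square roots of the diagonal.
se <- sqrt(diag(Vobs_sym))
```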
Moreover, when hyptest = TRUE, ecoML estimates the
parametric model under the null hypothesis that mu_1 = mu_2. One can
then construct the likelihood ratio test to assess the hypothesis of equal
means. The associated fraction of missing information for the test statistic
can also be calculated. See Imai, Lu and Strauss (2006) for details.
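A likelihood ratio test of equal means could be assembled along the following lines. The two log-likelihood values are invented for illustration; in practice they would come from one fit with hyptest = FALSE and one with hyptest = TRUE. The chi-squared reference distribution with one degree of freedom is the usual asymptotic approximation for a single equality constraint and is an assumption of this sketch.

```r
# Hypothetical maximized log-likelihoods (illustrative numbers only).
loglik_unrestricted <- -120.4   # fit with hyptest = FALSE
loglik_restricted   <- -123.1   # fit with hyptest = TRUE (mu_1 = mu_2)

# Likelihood ratio statistic; one equality constraint gives 1 df.
lr_stat <- 2 * (loglik_unrestricted - loglik_restricted)
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
```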
Imai, Kosuke, Ying Lu and Aaron Strauss. (2011). “eco: R Package for Ecological Inference in 2x2 Tables” Journal of Statistical Software, Vol. 42, No. 5, pp. 1-23.
Imai, Kosuke, Ying Lu and Aaron Strauss. (2008). “Bayesian and Likelihood Inference for 2 x 2 Ecological Tables: An Incomplete Data Approach” Political Analysis, Vol. 16, No. 1 (Winter), pp. 41-69.
eco, ecoNP, summary.ecoML
## NOTE: convergence has not been properly assessed for the following
## examples. See Imai, Lu and Strauss (2006) for more complete analyses.
## In the first example below, in the interest of time, only part of the
## data set is analyzed and the convergence requirement is less stringent
## than the default setting.
## In the second example, the program is arbitrarily halted 100 iterations
## into the simulation, before convergence.
## load Robinson's census data
data(census)
## fit the parametric model with the default model specifications
if (FALSE) res <- ecoML(Y ~ X, data = census[1:100,], N=census[1:100,3],
epsilon=10^(-6), verbose = TRUE)
## summarize the results
if (FALSE) summary(res)
## fit the parametric model with some individual
## level data using the default prior specification
surv <- 1:600
if (FALSE) res1 <- ecoML(Y ~ X, context = TRUE, data = census[-surv,],
supplement = census[surv,c(4:5,1)], maxit=100, verbose = TRUE)
## summarize the results
if (FALSE) summary(res1)