Estimate Slapin and Proksch's (2008) "wordfish" Poisson scaling model of one-dimensional document positions using conditional maximum likelihood.
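For orientation, the model treats the count of feature j in document i as Poisson with rate exp(alpha_i + psi_j + beta_j * theta_i). The base R sketch below simulates a small count matrix from that functional form. The sizes, seed, and parameter values are made up for illustration, and the code does not call quanteda.

```r
# Simulate counts from the wordfish functional form (illustrative values only).
set.seed(42)
n_docs <- 6
n_feats <- 50

theta <- seq(-1.5, 1.5, length.out = n_docs)   # document positions
alpha <- rnorm(n_docs, mean = 0, sd = 0.3)     # document fixed effects
psi   <- rnorm(n_feats, mean = 1, sd = 0.5)    # feature fixed effects
beta  <- rnorm(n_feats, mean = 0, sd = 1)      # feature marginal effects

# log(lambda_ij) = alpha_i + psi_j + beta_j * theta_i
log_lambda <- outer(alpha, psi, "+") + outer(theta, beta)

# Draw one Poisson count per document-feature cell
counts <- matrix(rpois(n_docs * n_feats, exp(log_lambda)),
                 nrow = n_docs,
                 dimnames = list(paste0("doc", seq_len(n_docs)),
                                 paste0("feat", seq_len(n_feats))))
dim(counts)
```

A matrix like this could be converted to a dfm and passed to the function to check how well known positions are recovered, although that step is not shown here.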
textmodel_wordfish(
  x,
  dir = c(1, 2),
  priors = c(Inf, Inf, 3, 1),
  tol = c(1e-06, 1e-08),
  dispersion = c("poisson", "quasipoisson"),
  dispersion_level = c("feature", "overall"),
  dispersion_floor = 0,
  sparse = FALSE,
  abs_err = FALSE,
  svd_sparse = TRUE,
  residual_floor = 0.5
)
x
the dfm on which the model will be fit
dir
set global identification by specifying the indexes for a pair of documents such that the estimated position of document dir[1] is less than that of document dir[2]
priors
prior precisions for the estimated parameters
tol
tolerances for convergence. The first value is a convergence threshold for the log-posterior of the model; the second value is the tolerance in the difference in parameter values from the iterative conditional maximum likelihood (from conditionally estimating document-level, then feature-level parameters).
dispersion
sets whether a standard Poisson likelihood ("poisson") or a quasi-Poisson quasi-likelihood with an estimated dispersion parameter ("quasipoisson") is used
dispersion_level
sets the unit level for the dispersion parameter; options are "feature" for term-level variances, or "overall" for a single dispersion parameter
dispersion_floor
constraint for the minimal underdispersion multiplier in the quasi-Poisson model. Used to minimize the distorting effect of terms with rare term or document frequencies that appear to be severely underdispersed. Default is 0, but this only applies if dispersion = "quasipoisson".
sparse
specifies whether the dfm is kept sparse (TRUE) rather than coerced to dense (FALSE, the default). While setting this to TRUE makes it possible to handle larger dfm objects (and makes execution faster), it will generate slightly different results each time, because the sparse SVD routine has a stochastic element.
abs_err
specifies how convergence is assessed
svd_sparse
uses svd to initialize the starting values of theta; only applies when sparse = TRUE
residual_floor
specifies the threshold for the residual matrix when calculating the svds; only applies when sparse = TRUE
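The dir argument is needed because the model's fit is unchanged when the document positions and the feature marginal effects both switch sign, so the direction of the scale has to be fixed by convention. A minimal base R sketch of imposing such a constraint after estimation follows. orient_scale is a hypothetical helper written for this illustration, not a quanteda function, and quanteda's internal handling may differ.

```r
# Hypothetical helper: flip the scale so that theta[dir[1]] < theta[dir[2]].
orient_scale <- function(theta, beta, dir = c(1, 2)) {
  if (theta[dir[1]] > theta[dir[2]]) {
    # The product beta_j * theta_i is unchanged when both signs flip
    theta <- -theta
    beta <- -beta
  }
  list(theta = theta, beta = beta)
}

theta <- c(0.8, -0.2, -1.1)   # made-up document positions
beta <- c(0.5, -1.0)          # made-up feature marginal effects
fit <- orient_scale(theta, beta, dir = c(1, 3))
fit$theta
```

Choosing two documents whose relative ordering is known in advance (for example, one clearly left and one clearly right) makes the resulting scale easier to interpret.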
An object of class textmodel_fitted_wordfish. This is a list containing:
dir
global identification of the dimension
theta
estimated document positions
alpha
estimated document fixed effects
beta
estimated feature marginal effects
psi
estimated word fixed effects
docs
document labels
features
feature labels
sigma
regularization parameter for betas in Poisson form
ll
log likelihood at convergence
se.theta
standard errors for theta-hats
x
dfm to which the model was fit
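As an illustration of how the returned position estimates and their standard errors can be combined, the base R sketch below builds approximate 95% intervals under a normal approximation. The list fit is a hand-made stand-in with made-up numbers, and the component names theta, se.theta, and docs are assumed to follow the austin naming convention rather than taken from a real fitted object.

```r
# Stand-in for a fitted object (made-up values, assumed component names)
fit <- list(
  docs = c("docA", "docB", "docC"),
  theta = c(-1.2, 0.1, 1.1),
  se.theta = c(0.10, 0.05, 0.08)
)

# Normal-approximation 95% interval: estimate +/- z * standard error
z <- qnorm(0.975)
intervals <- data.frame(
  doc = fit$docs,
  estimate = fit$theta,
  lower = fit$theta - z * fit$se.theta,
  upper = fit$theta + z * fit$se.theta
)

# Order documents along the estimated dimension
intervals[order(intervals$estimate), ]
```

With a real fitted model, predict() with interval = "confidence", as used in the examples, is likely the more direct route.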
The returns match those of Will Lowe's R implementation of wordfish (see the austin package), except that here we have renamed words to be features. (This return list may change.) We have also followed the practice begun with Slapin and Proksch's early implementation of the model that used a regularization parameter of se(σ) = 3, through the third element in priors.
Slapin, J. & Proksch, S.O. (2008). A Scaling Model for Estimating Time-Series Party Positions from Texts. American Journal of Political Science, 52(3), 705-722.
Lowe, W. & Benoit, K.R. (2013). Validating Estimates of Latent Traits from Textual Data Using Human Judgment as a Benchmark. Political Analysis, 21(3), 298-313.
# NOT RUN {
(tmod1 <- textmodel_wordfish(data_dfm_lbgexample, dir = c(1,5)))
summary(tmod1, n = 10)
coef(tmod1)
predict(tmod1)
predict(tmod1, se.fit = TRUE)
predict(tmod1, interval = "confidence")
# }
# NOT RUN {
dfmat <- dfm(data_corpus_irishbudget2010)
(tmod2 <- textmodel_wordfish(dfmat, dir = c(6,5)))
(tmod3 <- textmodel_wordfish(dfmat, dir = c(6,5),
                             dispersion = "quasipoisson", dispersion_floor = 0))
(tmod4 <- textmodel_wordfish(dfmat, dir = c(6,5),
                             dispersion = "quasipoisson", dispersion_floor = .5))
# compare the estimated dispersion values without and with a floor
plot(tmod3$phi, tmod4$phi, xlab = "Min underdispersion = 0", ylab = "Min underdispersion = .5",
     xlim = c(0, 1.0), ylim = c(0, 1.0))
# empty plot with the same axes, to be filled with feature labels
plot(tmod3$phi, tmod4$phi, xlab = "Min underdispersion = 0", ylab = "Min underdispersion = .5",
     xlim = c(0, 1.0), ylim = c(0, 1.0), type = "n")
underdispersedTerms <- sample(which(tmod3$phi < 1.0), 5)
which(featnames(dfmat) %in% names(topfeatures(dfmat, 20)))
text(tmod3$phi, tmod4$phi, tmod3$features,
     cex = .8, xlim = c(0, 1.0), ylim = c(0, 1.0), col = "grey90")
# index by the sampled positions (not a quoted name) to highlight those terms
text(tmod3$phi[underdispersedTerms], tmod4$phi[underdispersedTerms],
     tmod3$features[underdispersedTerms],
     cex = .8, xlim = c(0, 1.0), ylim = c(0, 1.0), col = "black")
if (requireNamespace("austin")) {
tmod5 <- austin::wordfish(quanteda::as.wfm(dfmat), dir = c(6,5))
cor(tmod1$theta, tmod5$theta)
}
# }