textir (version 2.0-5)

srproj: Multinomial Inverse Regression (MNIR)

Description

Estimation of MNIR sufficient reduction projections. Note that mnlm is just a call to dmr from the distrom package.

Usage

srproj(obj, counts, dir=1:K, ...)
mnlm(cl, covars, counts, mu=NULL, bins=NULL, verb=0, ...)

Arguments

cl

A parallel library socket cluster. See the same argument in help(dmr) for details.

covars

A dense matrix or sparse Matrix of covariates. This should not include the intercept. See the same argument in help(dmr) for details.

counts

A dense matrix or sparse Matrix of response counts (e.g., token counts in text mining). See the same argument in help(dmr) for details. For srproj, this must have the same number of columns as the response dimensions (vocabulary size) in obj.

mu

Pre-specified fixed effects for each observation in the Poisson regression linear equation. See the same argument in help(dmr) for details.

bins

Number of bins into which we will attempt to collapse each column of covars. bins=NULL does no collapsing. See the same argument in help(dmr) for details.

verb

Whether to print some info. See the same argument in help(dmr) for details.

obj

Either a dmr object, as returned from mnlm, or the dmrcoef object obtained by calling coef on the output of mnlm or dmr. The latter will be faster, since coef.dmr is called inside srproj otherwise.

dir

The attribute (covar) dimensions onto which you want to project. The default is all dimensions: 1:K, where K is the number of columns in the covars argument to mnlm.

...

Additional arguments passed on to gamlr by dmr (or mnlm), and to coef.dmr by srproj. See help(gamlr) and help(dmr) for details.

Value

srproj returns a matrix with one column for each direction in dir, plus an additional column m holding the row totals of counts. mnlm returns a dmr S3 object. See help(dmr) for details.
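To illustrate the shape of the srproj return value, here is a minimal sketch that follows the pattern of the we8there example referenced under See Also. It assumes textir is installed and that data(we8there) provides we8thereCounts and we8thereRatings; bins = 5 is an illustrative choice, not a recommendation.

```r
# Assumes textir (and its dependency distrom) is installed.
if (requireNamespace("textir", quietly = TRUE)) {
  library(textir)
  data(we8there)   # loads we8thereCounts and we8thereRatings

  # Fit on a single attribute, so K = 1.
  fit <- dmr(NULL, we8thereRatings[, "Overall", drop = FALSE],
             we8thereCounts, bins = 5)

  B <- coef(fit)                   # extract the dmrcoef object once
  z <- srproj(B, we8thereCounts)   # default dir = 1:K
  colnames(z)                      # the projected direction, then "m"
  dim(z)                           # one row per row of counts
}
```

Passing the dmrcoef object B rather than fit avoids a second call to coef.dmr inside srproj, which matters if you project more than once.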

Details

These functions provide the first two steps of multinomial inverse regression (see MNIR paper).

mnlm fits multinomial logistic regression parameters under gamma lasso penalization on a factorized Poisson likelihood. The mnlm function, which remains in this package for backwards compatibility only, is just a call to the dmr function of the distrom library (see DMR paper). For simplicity, we recommend using dmr instead of mnlm. For model selection, coefficients, prediction, and plotting see the relevant functions in help(dmr).

srproj calculates the MNIR Sufficient Reduction (SR) projection from text counts onto the attribute dimensions of interest (covars in mnlm or dmr). In particular, for counts \(C\) with row sums \(m\) and mnlm/dmr coefficients \(\phi_j\) corresponding to attribute \(j\), the SR projection in the direction of \(j\) is \(z_j = C\phi_j/m\), with the division taken row by row: observation \(i\) has \(z_{ij} = c_i'\phi_j/m_i\), where \(c_i\) is row \(i\) of \(C\). The MNIR paper explains how \(V=[v_1 \ldots v_K]\), your original covariates/attributes, are independent of text counts \(C\) given SR projections \(Z=[z_1 \ldots z_K]\).
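The projection arithmetic can be sketched in base R without fitting a model. The counts and loadings below are made-up toy values, and the layout of phi (attributes as rows, tokens as columns) is an assumption for illustration; this is not the srproj source, and it assumes every row has a positive total.

```r
# Toy counts: 3 documents (rows) by 4 tokens (columns). Values are made up.
C <- matrix(c(2, 0, 1, 1,
              0, 3, 0, 1,
              1, 1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("doc", 1:3), paste0("w", 1:4)))

# Hypothetical loadings: K = 2 attributes (rows) by 4 tokens (columns).
phi <- matrix(c( 0.5, -0.5, 0.0, 1.0,
                -1.0,  0.0, 2.0, 0.0),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("v1", "v2"), colnames(C)))

m <- rowSums(C)            # row totals (document lengths)
Z <- (C %*% t(phi)) / m    # z_ij = c_i' phi_j / m_i; m recycles down each column
sr <- cbind(Z, m = m)      # directions first, then the m column
sr
```

Dividing by m puts documents of different lengths on a common scale, which is why the row totals are carried along as a separate column.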

The final step of MNIR is "forward regression" of any element of \(V\) onto \(Z\) and the remaining elements of \(V\). We do not provide a function for this because you are free to use whatever method you want; see the MNIR and DMR papers for linear, logistic, and random forest forward regression examples.
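As one possible form of the forward step, here is a sketch of a linear forward regression using lm. The matrix sr is a simulated stand-in for srproj output (one direction z plus the m column) and v is a simulated attribute; none of these values come from a real fit, and including m as a regressor is a modelling choice rather than a requirement.

```r
set.seed(1)
n <- 50

# Simulated placeholders shaped like srproj output: direction z and totals m.
sr <- cbind(z = rnorm(n), m = rpois(n, 20) + 1)

# Simulated attribute that depends on z, for illustration only.
v <- 1 + 2 * sr[, "z"] + rnorm(n, sd = 0.5)

# Linear forward regression of the attribute on the SR projection and m.
fwd <- lm(v ~ z + m, data = data.frame(v = v, sr))
coef(fwd)
```

With real data you would replace sr with the matrix returned by srproj and v with the corresponding column of your covariates; glm or a random forest can be substituted for lm in the same way.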

Note that if you were previously using textir not for inverse regression but simply as fast code for multinomial logistic regression, you probably want to work directly with the gamlr package (binary response) or with dmr from the distrom package (multinomial response).

References

Taddy (2013, JASA), Multinomial Inverse Regression for Text Analysis (MNIR).

Taddy (2015, AoAS), Distributed Multinomial Regression (DMR).

Taddy (2016, JCGS), The Gamma Lasso (GL).

See Also

congress109, we8there, dmr

Examples

### Ripley's Cushing Data; see help(Cushings) ###
library(MASS)
data(Cushings)
Cushings[,1:2] <- log(Cushings[,1:2])
train <- Cushings[Cushings$Type!="u",]
newdata <- as.matrix(Cushings[Cushings$Type == "u", 1:2])

## fit, coefficients, predict, and plot

# you could replace 'mnlm' with 'dmr' here.
fit <- mnlm(NULL, 
  covars=train[,1:2], 
  counts=factor(train$Type))

## dmr applies corrected AICc selection by default
round(coef(fit),1) 
round(predict(fit, newdata, type="response"),1)
par(mfrow=c(1,3))
for(j in c("a","b","c")){ 
  plot(fit[[j]]); mtext(j,line=2) }

## see we8there and congress109 for MNIR and srproj examples
 