reglogit: Gibbs sampling for regularized logistic regression

Description

Regularized (multinomial) logistic regression by Gibbs sampling implementing subtly different MCMC schemes with varying efficiency depending on the data type (binary v. binomial, say) and the desired estimator (regularized maximum likelihood, or Bayesian maximum a posteriori/posterior mean, etc.) through a unified interface.

Usage

reglogit(T, y, X, N = NULL, flatten = FALSE, sigma = 1, nu = 1,
      kappa = 1, icept = TRUE, normalize = TRUE, zzero = TRUE, 
      powerprior = TRUE, kmax = 442, bstart = NULL, lt = NULL, 
      nup = list(a = 2, b = 0.1), save.latents = FALSE, verb = 100)
regmlogit(T, y, X, flatten = FALSE, sigma = 1, nu = 1, kappa = 1, 
      icept=TRUE, normalize = TRUE, zzero = TRUE, powerprior = TRUE, 
      kmax = 442, bstart = NULL, lt = NULL, nup = list(a=2, b=0.1),
      save.latents = FALSE, verb=100)

Value

The output is a list object of type "reglogit" or "regmlogit" containing a subset of the following fields; for "regmlogit" everyhing is expanded by one dimension into an array or matrix as appropriate.

X: the input design matrix, possible adjusted by normalization or intercept
y: the input response variable
beta: a matrix of T sampled regression coefficients on the original input scale
z: if zzero = FALSE a matrix of latent variables for the hierarchical cdf representation of the likelihood
lambda: a matrix of latent variables for the hierarchical (cdf or pdf) representation of the likelihood
lpost: a vector of log posterior probabilities of the parameters
map: the list containing the maximum a' posterior parameters; out$map$beta is on the original scale of the data
kappa: the input multiplicity parameter
omega: a matrix of latent variables for the regularization prior

Arguments

T: a positive integer scalar specifying the number of MCMC rounds
y: reglogit requires logical classification labels for Bernoulli data, or countsfor Binomial data; for the latter, N must also be specified. regmlogit requires positive integer class labeels in 1:C where C is the number of classes.
X: a design matrix of predictors; can be a typical (dense) matrix or a sparse Matrix object. When the design matrix is sparse (and is stored sparsely), this can produce a ~3x-faster execution via a more efficient update for the beta parameter. But when it is not sparse (but is stored sparsely) the execution could be much slower
N: an optional integer vector of total numbers of replicate trials for each X-y, i.e., for Binomial data instead of Bernoulli
flatten: a scalar logical that is only specified for Binomial data. It indicates if pre-processing code should flatten the Binomial likelihood into a Bernoulli likelihood
sigma: weights on the regression coefficients in the lasso penalty. The default of 1 is sensible when normalize = TRUE since then the estimator for beta is equivariant under rescaling
nu: a non-negative scalar indicating the initial value of the penalty parameter
kappa: a positive scalar specifying the multiplicity; kappa = 1 provides samples from the Bayesian posterior distribution. Larger values of kappa facilitates a simulated annealing approach to obtaining a regularized point estimator
icept: a scalar logical indicating if an (implicit) intercept should be included in the model
normalize: a scalar logical which, if TRUE, causes each variable is standardized to have unit L2-norm, otherwise it is left alone
zzero: a scalar logical indicating if the latent z variables to be sampled. Therefore this indicator specifies if the cdf representation (zzero = FALSE) or pdf representation (otherwise) should be used
powerprior: a scalar logical indicating if the prior should be powered up with multiplicity parameter kappa as well as the likelihood
kmax: a positive integer indicating the number replacing infinity in the sum for mixing density in the generative expression for lambda
bstart: an optional vector of length p = ncol(X) specifying initial values for the regression coefficients beta. Otherwise standard normal deviates are used
lt: an optional vector of length n = nrow(X) of initial values for the lambda latent variables. Otherwise a vector of ones is used.
nup: prior parameters =list(a, b) for the inverse Gamma distribution prior for nu, or NULL, which causes nu to be fixed
save.latents: a scalar logical indicating wether or not a trace of latent z, lambda and omega values should be saved for each iteration. Specify save.latents=TRUE for very large X in order to reduce memory swapping on low-RAM machines
verb: A positive integer indicating the number of MCMC rounds after which a progress statement is printed. Giving verb = 0 causes no statements to be printed

Author

Robert B. Gramacy rbg@vt.edu

Details

These are the main functions in the package. They support an omnibus framework for simulation-based regularized logistic regression. The default arguments invoke a Gibbs sampling algorithm to sample from the posterior distribution of a logistic regression model with lasso-type (double-exponential) priors. See the paper by Gramacy & Polson (2012) for details. Both cdf and pdf implementations are provided, which use slightly different latent variable representations, resulting in slightly different Gibbs samplers. These methods extend the un-regularized methods of Holmes & Held (2006)

The kappa parameter facilitates simulated annealing (SA) implementations in order to help find the MAP, and other point estimators. The actual SA algorithm is not provided in the package. However, it is easy to string calls to this function, using the outputs from one call as inputs to another, in order to establish a SA schedule for increasing kappa values.

The regmlogit function is a wrapper around the Gibbs sampler inside reglogit, invoking C-1 linked chains for C classes, extending the polychotomous regression scheme outlined by Holmes & Held (2006). For an example with regmlogit, see predict.regmlogit

References

R.B. Gramacy, N.G. Polson. “Simulation-based regularized logistic regression”. (2012) Bayesian Analysis, 7(3), p567-590; arXiv:1005.3430; https://arxiv.org/abs/1005.3430

C. Holmes, K. Held (2006). “Bayesian Auxilliary Variable Models for Binary and Multinomial Regression”. Bayesian Analysis, 1(1), p145-168.

Examples

Run this code

## load in the pima indian data
data(pima)
X <- as.matrix(pima[,-9])
y <- as.numeric(pima[,9])

## pre-normalize to match the comparison in the paper
one <- rep(1, nrow(X))
normx <- sqrt(drop(one %*% (X^2)))
X <- scale(X, FALSE, normx)

## compare to the GLM fit
fit.logit <- glm(y~X, family=binomial(link="logit"))
bstart <- fit.logit$coef

## do the Gibbs sampling
T <- 300 ## set low for CRAN checks; increase to >= 1000 for better results
out6 <- reglogit(T, y, X, nu=6, nup=NULL, bstart=bstart, normalize=FALSE)

## plot the posterior distribution of the coefficients
burnin <- (1:(T/10)) 
boxplot(out6$beta[-burnin,], main="nu=6, kappa=1", ylab="posterior",
        xlab="coefficients", bty="n", names=c("mu", paste("b", 1:8, sep="")))
abline(h=0, lty=2)

## add in GLM fit and MAP with legend
points(bstart, col=2, pch=17)
points(out6$map$beta, pch=19, col=3)
legend("topright", c("MLE", "MAP"), col=2:3, pch=c(17,19))

## simple prediction
p6 <- predict(out6, XX=X)
## hit rate
mean(p6$c == y)

##
## for a polychotomous example, with prediction, 
## see ? predict.regmlogit
##

if (FALSE) {
## now with kappa=10
out10 <- reglogit(T, y, X, kappa=10, nu=6, nup=NULL, bstart=bstart, 
                            normalize=FALSE)

## plot the posterior distribution of the coefficients
par(mfrow=c(1,2))
boxplot(out6$beta[-burnin,], main="nu=6, kappa=1",  ylab="posterior",
        xlab="coefficients", bty="n",  names=c("mu", paste("b", 1:8, sep="")))
abline(h=0, lty=2) 
points(bstart, col=2, pch=17)
points(out6$map$beta, pch=19, col=3)
legend("topright", c("MLE", "MAP"), col=2:3, pch=c(17,19))
boxplot(out10$beta[-burnin,], main="nu=6, kappa=10",  ylab="posterior",
        xlab="coefficients", bty="n",  names=c("mu", paste("b", 1:8, sep="")))
abline(h=0, lty=2)
## add in GLM fit and MAP with legend
points(bstart, col=2, pch=17)
points(out10$map$beta, pch=19, col=3)
legend("topright", c("MLE", "MAP"), col=2:3, pch=c(17,19))
}

##
## now some binomial data
##

if (FALSE) {
## synthetic data generation
library(boot)
N <- rep(20, 100)
beta <- c(2, -3, 2, -4, 0, 0, 0, 0, 0)
X <- matrix(runif(length(N)*length(beta)), ncol=length(beta))
eta <- drop(1 + X %*% beta)
p <- inv.logit(eta)
y <- rbinom(length(N), N, p)

## run the Gibbs sampler for the logit -- uses the fast Binomial
## version; for a comparison, try flatten=FALSE
out <- reglogit(T, y, X, N)

## plot the posterior distribution of the coefficients
boxplot(out$beta[-burnin,], main="binomial data",  ylab="posterior", 
       xlab="coefficients", bty="n",
       names=c("mu", paste("b", 1:ncol(X), sep="")))
abline(h=0, lty=2)

## add in GLM fit, the MAP fit, the truth, and a legend
fit.logit <- glm(y/N~X, family=binomial(link="logit"), weights=N)
points(fit.logit$coef, col=2, pch=17)
points(c(1, beta), col=4, pch=16)
points(out$map$beta, pch=19, col=3)
legend("topright", c("MLE", "MAP", "truth"), col=2:4, pch=c(17,19,16))

## also try specifying a larger kappa value to pin down the MAP
}

Run the code above in your browser using DataLab