Logistic Regression with Lasso-Like Penalties
RLR(X, Y, D, lambda, ...)
A list with components:
- a vector of coefficients
- the log-likelihood value at the solution
- the return status from the Mosek optimizer
a design matrix for the unconstrained logistic regression model
a response vector of Boolean values, or an n by 2 matrix of binomial counts as in glm
a matrix specifying the penalty; D = diag(ncol(X)) gives the conventional
lasso penalty
a scalar specifying the intensity of one's belief in the prior. No provision for automatic selection has been made (yet).
other parameters passed to control optimization. These may include
rtol, the relative tolerance for the dual gap convergence criterion,
and verb, which controls the verbosity desired from Mosek: verb = 0
is quiet, while verb = 5 produces a fairly detailed iteration log. See the
documentation for KWDual for further details.
Roger Koenker with crucial help from Michal Adamaszek of Mosek ApS
In some logistic regression problems, especially those with a large number of fixed effects
like the Bradley-Terry rating model, it may be plausible to consider groups of effects that
would be considered equivalence classes. One way to implement such prior information is to
impose some form of regularization penalty. In the general formulation we are trying to
solve the problem:
$$ \min_\theta \; -\ell (\theta \mid X, y) + \lambda \| D \theta \|_1. $$
For example, in the Bradley-Terry rating model, we may consider penalties of the form
$$ \| D \theta \|_1 = \sum_{i < j} |\theta_i - \theta_j |, $$
so differences in all pairs of ratings are pulled together. This form of the penalty
has been used by Hocking et al. (2011) for clustering, by Masarotto and Varin (2012)
for estimation of the Bradley-Terry model, and by Gu and Volgushev (2019) for grouping
fixed effects in panel data models. The optimization is carried out by
Mosek, so the Rmosek package and Mosek itself must be available at run time.
demo(RLR1) illustrates use with the conventional lasso penalty and produces a
lasso shrinkage plot. demo(RLR2) illustrates use with the ranking/grouping
lasso penalty and produces a plot showing how the number of groups shrinks as lambda rises.
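As a minimal sketch, the pairwise-difference penalty matrix D used by the ranking/grouping lasso can be constructed as below; the final fitting call is illustrative only (X and y are assumed stand-ins, and Rmosek plus Mosek must be installed for RLR to run):

```r
p <- 4                         # number of ratings (illustrative)
pairs <- t(combn(p, 2))        # all pairs with i < j
D <- matrix(0, nrow(pairs), p)
D[cbind(seq_len(nrow(pairs)), pairs[, 1])] <-  1
D[cbind(seq_len(nrow(pairs)), pairs[, 2])] <- -1
## Each row of D %*% theta is theta_i - theta_j for one pair (i, j),
## so ||D theta||_1 is the ranking/grouping lasso penalty above.
## fit <- RLR(X, y, D, lambda = 1)   # not run: requires Rmosek and Mosek
```

With D = diag(ncol(X)) instead, the same call reduces to the conventional lasso penalty.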
Gu, J. and Volgushev, S. (2019), `Panel data quantile regression with grouped fixed effects', Journal of Econometrics, 213, 68--91.
Hocking, T. D., Joulin, A., Bach, F. and Vert, J.-P. (2011), `Clusterpath: an algorithm for clustering using convex fusion penalties', Proceedings of the 28th International Conference on International Conference on Machine Learning, 745--752.
Masarotto, G. and Varin, C. (2012), `The ranking lasso and its application to sport tournaments', The Annals of Applied Statistics, 6, 1949--1970.