glmreg: fit a GLM with lasso (or elastic net), snet or mnet regularization

Description

Fit a generalized linear model via penalized maximum likelihood. The regularization path is computed for the lasso (or elastic net), SCAD (or snet) and MCP (or mnet) penalties at a grid of values for the regularization parameter lambda. Fits linear, logistic, Poisson and negative binomial (fixed scale parameter) regression models.

Usage

# S3 method for formula
glmreg(formula, data, weights, offset=NULL, contrasts=NULL, 
x.keep=FALSE, y.keep=TRUE, ...)
# S3 method for matrix
glmreg(x, y, weights, offset=NULL, ...)
# S3 method for default
glmreg(x,  ...)

Value

An object with S3 class "glmreg" for the various types of models.

call: the call that produced this object
b0: Intercept sequence of length length(lambda)
beta: A nvars x length(lambda) matrix of coefficients.
lambda: The actual sequence of lambda values used
offset: the offset vector used.
resdev: The computed deviance (for "gaussian", this is the R-square). The deviance calculations incorporate weights if present in the model. The deviance is defined to be 2*(loglike_sat - loglike), where loglike_sat is the log-likelihood for the saturated model (a model with a free parameter per observation).
nulldev: Null deviance (per observation). This is defined to be 2*(loglike_sat - loglike(Null)). The null model refers to the intercept-only model.
nobs: number of observations
pll: penalized log-likelihood values for standardized coefficients in the IRLS iterations. For family="gaussian", not implemented yet.
pllres: penalized log-likelihood value for the estimated model on the original scale of coefficients
fitted.values: the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.
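
The deviance definition used for resdev and nulldev can be checked by hand. The following is a minimal base-R sketch for a Poisson model; it uses stats::glm as an unpenalized stand-in for a fitted model, and the simulated data and the helper loglik are assumptions made for illustration only. The values stored in a glmreg object may be scaled differently (for example per observation or with renormalized weights).

```r
# Base-R sketch of deviance = 2 * (loglike_sat - loglike) for a Poisson model.
# stats::glm() is only an unpenalized stand-in; the data are simulated.
set.seed(1)
n <- 50
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.3 + 0.5 * x))
w <- runif(n, 0.5, 2)  # hypothetical observation weights

fit  <- glm(y ~ x, family = poisson(), weights = w)
null <- glm(y ~ 1, family = poisson(), weights = w)  # intercept-only model

# Weighted Poisson log-likelihood for a vector of means mu (illustrative helper)
loglik <- function(mu) sum(w * dpois(y, lambda = mu, log = TRUE))

# Saturated model: one free parameter per observation, so mu_i = y_i
loglike_sat <- loglik(y)

resdev  <- 2 * (loglike_sat - loglik(fitted(fit)))
nulldev <- 2 * (loglike_sat - loglik(fitted(null)))
```

Under these assumptions, resdev and nulldev agree with deviance(fit) and fit$null.deviance from stats::glm.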

Arguments

formula: symbolic description of the model, see details.
data: argument controlling formula processing via model.frame.
weights: optional numeric vector of weights. If standardize=TRUE, weights are renormalized to weights/sum(weights). If standardize=FALSE, weights are kept as original input
offset: this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. Currently only one offset term can be included in the formula.
x: input matrix, of dimension nobs x nvars; each row is an observation vector
y: response variable. Quantitative for family="gaussian". Non-negative counts for family="poisson" or family="negbin". For family="binomial", y should be either a factor with two levels or a vector of proportions.
x.keep, y.keep: logical values: keep the predictor matrix x (x.keep) or the response variable y (y.keep) in the returned object?
contrasts: the contrasts corresponding to levels from the respective models
...: other arguments passed to glmreg_fit
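
To illustrate how an offset enters the linear predictor as an a priori known component, here is a minimal base-R sketch. It uses the unpenalized stats::glm rather than glmreg, and the simulated exposure variable is an assumption made for illustration.

```r
# Base-R sketch: an offset is added to the linear predictor with a fixed
# coefficient of 1. stats::glm() stands in for a penalized fit here.
set.seed(2)
n <- 100
x <- rnorm(n)
exposure <- runif(n, 0.5, 3)  # hypothetical known exposure times
y <- rpois(n, lambda = exposure * exp(0.2 + 0.4 * x))

fit <- glm(y ~ x, family = poisson(), offset = log(exposure))

# Linear predictor = offset + intercept + slope * x
eta <- log(exposure) + coef(fit)[1] + coef(fit)[2] * x

# Fitted means: inverse of the log link applied to the linear predictor
mu <- exp(eta)
```

With a log link, using offset=log(exposure) amounts to modelling the rate y/exposure, which is why the examples pass log(exposure) rather than exposure itself.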

Author

Zhu Wang <zwang145@uthsc.edu>

Details

The sequence of models implied by lambda is fit by coordinate descent. For family="gaussian" this is the lasso, mcp or scad sequence if alpha=1, else it is the enet, mnet or snet sequence. For the other families, this is a lasso (mcp, scad) or elastic net (mnet, snet) regularization path for fitting the generalized linear model, obtained by maximizing the appropriate penalized log-likelihood. Note that the objective function for "gaussian" is $$\frac{1}{2} \cdot weights \cdot RSS + \lambda \cdot penalty$$ if standardize=FALSE and $$\frac{1}{2} \cdot \frac{weights}{\sum(weights)} \cdot RSS + \lambda \cdot penalty$$ if standardize=TRUE. For the other models it is $$-\sum (weights \cdot loglik) + \lambda \cdot penalty$$ if standardize=FALSE and $$-\sum \left(\frac{weights}{\sum(weights)} \cdot loglik\right) + \lambda \cdot penalty$$ if standardize=TRUE.
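
The effect of standardize on the weights in the "gaussian" objective can be sketched in base R. The helper gauss_lasso_objective below is hypothetical and not part of any package; it assumes the lasso penalty (alpha=1, penalty equal to the sum of absolute coefficients), reads weights*RSS as the weighted residual sum of squares, and covers only the renormalization of the weights.

```r
# Hypothetical helper (not a package function): evaluates the Gaussian lasso
# objective 1/2 * weighted RSS + lambda * sum(|beta|) for given coefficients.
gauss_lasso_objective <- function(x, y, b0, beta, lambda,
                                  weights = rep(1, length(y)),
                                  standardize = TRUE) {
  # standardize=TRUE: renormalize weights to weights / sum(weights)
  if (standardize) weights <- weights / sum(weights)
  r <- y - (b0 + drop(x %*% beta))      # residuals
  wrss <- sum(weights * r^2)            # weighted residual sum of squares
  0.5 * wrss + lambda * sum(abs(beta))  # loss plus lasso penalty
}

set.seed(3)
x <- matrix(rnorm(40 * 3), 40, 3)
y <- drop(x %*% c(1, 0, -2)) + rnorm(40)
w <- rep(2, 40)            # constant weights, so sum(w) = 80
beta <- c(0.8, 0, -1.5)    # arbitrary coefficients to evaluate at

obj_raw <- gauss_lasso_objective(x, y, 0, beta, lambda = 0.1,
                                 weights = w, standardize = FALSE)
obj_std <- gauss_lasso_objective(x, y, 0, beta, lambda = 0.1,
                                 weights = w, standardize = TRUE)
```

Because the loss term is rescaled by sum(weights) while the penalty term is not, the same lambda penalizes more strongly relative to the loss when standardize=TRUE, so lambda values are not directly comparable between the two settings.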

References

Breheny, P. and Huang, J. (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Statist., 5: 232-253.

Wang, Z., Ma, S., Zappitelli, M., Parikh, C., Wang, C.-Y. and Devarajan, P. (2014) Penalized count data regression with application to hospital stay after pediatric cardiac surgery. Statistical Methods in Medical Research. Epub ahead of print, 2014 Apr 17.

Examples


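# glmreg() is provided by the mpath package
library("mpath")
set.seed(123)  # make the simulated data reproducible

# binomial: 100 observations, 20 predictors, 0/1 response
x <- matrix(rnorm(100*20), 100, 20)
g2 <- sample(0:1, 100, replace=TRUE)
fit2 <- glmreg(x, g2, family="binomial")

# poisson and negative binomial (the data set comes from the pscl package)
data("bioChemists", package = "pscl")
fm_pois <- glmreg(art ~ ., data = bioChemists, family = "poisson")
coef(fm_pois)
# negative binomial with the scale parameter fixed at theta=1
fm_nb1 <- glmreg(art ~ ., data = bioChemists, family = "negbin", theta=1)
coef(fm_nb1)

# offset: pass log(exposure) both when fitting and when predicting
x <- matrix(rnorm(100*20), 100, 20)
y <- rpois(100, lambda=1)
exposure <- rep(0.5, length(y))
fit3 <- glmreg(x, y, lambda=NULL, nlambda=10, lambda.min.ratio=1e-4,
               offset=log(exposure), family="poisson")
predict(fit3, newx=x, newoffset=log(exposure))

# not run
if (FALSE) {
fm_nb2 <- glmregNB(art ~ ., data = bioChemists)
coef(fm_nb2)
}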
