HDtweedie: Fits the regularization paths for lasso-type methods of the Tweedie model

Description

Fits regularization paths for lasso-type methods of the Tweedie model at a sequence of regularization parameters lambda.

Usage

HDtweedie(x, y, group = NULL, 
		p = 1.50,
		weights = rep(1,nobs),
		alpha = 1,
		nlambda = 100, 
		lambda.factor = ifelse(nobs < nvars, 0.05, 0.001), 
		lambda = NULL, 
		pf = sqrt(bs), 
		dfmax = as.integer(max(group)) + 1, 
		pmax = min(dfmax * 1.2, as.integer(max(group))), 
		standardize = FALSE,
		eps = 1e-08, maxit = 3e+08)

Arguments

x

matrix of predictors, of dimension $n \times p$; each row is an observation vector.

y

response variable. This argument should be non-negative.

group

a vector of consecutive integers describing the grouping of the coefficients (see example below); supply it to apply the grouped lasso. To apply the lasso, the user can omit this argument, and the vector is generated automatically by treating each variable as a group.
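As a minimal sketch of what such a group vector looks like, the base R snippet below builds one for a hypothetical design with six predictors, alongside the one-variable-per-group vector that corresponds to the plain lasso. The sizes, the variable names, and the helper is_consecutive are illustrative assumptions and are not part of HDtweedie.

```r
# Hypothetical design: 6 predictors split into 3 groups of sizes 2, 3 and 1.
nvars <- 6
group_grouped <- c(1, 1, 2, 2, 2, 3)  # one group label per column of x

# Plain lasso: every variable forms its own group.
group_lasso <- seq_len(nvars)

# Illustrative helper (not a package function): TRUE when the labels
# are exactly the integers 1, 2, ..., max(g) with none skipped.
is_consecutive <- function(g) {
  identical(sort(unique(as.integer(g))), seq_len(max(g)))
}

is_consecutive(group_grouped)  # TRUE
is_consecutive(c(1, 3, 3))     # FALSE: label 2 is skipped
```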

p

the power used for the variance-mean relation of the Tweedie model. Default is 1.50.
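The variance-mean relation of a Tweedie model has the form Var(Y) = phi * mu^p. The sketch below only evaluates that relation for a few powers; the function name tweedie_variance and the dispersion argument phi are illustrative assumptions, not arguments of HDtweedie.

```r
# Tweedie variance function: Var(Y) = phi * mu^p.
# phi (dispersion) is assumed to be 1 here purely for illustration.
tweedie_variance <- function(mu, p = 1.5, phi = 1) {
  phi * mu^p
}

mu <- c(1, 4, 9)
tweedie_variance(mu, p = 1)    # 1 4 9   (variance proportional to the mean)
tweedie_variance(mu, p = 1.5)  # 1 8 27  (the default power)
tweedie_variance(mu, p = 2)    # 1 16 81 (variance proportional to the squared mean)
```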

weights

the observation weights. Default is equal weight.

alpha

The elastic net mixing parameter, with $0\le\alpha\le 1$. The penalty is defined as $$\frac{1-\alpha}{2}\|\beta\|_2^2+\alpha\|\beta\|_1.$$ alpha=1 is the lasso penalty, and alpha=0 is the ridge penalty. Default is 1.
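To make the mixing concrete, the following sketch evaluates the penalty formula exactly as written above for a small coefficient vector. It ignores the lambda multiplier, the penalty factors and any grouping, and the function name enet_penalty is an illustrative assumption rather than a package function.

```r
# Elastic net penalty as stated in the text:
# (1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1
enet_penalty <- function(beta, alpha = 1) {
  stopifnot(alpha >= 0, alpha <= 1)
  (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
}

beta <- c(0.5, -1, 0)
enet_penalty(beta, alpha = 1)    # 1.5    (pure lasso: sum of absolute values)
enet_penalty(beta, alpha = 0)    # 0.625  (pure ridge: half the squared norm)
enet_penalty(beta, alpha = 0.7)  # 1.2375 (a mixture of the two)
```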

nlambda

the number of lambda values - default is 100.

lambda.factor

the factor for getting the minimal lambda in the lambda sequence, where min(lambda) = lambda.factor * max(lambda). max(lambda) is the smallest value of lambda for which all coefficients are zero. The default depends on the relationship between $n$ (the number of rows in the matrix of predictors) and $p$ (the number of predictors, not the Tweedie power argument p). If $n \ge p$, the default is 0.001, close to zero. If $n < p$, the default is 0.05. A very small value of lambda.factor will lead to a saturated fit. It has no effect if a user-defined lambda sequence is supplied.
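The relation between nlambda, lambda.factor and the endpoints of the sequence can be sketched in base R. The text only fixes the two endpoints; the log-linear spacing used below is an assumption borrowed from common practice in similar packages, and lambda_max is a made-up value (the package derives it from the data).

```r
# Hypothetical inputs; lambda_max would normally be computed from x and y.
nobs <- 200
nvars <- 50
lambda_max <- 2
nlambda <- 100

# Default factor as given in the Usage section.
lambda_factor <- ifelse(nobs < nvars, 0.05, 0.001)

# Assumption: values are equally spaced on the log scale between the endpoints.
lambda_seq <- exp(seq(log(lambda_max),
                      log(lambda_factor * lambda_max),
                      length.out = nlambda))

range(lambda_seq)  # smallest value is lambda_factor * lambda_max, largest is lambda_max
```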

lambda

a user-supplied lambda sequence. Typically, by leaving this option unspecified, users can have the program compute its own lambda sequence based on nlambda and lambda.factor. Supplying a value of lambda overrides this. It is better to supply a decreasing sequence of lambda values than a single (small) value. If the supplied sequence is not decreasing, the program automatically sorts it in decreasing order.

pf

penalty factor, a vector of length bn (bn is the total number of groups). Separate penalty weights can be applied to each group to allow differential shrinkage. An entry can be 0 for some groups, which implies no shrinkage and results in that group always being included in the model. The default value for each entry is the square root of the size of the corresponding group (for the lasso, it is 1 for each variable).
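The default pf = sqrt(bs) shown in the Usage section can be reproduced by hand. The sketch below assumes that bs holds the size of each group and computes it with tabulate(); the group vector and the variable names are illustrative, not taken from the package.

```r
# Hypothetical grouping: 3 groups of sizes 2, 3 and 1.
group <- c(1, 1, 2, 2, 2, 3)

# Assumed meaning of bs: the number of variables in each group.
bs <- tabulate(group)     # 2 3 1
bn <- length(bs)          # total number of groups

# Default penalty factors: square root of each group size.
pf_default <- sqrt(bs)

# Custom factors: leave group 1 unpenalized so it always stays in the model.
pf_custom <- pf_default
pf_custom[1] <- 0
```

For the plain lasso every group has size one, so the same computation gives a penalty factor of 1 for each variable.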

dfmax

limits the maximum number of groups in the model. Default is bn+1, where bn is the total number of groups (as.integer(max(group)) + 1 in the usage above).

pmax

limits the maximum number of groups ever to be nonzero. For example, once a group enters the model, it is counted only once, no matter how many times it exits or re-enters the model along the path. Default is min(dfmax*1.2, bn), where bn is the total number of groups.

eps

convergence termination tolerance. Default value is 1e-8.

standardize

logical flag for variable standardization, prior to fitting the model sequence. If TRUE, the x matrix is normalized such that each column is centered and has mean square equal to one, $\sum^N_{i=1}x_{ij}^2/N=1$. The coefficients are always returned on the original scale. Default is FALSE.
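The normalization described here divides by N rather than N - 1, so it differs slightly from what base R's scale() does by default. The sketch below is one way to reproduce it by hand; the function name standardize_columns and the simulated matrix are illustrative assumptions, and the package's internal code may differ.

```r
# Center each column, then rescale so that sum(x_ij^2) / N equals 1.
standardize_columns <- function(x) {
  x <- sweep(x, 2, colMeans(x))             # subtract column means
  rms <- sqrt(colSums(x^2) / nrow(x))       # root mean square with divisor N
  sweep(x, 2, rms, "/")                     # divide each column by its RMS
}

set.seed(1)
x <- matrix(rnorm(20 * 3), nrow = 20)       # simulated predictors
xs <- standardize_columns(x)

colMeans(xs)               # numerically zero for every column
colSums(xs^2) / nrow(xs)   # 1 for every column
```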

maxit

maximum number of inner-layer BMD iterations allowed. Default is 3e8.

Value

An object with S3 class HDtweedie.

call

the call that produced this object

b0

intercept sequence of length length(lambda)

beta

a p*length(lambda) matrix of coefficients.

df

the number of nonzero groups for each value of lambda.

dim

dimension of the coefficient matrix

lambda

the actual sequence of lambda values used

npasses

total number of iterations (of the innermost loop) summed over all lambda values

jerr

error flag, for warnings and errors, 0 if no error.

group

a vector of consecutive integers describing the grouping of the coefficients.

Details

The sequence of models implied by lambda is fit by the IRLS-BMD algorithm. This gives a (grouped) lasso or (grouped) elastic net regularization path for the Tweedie generalized linear model, obtained by maximizing the corresponding penalized Tweedie log-likelihood. If the group argument is omitted, the function fits the lasso. Users can adjust the penalty by choosing different values of alpha and of the penalty factor pf.

For computational speed, if models are not converging or are running slowly, consider increasing eps, decreasing nlambda, or increasing lambda.factor before increasing maxit.

References

Qian, W., Yang, Y., Yang, Y. and Zou, H. (2016), "Tweedie's Compound Poisson Model With Grouped Elastic Net," Journal of Computational and Graphical Statistics, 25, 606-625.

Examples

# NOT RUN {
# load HDtweedie library
library(HDtweedie)

# load auto data set
data(auto)

# fit the lasso
m0 <- HDtweedie(x=auto$x,y=auto$y,p=1.5)

# define group index
group1 <- c(rep(1,5),rep(2,7),rep(3,4),rep(4:14,each=3),15:21)

# fit the grouped lasso
m1 <- HDtweedie(x=auto$x,y=auto$y,group=group1,p=1.5)

# fit the grouped elastic net
m2 <- HDtweedie(x=auto$x,y=auto$y,group=group1,p=1.5,alpha=0.7)
# }