HDtweedie (version 1.2)

cv.HDtweedie: Cross-validation for HDtweedie

Description

Performs k-fold cross-validation for HDtweedie, produces a plot, and returns a value for lambda. This function is adapted from the cv function in the glmnet package.

Usage

cv.HDtweedie(x, y, group = NULL, p, weights, lambda = NULL, 
	pred.loss = c("deviance", "mae", "mse"), 
	nfolds = 5, foldid, ...)

Arguments

x

matrix of predictors, of dimension \(n \times p\); each row is an observation vector.

y

response variable. This argument should be non-negative.

group

a vector of consecutive integers describing the grouping of the coefficients (see the example below); supply it to apply the grouped lasso. To apply the lasso, omit this argument, and the vector is generated automatically by treating each variable as its own group.

p

the power used for the variance-mean relation of the Tweedie model. Default is 1.50.

weights

the observation weights. Default is equal weight.

lambda

optional user-supplied lambda sequence; default is NULL, and HDtweedie chooses its own sequence.

pred.loss

loss to use for cross-validation error. Valid options are:

  • "deviance" Deviance.

  • "mae" Mean absolute error.

  • "mse" Mean square error.

Default is "deviance".

nfolds

number of folds; default is 5. Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is nfolds=3.

foldid

an optional vector of values between 1 and nfolds identifying which fold each observation is in. If supplied, nfolds can be missing.

...

other arguments that can be passed to HDtweedie.
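As an illustration of the foldid argument, the following base R sketch builds a balanced random fold assignment. The values of n and nfolds are hypothetical, and this is only one common way to construct such a vector, not necessarily what cv.HDtweedie does internally when foldid is missing.

```r
set.seed(1)        # make the random assignment reproducible
n      <- 23       # hypothetical number of observations
nfolds <- 5        # hypothetical number of folds

# Repeat the labels 1..nfolds until there are n of them, then shuffle
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Fold sizes differ by at most one observation
table(foldid)
```

A vector built this way could then be supplied as foldid = foldid, which keeps the splits identical across several cross-validation runs (for example when comparing different values of p).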

Value

an object of class cv.HDtweedie is returned, which is a list with the ingredients of the cross-validation fit.

lambda

the values of lambda used in the fits.

cvm

the mean cross-validated error - a vector of length length(lambda).

cvsd

estimate of standard error of cvm.

cvupper

upper curve = cvm+cvsd.

cvlower

lower curve = cvm-cvsd.

name

a text string indicating type of measure (for plotting purposes).

HDtweedie.fit

a fitted HDtweedie object for the full data.

lambda.min

the optimal value of lambda, that is, the one giving the minimum cross-validation error cvm.

lambda.1se

the largest value of lambda such that the error is within 1 standard error of the minimum.
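The definitions of lambda.min and lambda.1se can be made concrete with a small base R sketch. The lambda, cvm and cvsd values are made-up toy numbers, not output from a real fit, and the rule follows the usual glmnet-style convention, which is assumed here to match what this function does.

```r
# Toy values, not taken from a real cross-validation run
lambda <- c(1.00, 0.50, 0.25, 0.10, 0.05)
cvm    <- c(3.20, 2.60, 2.35, 2.30, 2.45)
cvsd   <- c(0.20, 0.15, 0.10, 0.10, 0.12)

# lambda.min: the lambda with the smallest mean cross-validated error
i.min      <- which.min(cvm)
lambda.min <- lambda[i.min]

# lambda.1se: the largest lambda whose error is no more than
# one standard error above the minimum error
threshold  <- cvm[i.min] + cvsd[i.min]
lambda.1se <- max(lambda[cvm <= threshold])

c(lambda.min = lambda.min, lambda.1se = lambda.1se)
```

Choosing lambda.1se rather than lambda.min trades a small increase in estimated error for a more heavily penalized, and typically sparser, model.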

Details

The function runs HDtweedie nfolds+1 times: the first run obtains the lambda sequence, and the remaining runs compute the fit with each of the folds omitted in turn. The average error and its standard deviation over the folds are then computed.
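The aggregation step described above can be sketched in base R. The matrix fold.err is a hypothetical stand-in for the per-fold losses, and the sketch assumes equally weighted folds with the standard error taken as the across-fold standard deviation divided by sqrt(nfolds); the exact weighting used by the package is not stated here.

```r
set.seed(42)
nfolds  <- 5
nlambda <- 4

# Hypothetical held-out losses: one row per fold, one column per lambda
fold.err <- matrix(runif(nfolds * nlambda, min = 2, max = 3),
                   nrow = nfolds, ncol = nlambda)

# Mean cross-validated error for each lambda
cvm <- colMeans(fold.err)

# Standard error of that mean (assumes equally weighted folds)
cvsd <- apply(fold.err, 2, sd) / sqrt(nfolds)

# Upper and lower curves as defined in the Value section
cvupper <- cvm + cvsd
cvlower <- cvm - cvsd
```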

References

Qian, W., Yang, Y., Yang, Y. and Zou, H. (2016), "Tweedie's Compound Poisson Model With Grouped Elastic Net," Journal of Computational and Graphical Statistics, 25, 606-625.

See Also

HDtweedie, plot.cv.HDtweedie, predict.cv.HDtweedie, and coef.cv.HDtweedie methods.

Examples

# NOT RUN {
# load HDtweedie library
library(HDtweedie)

# load data set
data(auto)

# 5-fold cross validation using the lasso
cv0 <- cv.HDtweedie(x=auto$x,y=auto$y,p=1.5,nfolds=5) 

# define group index
group1 <- c(rep(1,5),rep(2,7),rep(3,4),rep(4:14,each=3),15:21)

# 5-fold cross validation using the grouped lasso 
cv1 <- cv.HDtweedie(x=auto$x,y=auto$y,group=group1,p=1.5,nfolds=5)
# }
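Before passing a group index such as group1 to cv.HDtweedie, it can help to inspect it. The base R sketch below rebuilds the same vector and checks that its labels are consecutive integers starting at 1, as the group argument requires. Whether the package performs the same validation itself is not stated here, so treat the check as an optional safeguard.

```r
# Same group index as in the example above
group1 <- c(rep(1, 5), rep(2, 7), rep(3, 4), rep(4:14, each = 3), 15:21)

# Number of coefficients covered; should equal the number of columns of x
length(group1)

# Size of each group
table(group1)

# Labels must start at 1 and increase without gaps
stopifnot(group1[1] == 1, all(diff(unique(group1)) == 1))
```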