auto: A motor insurance dataset

Description

The motor insurance dataset was originally retrieved from the SAS Enterprise Miner database. The included dataset was generated by reorganizing and transforming the original data as described in Qian et al. (2016).

Usage

data(auto)

Value

A list with the following elements:

x: a [2812 x 56] matrix giving 2812 policy records with 56 predictors

y: the aggregate claim loss

Details

This data set contains 2812 policy records with 56 predictors. See Qian et al. (2016) for a detailed description of how these predictors were generated. The response is the aggregate claim loss (in thousands of dollars). The predictors are expanded from the following original variables:

CAR_TYPE: car type, 6 categories
JOBCLASS: job class, 8 categories
MAX_EDUC: education level, 5 categories
KIDSDRIV: number of child passengers
TRAVTIME: time to travel from home to work
BLUEBOOK: car value
NPOLICY: number of policies
MVR_PTS: motor vehicle record points
AGE: driver age
HOMEKIDS: number of children at home
YOJ: years on job
INCOME: income
HOME_VAL: home value
SAMEHOME: years at current address
CAR_USE: whether the car is for commercial use
RED_CAR: whether the car color is red
REVOLKED: whether the driver's license was revoked in the past
GENDER: gender
MARRIED: whether married
PARENT1: whether a single parent
AREA: whether the driver lives in an urban area
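
Because the response is an aggregate claim loss, it may be worth checking how many policies have a loss of exactly zero before modeling. The following is a minimal sketch, assuming the HDtweedie package is installed; the variable name zero_share is illustrative and not part of the package.

```r
# Sketch only: assumes the HDtweedie package is installed.
library(HDtweedie)
data(auto)

# Share of policies whose aggregate claim loss is exactly zero
# (zero_share is a hypothetical name chosen for this example)
zero_share <- mean(auto$y == 0)
zero_share

# Distribution of the strictly positive losses
summary(auto$y[auto$y > 0])

# Names of the first few expanded predictor columns (NULL if unnamed)
head(colnames(auto$x))
```

No output is shown here, since the values depend on the installed version of the data.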

References

Yip, K. C. H. and Yau, K. K. W. (2005), ``On Modeling Claim Frequency Data In General Insurance With Extra Zeros'', Insurance: Mathematics and Economics, 36, 153-163.

Zhang, Y. (2013), ``cplm: Compound Poisson Linear Models'', a vignette for R package cplm. Available from https://CRAN.R-project.org/package=cplm

Qian, W., Yang, Y., Yang, Y. and Zou, H. (2016), ``Tweedie's Compound Poisson Model With Grouped Elastic Net'', Journal of Computational and Graphical Statistics, 25, 606-625.

Examples

# NOT RUN {
# load HDtweedie library
library(HDtweedie)

# load data set
data(auto)

# how many samples and how many predictors?
dim(auto$x)

# response y
auto$y
# }