
kernlab (version 0.9-33)

ticdata: The Insurance Company Data

Description

This data set, used in the CoIL 2000 Challenge, contains information on customers of an insurance company. The data consists of 86 variables and includes product usage data and socio-demographic data derived from zip area codes. The data was collected to answer the following question: Can you predict who would be interested in buying a caravan insurance policy and give an explanation why?

Usage

data(ticdata)

Format

ticdata: Data set to train and validate prediction models and build a description (9822 customer records). Each record consists of 86 attributes, containing sociodemographic data (attributes 1-43) and product ownership data (attributes 44-86). The sociodemographic data is derived from zip codes: all customers living in areas with the same zip code have the same sociodemographic attributes. Attribute 86, CARAVAN: Number of mobile home policies, is the target variable.
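
As a rough sketch of the column layout just described, the R code below groups the columns by position. The helper `tic_column_groups()` is hypothetical (it is not part of kernlab), and it assumes attributes 44-85 are the product-ownership predictors and attribute 86 is the target. The real-data part only runs when kernlab is installed; the printed class counts depend on the installed data and are not reproduced here.

```r
# Hypothetical helper (not part of kernlab): group the 86 ticdata
# columns by position, following the Format description.
tic_column_groups <- function(df) {
  stopifnot(ncol(df) == 86)
  list(
    sociodemographic = names(df)[1:43],  # attributes 1-43, zip code derived
    product          = names(df)[44:85], # attributes 44-85, product ownership
    target           = names(df)[86]     # attribute 86, CARAVAN
  )
}

# Only runs the real-data part when kernlab is installed.
if (requireNamespace("kernlab", quietly = TRUE)) {
  data(ticdata, package = "kernlab")
  print(dim(ticdata))                      # Format states 9822 records, 86 attributes
  groups <- tic_column_groups(ticdata)
  print(table(ticdata[[groups$target]]))   # class distribution of the target
}
```

Grouping by position rather than by name keeps the split aligned with the attribute numbers used in the Format description.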

Data Format

No. Name      Description
1   STYPE     Customer Subtype
2   MAANTHUI  Number of houses (1 - 10)
3   MGEMOMV   Avg size household (1 - 6)
4   MGEMLEEF  Average age
5   MOSHOOFD  Customer main type
6   MGODRK    Roman catholic
7   MGODPR    Protestant ...
8   MGODOV    Other religion
9   MGODGE    No religion
10  MRELGE    Married
11  MRELSA    Living together
12  MRELOV    Other relation
13  MFALLEEN  Singles
14  MFGEKIND  Household without children
15  MFWEKIND  Household with children
16  MOPLHOOG  High level education
17  MOPLMIDD  Medium level education
18  MOPLLAAG  Lower level education
19  MBERHOOG  High status
20  MBERZELF  Entrepreneur
21  MBERBOER  Farmer
22  MBERMIDD  Middle management
23  MBERARBG  Skilled labourers
24  MBERARBO  Unskilled labourers
25  MSKA      Social class A
26  MSKB1     Social class B1
27  MSKB2     Social class B2
28  MSKC      Social class C
29  MSKD      Social class D
30  MHHUUR    Rented house
31  MHKOOP    Home owners
32  MAUT1     1 car
33  MAUT2     2 cars
34  MAUT0     No car
35  MZFONDS   National Health Service
36  MZPART    Private health insurance
37  MINKM30   Income <30.000
38  MINK3045  Income 30-45.000
39  MINK4575  Income 45-75.000
40  MINK7512  Income 75-122.000
41  MINK123M  Income >123.000
42  MINKGEM   Average income
43  MKOOPKLA  Purchasing power class
44  PWAPART   Contribution private third party insurance
45  PWABEDR   Contribution third party insurance (firms)
46  PWALAND   Contribution third party insurance (agriculture)
47  PPERSAUT  Contribution car policies
48  PBESAUT   Contribution delivery van policies
49  PMOTSCO   Contribution motorcycle/scooter policies
50  PVRAAUT   Contribution lorry policies
51  PAANHANG  Contribution trailer policies
52  PTRACTOR  Contribution tractor policies
53  PWERKT    Contribution agricultural machines policies
54  PBROM     Contribution moped policies
55  PLEVEN    Contribution life insurances
56  PPERSONG  Contribution private accident insurance policies
57  PGEZONG   Contribution family accidents insurance policies
58  PWAOREG   Contribution disability insurance policies
59  PBRAND    Contribution fire policies
60  PZEILPL   Contribution surfboard policies
61  PPLEZIER  Contribution boat policies
62  PFIETS    Contribution bicycle policies
63  PINBOED   Contribution property insurance policies
64  PBYSTAND  Contribution social security insurance policies
65  AWAPART   Number of private third party insurance (1 - 12)
66  AWABEDR   Number of third party insurance (firms) ...
67  AWALAND   Number of third party insurance (agriculture)
68  APERSAUT  Number of car policies
69  ABESAUT   Number of delivery van policies
70  AMOTSCO   Number of motorcycle/scooter policies
71  AVRAAUT   Number of lorry policies
72  AAANHANG  Number of trailer policies
73  ATRACTOR  Number of tractor policies
74  AWERKT    Number of agricultural machines policies
75  ABROM     Number of moped policies
76  ALEVEN    Number of life insurances
77  APERSONG  Number of private accident insurance policies
78  AGEZONG   Number of family accidents insurance policies
79  AWAOREG   Number of disability insurance policies
80  ABRAND    Number of fire policies
81  AZEILPL   Number of surfboard policies
82  APLEZIER  Number of boat policies
83  AFIETS    Number of bicycle policies
84  AINBOED   Number of property insurance policies
85  ABYSTAND  Number of social security insurance policies
86  CARAVAN   Number of mobile home policies (0 - 1)

Note: All variables whose names start with M are zip code variables. They give information on the distribution of that variable (e.g., Rented house) in the zip code area of the customer.
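
Because the naming convention is systematic, columns can also be picked out by prefix. A minimal sketch, assuming the column names in the data match the table above: `vars_by_prefix()` is a hypothetical helper built on base R's `grep()`. Attribute 1, STYPE, lies in the sociodemographic range 1-43 but does not start with M, so a prefix-based selection leaves it out while a positional one (attributes 1-43) includes it.

```r
# Hypothetical helper (not part of kernlab): return the names of all
# columns whose name starts with the given prefix, e.g. "M" for the
# zip code variables, "P" for contributions, "A" for policy counts.
vars_by_prefix <- function(df, prefix) {
  grep(paste0("^", prefix), names(df), value = TRUE)
}

# Only runs the real-data part when kernlab is installed.
if (requireNamespace("kernlab", quietly = TRUE)) {
  data(ticdata, package = "kernlab")
  zip_vars <- vars_by_prefix(ticdata, "M")
  print(length(zip_vars))  # number of zip code variables found
  print(head(zip_vars))
}
```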

Details

Information about the insurance company customers consists of 86 variables and includes product usage data and socio-demographic data derived from zip area codes. The data was supplied by the Dutch data mining company Sentient Machine Research and is based on a real world business problem. The training set contains 5822 descriptions of customers, including whether or not they have a caravan insurance policy. The test set contains 4000 customers. The training and test sets are merged in the ticdata data set. More information about the data set and the CoIL 2000 Challenge, along with publications based on the data set, can be found at http://www.liacs.nl/~putten/library/cc2000/.
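
Since the training and test sets are merged into one data frame, evaluating a model in the spirit of the challenge means separating them again. The sketch below assumes, without confirmation from this page, that the 9822 - 4000 = 5822 training records come first and the 4000 test records follow; `split_ticdata()` is a hypothetical helper, and `n_train` should be adjusted if the row order turns out to differ.

```r
# Hypothetical helper (not part of kernlab): split the merged data back
# into training and test parts. ASSUMPTION: training rows come first.
split_ticdata <- function(df, n_train = 5822) {
  stopifnot(nrow(df) > n_train)
  list(
    train = df[seq_len(n_train), , drop = FALSE],     # rows 1..n_train
    test  = df[(n_train + 1):nrow(df), , drop = FALSE] # remaining rows
  )
}

# Only runs the real-data part when kernlab is installed.
if (requireNamespace("kernlab", quietly = TRUE)) {
  data(ticdata, package = "kernlab")
  parts <- split_ticdata(ticdata)
  print(sapply(parts, nrow))  # row counts of the two parts
}
```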

References

Peter van der Putten, Michel de Ruiter, Maarten van Someren. CoIL Challenge 2000 Tasks and Results: Predicting and Explaining Caravan Policy Ownership.
http://www.liacs.nl/~putten/library/cc2000/