Artificial data that can be used for unit-testing or teaching
create_data_buy(
  obs = 1000,
  target_name = "buy",
  factorise_target = FALSE,
  target1_prob = 0.5,
  add_extreme = TRUE,
  flip_gender = FALSE,
  add_id = FALSE,
  seed = 123
)
Returns a dataset as a tibble.
Arguments:
obs = Number of observations
target_name = Variable name of the target
factorise_target = Should the target variable be factorised (from 0/1 to a factor no/yes)?
target1_prob = Probability that target = 1
add_extreme = Add an observation with extreme values?
flip_gender = Should Male/Female be flipped in the data?
add_id = Add an id variable to the data?
seed = Seed for randomization
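A minimal usage sketch, assuming the function is available from the explore package and that the package is installed. The argument values below are arbitrary illustrations, not recommended settings.

```r
# Assumption: create_data_buy() is provided by the explore package
library(explore)

# Generate a reproducible dataset with a factor target and an id column
data <- create_data_buy(
  obs = 500,                # number of observations
  target_name = "buy",      # name of the target column
  factorise_target = TRUE,  # target becomes a factor (no/yes) instead of 0/1
  target1_prob = 0.3,       # probability that target = 1
  add_id = TRUE,            # add an id variable
  seed = 42                 # fix the seed so repeated calls give the same data
)

# Inspect column names and types
str(data)
```

Fixing the seed is what makes the data suitable for unit tests: the same arguments should yield the same dataset on every run.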
Variables in dataset:
id = Identifier (only present if add_id = TRUE)
period = Year and month (YYYYMM)
city_ind = Customer resides in a city (1 = yes, 0 = no)
female_ind = Gender of customer is female (1 = yes, 0 = no)
fixedvoice_ind = Customer has a fixed voice product (1 = yes, 0 = no)
fixeddata_ind = Customer has a fixed data product (1 = yes, 0 = no)
fixedtv_ind = Customer has a fixed TV product (1 = yes, 0 = no)
mobilevoice_ind = Customer has a mobile voice product (1 = yes, 0 = no)
mobiledata_prd = Customer has a mobile data product (NO/MOBILE STICK/BUSINESS)
bbi_speed_ind = Customer has a Broadband Internet (BBI) with extra speed
bbi_usg_gb = Broadband Internet (BBI) usage in Gigabyte (GB) last month
hh_single = Expected to be a Single Household (1 = yes, 0 = no)
Target in dataset:
buy (may be renamed via target_name) = Did the customer buy a new product in the following month? (1 = yes, 0 = no)
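As an illustration of working with the target, the sketch below renames it and summarises it with base R. It assumes the explore package is installed; the column name "purchase" and the object name buy_rate are hypothetical choices made for this example.

```r
# Assumption: create_data_buy() is provided by the explore package
library(explore)

# Rename the target column to "purchase" (hypothetical name for illustration)
data <- create_data_buy(obs = 1000, target_name = "purchase", seed = 1)

# The target is coded 0/1 by default, so its mean is the share of buyers
buy_rate <- mean(data$purchase)
print(buy_rate)

# Share of buyers per mobile data product
print(aggregate(purchase ~ mobiledata_prd, data = data, FUN = mean))
```

Keeping the target as 0/1 (factorise_target = FALSE) makes such rate calculations straightforward; a factorised target would need converting back to numbers first.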