summarytools (version 1.0.1)

tobacco: Tobacco Use and Health - Simulated Dataset

Description

A simulated dataset of 1,000 subjects with 9 variables, described under Details.

Usage

data(tobacco)

Format

A data frame with 1000 rows and 9 variables

Details

  • gender Factor with 2 levels: “F” and “M”, having roughly 500 of each.

  • age Numerical.

  • age.gr Factor with 4 age categories.

  • BMI Body Mass Index (numerical).

  • smoker Factor (“Yes” / “No”).

  • cigs.per.day Number of cigarettes smoked per day (numerical).

  • diseased Factor (“Yes” / “No”).

  • disease Character.

  • samp.wgts Sampling weights (numerical).

A note on simulation: the probability that an individual falls into the “diseased” category is based on an arbitrary function of age, BMI, and number of cigarettes smoked per day.
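The function actually used to generate the dataset is not documented here. As a rough illustration of the idea only, the sketch below draws a disease indicator from a logistic function of age, BMI and cigarettes per day. Every distribution and coefficient in it is a hypothetical assumption chosen for the example, not the package's method.

```r
# Illustrative sketch only: all distributions and coefficients are hypothetical
# and are NOT the ones used to build the tobacco dataset.
set.seed(42)                      # make the illustration reproducible
n <- 1000

age          <- round(runif(n, min = 18, max = 80))
BMI          <- rnorm(n, mean = 26, sd = 4)
smoker       <- runif(n) < 0.3    # assumed share of smokers
cigs.per.day <- ifelse(smoker, rpois(n, lambda = 15), 0)

# Arbitrary logistic link mapping the three predictors to a probability
p <- plogis(-5 + 0.04 * age + 0.05 * BMI + 0.05 * cigs.per.day)

# Draw the disease status from that probability
diseased <- factor(ifelse(runif(n) < p, "Yes", "No"), levels = c("Yes", "No"))

table(diseased)
```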

A copy of this dataset is also available in French under the name “tabagisme”.
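
Examples

The following is a minimal sketch of loading and inspecting the dataset, assuming the summarytools package is installed. The choice of `freq()` and `ctable()` is only one way to look at the variables described above.

```r
# Load the dataset from the summarytools package
library(summarytools)
data(tobacco)

# Check the structure: 1000 rows and 9 variables
str(tobacco)

# Frequency table for a single factor
freq(tobacco$gender)

# Cross-tabulation of smoking status against disease status
ctable(tobacco$smoker, tobacco$diseased)
```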