This data comes from Chakraborty et al., which combines headlines from
a variety of news and clickbait sources. Some headlines contain
subject matter inappropriate for classroom use. Given the volume of headlines
containing such language (especially for clickbait == TRUE), this filtering
might not catch all problematic headlines. User discretion is advised.
The training dataset is a random sample of approximately 80% of the observations
from the original dataset.
The testing dataset consists of the remaining approximately 20% of the
observations, none of which appear in the training set.
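
As a rough illustration of the kind of split described above, the following Python sketch shuffles a set of rows and divides them into disjoint training and testing parts of roughly 80% and 20%. The function name `split_train_test`, the seed, and the toy rows are all hypothetical; the actual procedure and random seed used to produce the two datasets are not stated here.

```python
import random


def split_train_test(rows, train_fraction=0.8, seed=0):
    """Randomly split rows into disjoint training and testing lists.

    Hypothetical helper for illustration only; not the procedure used
    to build the actual datasets.
    """
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    indices = list(range(len(rows)))
    rng.shuffle(indices)               # random order of row positions
    n_train = round(train_fraction * len(rows))
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]  # everything not in train
    return train, test


# Toy rows standing in for the real headlines (assumed structure).
headlines = [{"id": i, "clickbait": i % 2 == 0} for i in range(10)]
train, test = split_train_test(headlines)
```

Because the testing rows are taken from the positions left over after the training rows are drawn, no observation can appear in both parts.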