A slightly filtered dataset of Dias's sentence boundary disambiguation edge cases. The dataset is nested: the Outcome column is a list holding the desired sentence splits for each test text. The non-ASCII cases and the spaced-ellipsis examples have been removed.
data(golden_rules)
A data frame with 45 rows and 3 variables
Rule. The name of the rule to test
Text. The testing text
Outcome. The desired outcome of the sentence boundary disambiguation (a nested list of the expected splits)
Dias, Kevin S. 2015. Golden Rules (English). Retrieved: https://s3.amazonaws.com/tm-town-nlp-resources/golden_rules.txt
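Because the Outcome column is nested, it may help to see how a list column of this shape is typically inspected. The sketch below is a minimal illustration and does not use the real data: `toy` and its two rows are hypothetical, made up only to mimic the documented layout (Rule, Text, Outcome), and it assumes each Outcome element is a character vector of expected sentences.

```r
# Hypothetical toy rows that mimic the documented layout.
# These are NOT entries from golden_rules.
toy <- data.frame(
  Rule = c("Toy rule A", "Toy rule B"),
  Text = c("Hello world. Goodbye.", "One sentence only."),
  stringsAsFactors = FALSE
)

# Outcome as a list column: one character vector of expected
# sentences per row (assumed structure).
toy$Outcome <- list(
  c("Hello world.", "Goodbye."),
  "One sentence only."
)

# Number of expected sentences for each row
n_sentences <- lengths(toy$Outcome)

# Expected splits for the first row
first_split <- toy$Outcome[[1]]
```

If the real Outcome column follows the same shape, the same calls (`lengths(golden_rules$Outcome)`, `golden_rules$Outcome[[1]]`) should apply after `data(golden_rules)`.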