some emojis, combining characters and modifiers (e.g., skin tones)
are not recognised properly
[fixed here]
what is considered a word does not depend on locale
[fixed here -- using ICU's word break iterators]
multiple whitespace characters between words are not preserved,
except after a dot, question mark, or exclamation mark,
in which case two spaces are inserted
[changed here -- any sequence of whitespace characters treated as
a word boundary is converted to a single space]
a greedy word wrap algorithm is used, which may lead to high
raggedness
[fixed here -- using the Knuth-Plass method]
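
To illustrate the last two items, here is a minimal Python sketch that contrasts a greedy wrap with a dynamic-programming wrap that minimises raggedness. It is only in the spirit of the Knuth-Plass method and is not the implementation referred to above. The function names (greedy_wrap, min_raggedness_wrap, raggedness), the cost function (squared trailing slack, last line free) and the use of str.split() in place of ICU's word break iterators are all assumptions made for illustration.

```python
def greedy_wrap(words, width):
    """Put as many words as fit on each line, left to right."""
    lines, current, length = [], [], 0
    for w in words:
        # +1 accounts for the single space joining two words
        if current and length + 1 + len(w) > width:
            lines.append(" ".join(current))
            current, length = [w], len(w)
        else:
            length = length + 1 + len(w) if current else len(w)
            current.append(w)
    if current:
        lines.append(" ".join(current))
    return lines


def min_raggedness_wrap(words, width, exponent=2):
    """Choose line breaks minimising the sum of slack**exponent.

    Simplified sketch (assumption): the last line costs nothing and a
    single word longer than `width` is placed on a line of its own.
    """
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: minimal cost of wrapping words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: index of the word starting the next line
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1                  # cancels the space added before the first word
        for j in range(i, n):
            length += 1 + len(words[j])
            if length > width and j > i:
                break                # words[i..j] no longer fit on one line
            slack = max(width - length, 0)
            cost = 0 if j == n - 1 else slack ** exponent
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                nxt[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


def raggedness(lines, width, exponent=2):
    """Sum of slack**exponent over all lines except the last."""
    return sum(max(width - len(line), 0) ** exponent for line in lines[:-1])


# str.split() with no arguments collapses any run of whitespace, so the
# words are later re-joined with exactly one space (no locale awareness).
words = "aaa  bb \t cc   ddddd".split()

print(greedy_wrap(words, 6))          # ['aaa bb', 'cc', 'ddddd']
print(min_raggedness_wrap(words, 6))  # ['aaa', 'bb cc', 'ddddd']
print(raggedness(greedy_wrap(words, 6), 6))          # 16
print(raggedness(min_raggedness_wrap(words, 6), 6))  # 10
```

With a width of 6, the greedy strategy fills the first line completely and leaves "cc" stranded on a line of its own, whereas the dynamic-programming variant accepts a shorter first line to obtain more even line lengths overall.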