Tokenization first replaces the elements of x by their Unicode
  character sequences.  Then, the non-alphabetic characters (i.e., the
  ones which do not have the Alphabetic property) are replaced by
  blanks, and the resulting strings are split at these blanks.
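
  The following is a minimal Python sketch of this procedure, not the
  package's own implementation.  It uses str.isalpha() as an
  approximation of the Unicode Alphabetic property (the exact property
  covers a few additional code points), and the function name
  tokenize() is chosen here only for illustration.

    def tokenize(x):
        """Split each string in x into maximal runs of alphabetic characters."""
        tokens = []
        for s in x:
            # Replace every non-alphabetic character by a blank ...
            blanked = "".join(ch if ch.isalpha() else " " for ch in s)
            # ... and split at the blanks (split() drops empty strings).
            tokens.append(blanked.split())
        return tokens

    tokenize(["Hello, world!", "e-mail: foo@bar.org"])
    # [['Hello', 'world'], ['e', 'mail', 'foo', 'bar', 'org']]

  Note that punctuation such as hyphens and the "@" sign is treated as
  non-alphabetic, so strings like "e-mail" are split into separate
  tokens.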