Preprocess text data such as names and addresses.
preprocText(text, convert_text = TRUE, tolower = TRUE, soundex = FALSE,
  usps_address = FALSE, remove_whitespace = TRUE, remove_punctuation = TRUE,
  convert_text_to = "Latin-ASCII")
preprocText() returns the preprocessed vector of text.
text: A vector of text data to convert.
convert_text: Whether to convert the text to the encoding specified in the 'convert_text_to' argument. Default is TRUE.
tolower: Whether to normalize the text to all lowercase. Default is TRUE.
soundex: Whether to convert the field to the Census's soundex encoding. Default is FALSE.
usps_address: Whether to use USPS address standardization rules to clean address fields. Default is FALSE.
remove_whitespace: Whether to remove leading and trailing whitespace and collapse multiple spaces into a single space. Default is TRUE.
remove_punctuation: Whether to remove punctuation from each string. Default is TRUE.
convert_text_to: Which encoding to use when converting text. Default is 'Latin-ASCII'.
The full list of available encodings is returned by the stri_trans_list() function in the stringi package.
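A minimal usage sketch of the arguments described above. It assumes that preprocText() is the function exported by the fastLink package and that both fastLink and stringi are installed. The input vector is made up for illustration, and the exact output may vary with the package version, so none is shown.

```r
# Assumption: preprocText() comes from the fastLink package.
library(fastLink)

# Made-up input with stray whitespace, punctuation, mixed case, and accents.
raw <- c("  123  Main St. ", "O'Brien,  Patrick", "JOS\u00c9  Garc\u00eda")

# Spell out the documented defaults so each preprocessing step is visible.
clean <- preprocText(
  raw,
  convert_text       = TRUE,
  tolower            = TRUE,
  soundex            = FALSE,
  usps_address       = FALSE,
  remove_whitespace  = TRUE,
  remove_punctuation = TRUE,
  convert_text_to    = "Latin-ASCII"
)
print(clean)

# Peek at the identifiers that can be passed as convert_text_to.
head(stringi::stri_trans_list())
```

Setting usps_address = TRUE is intended for address fields, and soundex = TRUE for name fields where phonetic matching is wanted; both are off by default.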
Ben Fifield <benfifield@gmail.com>