lares (version 5.0.3)

cleanText: Clean text

Description

Cleans text so that only alphanumeric characters remain, stripping accents and symbols from letters.

Usage

cleanText(text, spaces = TRUE, lower = TRUE, ascii = TRUE, title = FALSE)

Arguments

text

Character vector of strings to clean.

spaces

Boolean. Keep spaces? If a character string is passed instead, each space is replaced with that string.

lower

Boolean. Transform all to lower case?

ascii

Boolean. Only ASCII characters?

title

Boolean. Transform to title case (upper case on first letters)?

Value

Character vector with transformed strings.

See Also

Other Data Wrangling: balance_data(), categ_reducer(), date_cuts(), date_feats(), formatNum(), holidays(), impute(), left(), normalize(), numericalonly(), ohe_commas(), ohse(), removenacols(), removenarows(), replaceall(), textFeats(), textTokenizer(), vector2text(), year_month(), year_week()

Other Text Mining: cleanNames(), ngrams(), remove_stopwords(), replaceall(), sentimentBreakdown(), textCloud(), textFeats(), textTokenizer(), topics_rake()

Examples

# NOT RUN {
cleanText("Bernardo Lares 123")
cleanText("Bèrnärdo LáreS 123", lower = FALSE)
cleanText("Bernardo Lare$", spaces = ".", ascii = FALSE)
cleanText("\\@®ì÷å   %ñS  ..-X", spaces = FALSE)
cleanText(c("maría", "€", "núñez_a."), title = TRUE)
# }
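
For readers who want to see the kind of steps the arguments above describe, here is a minimal base-R sketch. It is an approximation under stated assumptions, not the lares implementation: `clean_text_sketch` is a hypothetical helper, it maps only a small, explicit set of accented letters to ASCII, and it does not cover the `ascii` or `title` arguments. Its results may differ from what `cleanText()` returns.

```r
# Hypothetical base-R helper; an approximation, not the lares source code.
clean_text_sketch <- function(text, spaces = TRUE, lower = TRUE) {
  out <- as.character(text)
  # Assumption: only acute a/e/i/o/u and n-tilde (lower and upper case) are
  # transliterated; any other non-ASCII character is dropped in the next step.
  accented <- "\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1\u00c1\u00c9\u00cd\u00d3\u00da\u00d1"
  plain    <- "aeiounAEIOUN"
  out <- chartr(accented, plain, out)
  # Keep only ASCII letters, digits and spaces.
  out <- gsub("[^A-Za-z0-9 ]", "", out, perl = TRUE)
  # Collapse runs of spaces and trim both ends.
  out <- trimws(gsub(" +", " ", out, perl = TRUE))
  if (lower) out <- tolower(out)
  if (is.character(spaces)) {
    # A character value replaces every remaining space.
    out <- gsub(" ", spaces, out, fixed = TRUE)
  } else if (!isTRUE(spaces)) {
    # FALSE removes spaces altogether.
    out <- gsub(" ", "", out, fixed = TRUE)
  }
  out
}

clean_text_sketch("Bernardo Lares 123")
clean_text_sketch("Bernardo Lare$", spaces = ".")
clean_text_sketch(c("Mar\u00eda", "N\u00fa\u00f1ez_a."), lower = FALSE)
```

The explicit `chartr()` map keeps the sketch deterministic across platforms; a fuller version would need a broader transliteration table.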