
lares (version 5.1.0)

cleanText: Clean text

Description

Cleans text so that only alphanumeric characters remain, stripping accents and symbols from letters.

cleanNames() applies the same cleaning to the column names of a data.frame. Resulting names are unique and consist only of the _ character, numbers, and ASCII letters. Capitalization preferences can be specified using the lower parameter. Inspired by janitor::clean_names.
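
To make the cleaning steps described above concrete, here is a minimal base R sketch. The helper clean_sketch() is hypothetical and is not the lares implementation; it assumes UTF-8 input and relies on iconv() transliteration, whose exact output for accented letters varies by platform.

```r
# Hypothetical helper for illustration only; NOT the lares implementation.
clean_sketch <- function(text, spaces = TRUE, lower = TRUE) {
  # Transliterate accented letters to their closest ASCII form.
  # The result of //TRANSLIT depends on the platform's iconv.
  out <- iconv(text, from = "UTF-8", to = "ASCII//TRANSLIT")
  # Drop everything that is not an ASCII letter, digit, or space.
  out <- gsub("[^A-Za-z0-9 ]", "", out)
  # Collapse runs of spaces and trim both ends.
  out <- trimws(gsub(" +", " ", out))
  if (lower) out <- tolower(out)
  if (is.character(spaces)) {
    # A character value replaces each remaining space.
    out <- gsub(" ", spaces, out, fixed = TRUE)
  } else if (!isTRUE(spaces)) {
    # spaces = FALSE removes spaces entirely.
    out <- gsub(" ", "", out, fixed = TRUE)
  }
  out
}

clean_sketch("Bernardo Lares 123")
clean_sketch("Bernardo Lare$", spaces = ".")
```

The order matters in this sketch: symbols are removed before spaces are replaced, so a replacement such as "." is not stripped again by the alphanumeric filter.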

Usage

cleanText(text, spaces = TRUE, lower = TRUE, ascii = TRUE, title = FALSE)

cleanNames(df, num = "x", ...)

Arguments

text

Character vector.

spaces

Boolean or character. Keep spaces? If a character string is passed, each space is replaced with that string.

lower

Boolean. Transform all to lower case?

ascii

Boolean. Only ASCII characters?

title

Boolean. Transform to title case (upper case on first letters)?

df

data.frame/tibble.

num

Character. Prefix added to names that consist only of digits.

...

Additional parameters passed to cleanText().
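
The df and num arguments can be illustrated with a short base R sketch of how column names might be cleaned, prefixed, and made unique. The helper clean_names_sketch() is hypothetical and is not the lares implementation; it handles ASCII names only (accented letters would simply become underscores here) and uses make.unique() as one possible way to resolve duplicates.

```r
# Hypothetical helper for illustration only; NOT the lares implementation.
clean_names_sketch <- function(df, num = "x") {
  nm <- tolower(names(df))
  # Replace every run of non-alphanumeric characters with one underscore.
  nm <- gsub("[^a-z0-9]+", "_", nm)
  # Trim leading and trailing underscores.
  nm <- gsub("^_+|_+$", "", nm)
  # Prefix names made only of digits, e.g. "34" becomes "x34".
  only_digits <- grepl("^[0-9]+$", nm)
  nm[only_digits] <- paste0(num, nm[only_digits])
  # Disambiguate duplicates so the resulting names are unique.
  names(df) <- make.unique(nm, sep = "_")
  df
}

df <- data.frame(a = 1, b = 2, c = 3, d = 4)
names(df) <- c("ID.", "34", "Num 123", "num_123")
names(clean_names_sketch(df))
# "id" "x34" "num_123" "num_123_1"
```

The last two columns collapse to the same cleaned name, which is why a uniqueness step is needed after cleaning rather than before.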

Value

cleanText(): character vector with transformed strings.

cleanNames(): data.frame/tibble with transformed column names.

See Also

Other Data Wrangling: balance_data(), categ_reducer(), date_cuts(), date_feats(), formatNum(), holidays(), impute(), left(), normalize(), ohe_commas(), ohse(), removenacols(), replaceall(), textFeats(), textTokenizer(), vector2text(), year_month()

Other Text Mining: ngrams(), remove_stopwords(), replaceall(), sentimentBreakdown(), textCloud(), textFeats(), textTokenizer(), topics_rake()

Examples

library(lares)
data(dft) # Example dataset shipped with lares

# Default settings: lower case, spaces kept
cleanText("Bernardo Lares 123")
# Keep the original capitalization
cleanText("Bèrnärdo LáreS 123", lower = FALSE)
# Replace spaces with a custom character
cleanText("Bernardo Lare$", spaces = ".", ascii = FALSE)
# Remove spaces entirely
cleanText("\\@®ì÷å   %ñS  ..-X", spaces = FALSE)
# Title case on a character vector
cleanText(c("maría", "€", "núñez_a."), title = TRUE)

# Clean the column names of a data.frame
df <- dft[1:5, 1:6] # Dummy data
colnames(df) <- c("ID.", "34", "x_2", "Num 123", "Nòn-äscì", "  white   Spaces  ")
print(df)
cleanNames(df)
cleanNames(df, lower = FALSE)
