
lares (version 5.1.4)

cleanText: Clean text

Description

cleanText() cleans text, keeping only alphanumeric characters and stripping accents and symbols from letters.

cleanNames() applies the same cleaning to the column names of a data.frame/tibble. Resulting names are unique and consist only of the _ character, numbers, and ASCII letters. Capitalization preferences can be specified using the lower parameter. Inspired by janitor::clean_names.

Usage

cleanText(text, spaces = TRUE, lower = TRUE, ascii = TRUE, title = FALSE)

cleanNames(df, num = "x", ...)

Value

cleanText() returns a character vector with transformed strings.

cleanNames() returns a data.frame/tibble with transformed column names.

Arguments

text

Character vector.

spaces

Boolean or character. Keep spaces? If a character string is passed, spaces are replaced with that string.

lower

Boolean. Transform all to lower case?

ascii

Boolean. Only ASCII characters?

title

Boolean. Transform to title case (upper case on first letters)?

df

data.frame/tibble.

num

Character. Prefix added to names that consist only of digits.

...

Additional parameters passed to cleanText().
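
How the spaces and lower arguments interact can be pictured with a rough base R approximation. This is only a sketch, not the lares implementation: sketch_clean_text() is a hypothetical helper, its accent table covers just a handful of letters, and the real function may order or handle the steps differently.

```r
# Hypothetical helper: a rough base R approximation of the cleaning steps.
# Not the lares implementation; the accent table is deliberately tiny.
sketch_clean_text <- function(text, spaces = TRUE, lower = TRUE) {
  # Map a small, explicit set of accented letters to ASCII:
  # a/e/i/o/u with acute accent and n with tilde, lower and upper case
  accented <- "\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1\u00c1\u00c9\u00cd\u00d3\u00da\u00d1"
  out <- chartr(accented, "aeiounAEIOUN", text)
  # Drop everything that is not an ASCII letter, a digit, or a space
  out <- gsub("[^A-Za-z0-9 ]", "", out)
  # Collapse runs of spaces and trim both ends
  out <- trimws(gsub(" +", " ", out))
  if (lower) out <- tolower(out)
  if (is.character(spaces)) {
    # A string was passed: swap each space for it
    out <- gsub(" ", spaces, out, fixed = TRUE)
  } else if (!isTRUE(spaces)) {
    # spaces = FALSE: remove spaces entirely
    out <- gsub(" ", "", out, fixed = TRUE)
  }
  out
}

sketch_clean_text("Mar\u00eda N\u00fa\u00f1ez 123")
# "maria nunez 123"
sketch_clean_text("Mar\u00eda N\u00fa\u00f1ez 123", spaces = "_", lower = FALSE)
# "Maria_Nunez_123"
```

The inputs are written with \u escapes so the sketch does not depend on the encoding of the script file.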

See Also

Other Data Wrangling: balance_data(), categ_reducer(), date_cuts(), date_feats(), formatNum(), holidays(), impute(), left(), normalize(), ohe_commas(), ohse(), removenacols(), replaceall(), textFeats(), textTokenizer(), vector2text(), year_month()

Other Text Mining: ngrams(), remove_stopwords(), replaceall(), sentimentBreakdown(), textCloud(), textFeats(), textTokenizer(), topics_rake()

Examples

cleanText("Bernardo Lares 123")
cleanText("Bèrnärdo LáreS 123", lower = FALSE)
cleanText("Bernardo Lare$", spaces = ".", ascii = FALSE)
cleanText("\\@®ì÷å   %ñS  ..-X", spaces = FALSE)
cleanText(c("maría", "€", "núñez_a."), title = TRUE)
df <- dft[1:5, 1:6] # Dummy data
colnames(df) <- c("ID.", "34", "x_2", "Num 123", "Nòn-äscì", "  white   Spaces  ")
print(df)
cleanNames(df)
cleanNames(df, lower = FALSE)
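
The column-name examples can be approximated in base R to show where uniqueness and the num prefix might come in. This is a sketch under stated assumptions: sketch_clean_names() is a hypothetical helper, and resolving duplicates with make.unique() is an assumption rather than the method lares is known to use.

```r
# Hypothetical helper: approximate column-name cleaning in base R.
# Assumption: duplicates are resolved with make.unique(); lares may differ.
sketch_clean_names <- function(df, num = "x") {
  nm <- tolower(trimws(colnames(df)))
  # Turn every run of non-alphanumeric characters into a single underscore
  nm <- gsub("[^a-z0-9]+", "_", nm)
  # Strip underscores left at the start or end
  nm <- gsub("^_+|_+$", "", nm)
  # Prefix names made only of digits
  only_digits <- grepl("^[0-9]+$", nm)
  nm[only_digits] <- paste0(num, nm[only_digits])
  # Disambiguate any names that became identical after cleaning
  colnames(df) <- make.unique(nm, sep = "_")
  df
}

toy <- data.frame(a = 1, b = 2, c = 3, d = 4)
colnames(toy) <- c("ID.", "34", "Num 123", "num-123")
colnames(sketch_clean_names(toy))
# "id" "x34" "num_123" "num_123_1"
```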
