cleansing_corpus: Cleansing Corpus
Description
Cleanses text by removing escape characters, non-alphanumeric characters,
long words, and excess whitespace, and by converting all letters to lower case.
Usage
cleansing_corpus(
  text,
  escape_chars = TRUE,
  nonalphanum = TRUE,
  longwords = TRUE,
  whitespace = TRUE,
  tolower = TRUE
)
Arguments
text
Character vector of free text to be cleansed.
escape_chars
If TRUE, removes the escape characters \n, \r and \t.
nonalphanum
If TRUE, removes non-alphanumeric characters.
longwords
If TRUE, removes words with more than 35 characters.
whitespace
If TRUE, removes excess whitespace.
tolower
If TRUE, converts all letters to lower case.
Value
A character vector of the cleansed text.
Examples
# NOT RUN {
txt <- "It has roots in a piece of classical Latin literature from 45 BC"
cleansing_corpus(txt)
# }
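The cleansing steps described above can be approximated in base R. The sketch below is not the package's implementation: the name `cleanse_sketch`, the `max_word_len` parameter, the order of the steps, and the choice to replace removed characters with a space are all assumptions made for illustration.

```r
# Hypothetical stand-in for illustration only; NOT the package's code.
cleanse_sketch <- function(text, max_word_len = 35) {
  # Replace the escape characters \n, \r and \t with a space.
  text <- gsub("[\n\r\t]", " ", text)
  # Replace anything that is neither alphanumeric nor whitespace.
  text <- gsub("[^[:alnum:][:space:]]", " ", text)
  # Drop words with more than max_word_len characters. After the previous
  # step, every run of alphanumerics is a whole word.
  long_word <- sprintf("[[:alnum:]]{%d,}", max_word_len + 1)
  text <- gsub(long_word, " ", text)
  # Collapse runs of whitespace and trim both ends.
  text <- trimws(gsub("[[:space:]]+", " ", text))
  # Convert to lower case.
  tolower(text)
}

txt <- "It has roots in a piece of classical Latin literature from 45 BC"
cleanse_sketch(txt)
# "it has roots in a piece of classical latin literature from 45 bc"
```

Unlike `cleansing_corpus()`, this sketch always applies every step; the logical arguments documented above would let each one be switched off individually.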