
quanteda (version 0.7.2-1)

clean: simple cleaning of text before processing

Usage

clean(x, ...)

## S3 method for class 'character'
clean(x, removeDigits = TRUE, removePunct = TRUE, toLower = TRUE, removeAdditional = NULL, removeTwitter = FALSE, removeURL = TRUE, ...)

## S3 method for class 'corpus'
clean(x, removeDigits = TRUE, removePunct = TRUE, toLower = TRUE, removeAdditional = NULL, removeTwitter = FALSE, ...)

cleanC(x, removeDigits = TRUE, removePunct = TRUE, toLower = TRUE, removeAdditional = NULL, removeTwitter = FALSE, removeURL = TRUE, ...)
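The usage above follows R's S3 pattern: a single generic, clean(), with one method for character vectors and one for corpus objects. The sketch below shows that dispatch shape in base R. It is a hypothetical illustration, not quanteda's code: cleanDemo() and the toy class "toyCorpus" (a list holding a character vector in $texts) are invented names, and quanteda's real corpus object is structured differently. The character method does only digit removal and lower-casing to keep the example short.

```r
# Hypothetical generic mirroring the shape of clean(); NOT quanteda's implementation.
cleanDemo <- function(x, ...) UseMethod("cleanDemo")

# Character method: performs the actual text cleaning (reduced here to two options).
cleanDemo.character <- function(x, removeDigits = TRUE, toLower = TRUE, ...) {
  if (removeDigits) x <- gsub("[[:digit:]]", "", x)
  if (toLower) x <- tolower(x)
  x
}

# Method for a hypothetical "toyCorpus" class: clean the stored texts by
# re-dispatching to the character method, then return the whole object.
cleanDemo.toyCorpus <- function(x, ...) {
  x$texts <- cleanDemo(x$texts, ...)
  x
}

toy <- structure(list(texts = c(a = "R2D2 Lives", b = "C3PO TOO")),
                 class = "toyCorpus")
cleaned <- cleanDemo(toy)
print(cleaned$texts)
```

Because the corpus method simply forwards its arguments to the character method, both entry points accept the same cleaning options, which matches the documented behaviour that cleaning a corpus returns the corpus containing the cleaned texts.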

Arguments

x
The object to be cleaned. Can be either a character vector or a corpus object. If x is a corpus, clean returns the corpus containing the cleaned texts.
...
additional parameters
removeDigits
remove numbers if TRUE
removePunct
remove punctuation if TRUE
toLower
convert text to lower case if TRUE
removeAdditional
additional characters to remove (regular expression)
removeTwitter
if FALSE, do not remove @ or #
removeURL
remove URLs (web addresses starting with http: or https:) if TRUE, based on a regular expression from http://daringfireball.net/2010/07/improv
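To make the options above concrete, here is a rough base-R approximation of what each flag does. It is a sketch under several assumptions and is not quanteda's implementation: cleanSketch() is a hypothetical name, the URL pattern is a simplified stand-in for the daringfireball regular expression, the order of the steps is assumed, removeTwitter is assumed to act only through punctuation removal, and the final whitespace squeeze is an added tidy-up step that the documentation does not mention.

```r
# Hypothetical approximation of the documented cleaning options; NOT quanteda's code.
cleanSketch <- function(x, removeDigits = TRUE, removePunct = TRUE,
                        toLower = TRUE, removeAdditional = NULL,
                        removeTwitter = FALSE, removeURL = TRUE) {
  if (removeURL) {
    # Simplified URL pattern (assumption): "http://" or "https://" up to the next whitespace
    x <- gsub("https?://\\S+", "", x, perl = TRUE)
  }
  if (!is.null(removeAdditional)) {
    # User-supplied regular expression; matches are deleted
    x <- gsub(removeAdditional, "", x, perl = TRUE)
  }
  if (removeDigits) {
    x <- gsub("[[:digit:]]", "", x)
  }
  if (removePunct) {
    if (removeTwitter) {
      # Strip all ASCII punctuation, including @ and #
      x <- gsub("[[:punct:]]", "", x, perl = TRUE)
    } else {
      # Strip punctuation but keep @ and # (negative lookahead skips them)
      x <- gsub("(?![@#])[[:punct:]]", "", x, perl = TRUE)
    }
  }
  if (toLower) {
    x <- tolower(x)
  }
  # Assumed tidy-up: collapse runs of whitespace left by deletions, trim the ends
  trimws(gsub("\\s+", " ", x, perl = TRUE))
}

tweet <- "RT @user: Visit https://example.com NOW, 100% free! #deal"
print(cleanSketch(tweet))
print(cleanSketch(tweet, removeTwitter = TRUE))
```

With the defaults, the URL, digits, and ordinary punctuation disappear while @ and # survive; setting removeTwitter = TRUE removes those two characters as well.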