crqanlp (version 0.3)

clean_text: Clean text

Description

Pre-processes raw text: removes stop words and punctuation, and creates sentence markers.

Usage

clean_text(rawText, removeStopwords = FALSE)

Arguments

rawText

A vector of strings (tokens).

removeStopwords

A logical value: TRUE removes stop words; FALSE retains them.

Value

Returns the vector of text in lower case, stripped of punctuation and, when removeStopwords = TRUE, of stop words.

Details

A convenience function that removes unwanted information from a vector of text. At present, the only option it exposes is whether to remove stop words.
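
To make the kind of cleaning described above concrete, here is a minimal base R sketch. It is not the crqanlp implementation: the function name sketch_clean_text and its tiny stop-word list are hypothetical, and the sentence markers that clean_text creates are left out because their format is not documented here.

```r
# Hypothetical stand-in for illustration only; NOT the crqanlp implementation.
# The default stop-word list is a small assumed sample, not the one the package uses.
sketch_clean_text <- function(rawText, removeStopwords = FALSE,
                              stopwords = c("the", "a", "an", "and", "of", "to", "in")) {
  txt <- tolower(rawText)                           # lower-case everything
  txt <- gsub("[[:punct:]]+", " ", txt)             # replace punctuation with spaces
  tokens <- unlist(strsplit(txt, "[[:space:]]+"))   # split on whitespace
  tokens <- tokens[nzchar(tokens)]                  # drop empty strings
  if (removeStopwords) {
    tokens <- tokens[!(tokens %in% stopwords)]      # drop stop words on request
  }
  tokens
}

sketch_clean_text("The Rabbit-Hole, and the Pool of Tears!", removeStopwords = TRUE)
```

With removeStopwords = TRUE the call above yields the tokens "rabbit", "hole", "pool" and "tears"; with the default FALSE, the stop words "the", "and" and "of" are kept.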

Examples

# NOT RUN {
library(crqanlp)    ## provides clean_text
library(gutenbergr) ## provides gutenberg_download
## let's get Alice's Adventures in Wonderland by Carroll
# gutenberg_works(author == "Carroll, Lewis") 
rawText = gutenberg_download(11) ## take the text
rawText = as.vector(rawText$text) ## vectorize the text
rawText = paste(rawText, collapse = " ") ## collapse the text

cleanText = clean_text(rawText, removeStopwords = TRUE)
text      = cleanText$content

# }
