
PubMedWordcloud (version 0.3.6)

cleanAbstracts: clean data

Description

Removes punctuation and numbers, converts characters to lower or upper case, removes English stopwords and user-specified words, and optionally stems words.

Usage

cleanAbstracts(abstracts, rmNum = TRUE, tolw = TRUE, toup = FALSE,
  rmWords = TRUE, yrWords = NULL, stemDoc = FALSE)

Arguments

abstracts

Output of getAbstracts, or a paragraph of plain text

rmNum

Logical; whether to remove numbers from the text

tolw

Logical; whether to convert characters to lower case

toup

Logical; whether to convert characters to upper case

rmWords

Logical; whether to remove a set of English stopwords (e.g., 'the')

yrWords

A character vector of additional, user-specified words to remove

stemDoc

Logical; whether to stem words using Porter's stemming algorithm
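
To make the effect of these arguments concrete, here is a minimal base-R sketch of the same kind of cleaning pipeline. It is not the package's implementation: the function name clean_text_sketch, the tiny stop_words list, and the word/freq output columns are all assumptions made for illustration, and stemming is left out.

```r
# Illustrative only: a base-R approximation of the cleaning steps described
# above. NOT the PubMedWordcloud implementation; `clean_text_sketch` and
# `stop_words` are hypothetical names, and the stopword list is a tiny stand-in.
clean_text_sketch <- function(text, rmNum = TRUE, tolw = TRUE, toup = FALSE,
                              stop_words = c("a", "and", "of", "the"),
                              yrWords = NULL) {
  text <- gsub("[[:punct:]]", "", text)              # strip punctuation
  if (rmNum) text <- gsub("[[:digit:]]", "", text)   # strip digits
  if (tolw) text <- tolower(text)                    # lower-case
  if (toup) text <- toupper(text)                    # upper-case

  # Split on whitespace and discard empty tokens
  words <- unlist(strsplit(text, "[[:space:]]+"))
  words <- words[nzchar(words)]

  # Drop stopwords and user-specified words (case-insensitive match)
  drop <- c(stop_words, yrWords)
  words <- words[!tolower(words) %in% tolower(drop)]

  # Count the remaining words, most frequent first
  freq <- sort(table(words), decreasing = TRUE)
  data.frame(word = names(freq), freq = as.integer(freq),
             stringsAsFactors = FALSE)
}

res <- clean_text_sketch(
  "Jobs received a number of honors and public recognition.",
  yrWords = "number"
)
print(res)
```

In this sketch the words passed through yrWords are matched case-insensitively; whether cleanAbstracts does the same is not stated on this page.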

See Also

getAbstracts

Examples

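# NOT RUN {
library(PubMedWordcloud)

# Fetch abstracts for two PubMed IDs (needs network access), then clean them
Abs <- getAbstracts(c("22693232", "22564732"))
cleanAbs <- cleanAbstracts(Abs)

# Clean a plain paragraph of text
text <- "Jobs received a number of honors and public recognition."
cleanD <- cleanAbstracts(text)
# }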
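
A further sketch of a call with non-default arguments, using only the argument names from the Usage section. It assumes PubMedWordcloud is installed and skips the call otherwise; custom_words is a hypothetical list chosen for illustration. The structure of the returned object is not described on this page, so it is simply printed.

```r
# Example text containing a number, so rmNum has something to remove
text <- "Jobs received a number of honors and public recognition in 2011."

# Hypothetical user-specified words to drop via yrWords
custom_words <- c("number", "public")

# Only call the package if it is available on this machine
if (requireNamespace("PubMedWordcloud", quietly = TRUE)) {
  cleaned <- PubMedWordcloud::cleanAbstracts(
    text,
    rmNum   = TRUE,          # remove numbers
    tolw    = TRUE,          # convert to lower case
    rmWords = TRUE,          # remove English stopwords
    yrWords = custom_words   # remove the extra words listed above
  )
  print(head(cleaned))
}
```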
