rm_stopwords: Remove Stop Words

Description

Removes stop words from text in a variety of contexts.

%sw% - Binary operator version of rm_stopwords that defaults to separate = FALSE.

Usage

rm_stopwords(
  text.var,
  stopwords = qdapDictionaries::Top25Words,
  unlist = FALSE,
  separate = TRUE,
  strip = FALSE,
  unique = FALSE,
  char.keep = NULL,
  names = FALSE,
  ignore.case = TRUE,
  apostrophe.remove = FALSE,
  ...
)

rm_stop(
  text.var,
  stopwords = qdapDictionaries::Top25Words,
  unlist = FALSE,
  separate = TRUE,
  strip = FALSE,
  unique = FALSE,
  char.keep = NULL,
  names = FALSE,
  ignore.case = TRUE,
  apostrophe.remove = FALSE,
  ...
)

text.var %sw% stopwords

Value

Returns a vector of sentences, vector of words, or (default) a list of vectors of words with stop words removed. Output depends on supplied arguments.

Arguments

text.var: A character string of text or a vector of character strings.
stopwords: A character vector of words to remove from the text. qdap has a number of data sets that can be used as stop words including: Top200Words, Top100Words, Top25Words. For the tm package's traditional English stop words use tm::stopwords("english").
unlist: logical. If TRUE unlists into one vector. General use intended for when separate is FALSE.
separate: logical. If TRUE separates sentences into words. If FALSE retains sentences.
strip: logical. If TRUE strips the text of all punctuation except apostrophes.
unique: logical. If TRUE keeps only unique words (if unlist is TRUE) or sentences (if unlist is FALSE). General use intended for when unlist is TRUE.
char.keep: If strip is TRUE this argument provides a means of retaining supplied character(s).
names: logical. If TRUE will name the elements of the vector or list with the original text.var.
ignore.case: logical. If TRUE stop words are removed regardless of case, and the text is converted to lower case. If FALSE stop word removal is case sensitive and the case of the text is left unchanged.
apostrophe.remove: logical. If TRUE removes apostrophes from the output.
...: further arguments passed to strip function.
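
To make the interaction between separate and ignore.case concrete, here is a minimal base-R sketch. It is not qdap's implementation: remove_stopwords_sketch is a hypothetical helper written for illustration, it splits on whitespace only, and it ignores punctuation handling, strip, unlist, unique, and names.

```r
# Hypothetical helper, NOT part of qdap: a rough base-R approximation of
# the separate and ignore.case arguments described above.
remove_stopwords_sketch <- function(text, stopwords,
                                    separate = TRUE, ignore.case = TRUE) {
  # Split each string into words on runs of whitespace.
  words <- strsplit(trimws(text), "\\s+")

  kept <- lapply(words, function(w) {
    if (ignore.case) {
      # Lower-case the text and match stop words regardless of case.
      w <- tolower(w)
      w[!w %in% tolower(stopwords)]
    } else {
      # Case-sensitive matching; the text keeps its original case.
      w[!w %in% stopwords]
    }
  })

  # separate = FALSE: paste the surviving words back into one string per input.
  if (!separate) {
    return(vapply(kept, paste, character(1), collapse = " "))
  }
  kept
}

x  <- c("I like it a lot", "The dog is here")
sw <- c("the", "it", "is", "a", "i")

by_word        <- remove_stopwords_sketch(x, sw)
by_sentence    <- remove_stopwords_sketch(x, sw, separate = FALSE)
case_sensitive <- remove_stopwords_sketch(x, sw, separate = FALSE,
                                          ignore.case = FALSE)
```

In this sketch by_word is a list with one character vector per input string, while by_sentence and case_sensitive are character vectors with one string per input. With ignore.case = FALSE the capitalised "I" and "The" survive because they do not match the lower-case stop words.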

Examples

if (FALSE) {
rm_stopwords(DATA$state)
rm_stopwords(DATA$state, tm::stopwords("english"))
rm_stopwords(DATA$state, Top200Words)
rm_stopwords(DATA$state, Top200Words, strip = TRUE)
rm_stopwords(DATA$state, Top200Words, separate = FALSE)
rm_stopwords(DATA$state, Top200Words, separate = FALSE, ignore.case = FALSE)
rm_stopwords(DATA$state, Top200Words, unlist = TRUE)
rm_stopwords(DATA$state, Top200Words, unlist = TRUE, strip=TRUE)
rm_stop(DATA$state, Top200Words, unlist = TRUE, unique = TRUE)

c("I like it alot", "I like it too") %sw% qdapDictionaries::Top25Words
}
