
Removal of stop words in a variety of contexts.
%sw% - Binary operator version of rm_stopwords that defaults to separate = FALSE.
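As a rough illustration of how a binary operator form can wrap a stop word filter, here is a minimal base R sketch. The operator name `%rmsw%` and its body are hypothetical and written for this example only; they are not qdap's implementation of %sw%, and the lowercasing is an assumption made to keep the sketch short.

```r
# Hypothetical infix operator (not qdap's code): drop stop words from each
# string and return one string per input element (sentences are retained).
`%rmsw%` <- function(text, stopwords) {
  vapply(text, function(sentence) {
    # Lowercase, then split on runs of whitespace
    words <- strsplit(tolower(sentence), "[[:space:]]+")[[1]]
    # Keep only words that are not in the stop word list
    paste(words[!words %in% tolower(stopwords)], collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

c("I like it alot", "I like it too") %rmsw% c("i", "it")
# "like alot" "like too"
```

The backtick-quoted name is what lets R treat `%rmsw%` as an infix operator, so the text sits on the left and the stop word vector on the right.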
rm_stopwords(
text.var,
stopwords = qdapDictionaries::Top25Words,
unlist = FALSE,
separate = TRUE,
strip = FALSE,
unique = FALSE,
char.keep = NULL,
names = FALSE,
ignore.case = TRUE,
apostrophe.remove = FALSE,
...
)

rm_stop(
text.var,
stopwords = qdapDictionaries::Top25Words,
unlist = FALSE,
separate = TRUE,
strip = FALSE,
unique = FALSE,
char.keep = NULL,
names = FALSE,
ignore.case = TRUE,
apostrophe.remove = FALSE,
...
)
text.var %sw% stopwords
Returns a vector of sentences, vector of words, or (default) a list of vectors of words with stop words removed. Output depends on supplied arguments.
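The three output shapes named above can be sketched in base R without qdap. This is only an approximation of the idea: the variable names are hypothetical, punctuation is simply dropped, and qdap's own tokenization may differ. The shapes roughly correspond to the default call, to unlist = TRUE, and to separate = FALSE.

```r
# Hypothetical illustration in base R; qdap is not used here.
text  <- c("The cat sat.", "A dog ran to the park.")
stops <- c("the", "a", "to")

# Lowercase, drop punctuation, and split each sentence into word tokens
tokens <- strsplit(tolower(gsub("[[:punct:]]", "", text)), "[[:space:]]+")

# Remove stop words from each sentence's tokens
kept <- lapply(tokens, function(w) w[!w %in% stops])

as_list      <- kept                    # list of vectors of words
as_words     <- unlist(kept)            # one vector of words
as_sentences <- vapply(kept, paste, character(1), collapse = " ")  # one string per sentence

as_words
# "cat" "sat" "dog" "ran" "park"
as_sentences
# "cat sat" "dog ran park"
```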
text.var
A character string of text or a vector of character strings.
stopwords
A character vector of words to remove from the text. qdap has a number of data sets that can be used as stop words, including Top200Words, Top100Words, and Top25Words. For the tm package's traditional English stop words, use tm::stopwords("english").
unlist
logical. If TRUE, unlists into one vector. General use is intended for when separate is FALSE.
separate
logical. If TRUE, separates sentences into words. If FALSE, retains sentences.
strip
logical. If TRUE, strips the text of all punctuation except apostrophes.
unique
logical. If TRUE, keeps only unique words (if unlist is TRUE) or sentences (if unlist is FALSE). General use is intended for when unlist is TRUE.
char.keep
If strip is TRUE, this argument provides a means of retaining supplied character(s).
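The interaction between stripping punctuation and retaining selected characters can be sketched as follows. `strip_punct` and its `keep` parameter are hypothetical names used here as stand-ins; this is not qdap's strip function, and it covers only the punctuation handling described above.

```r
# Hypothetical helper (not qdap's strip): remove punctuation except
# apostrophes and any characters supplied in `keep`.
strip_punct <- function(x, keep = NULL) {
  chars <- strsplit(x, "")  # split each string into single characters
  vapply(chars, function(ch) {
    is_punct <- grepl("[[:punct:]]", ch)
    # Drop punctuation unless it is an apostrophe or explicitly kept
    drop <- is_punct & !(ch %in% c("'", keep))
    paste(ch[!drop], collapse = "")
  }, character(1))
}

strip_punct("Don't stop, really-now!")
# "Don't stop reallynow"
strip_punct("Don't stop, really-now!", keep = "-")
# "Don't stop really-now"
```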
names
logical. If TRUE, names the elements of the vector or list with the original text.var.
ignore.case
logical. If TRUE, stop words are removed regardless of case, and case is also stripped from the text. If FALSE, stop word removal is contingent upon case, and case is not stripped.
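The difference between the two matching modes can be shown with a small base R sketch. `drop_stops` and its `ignore_case` parameter are hypothetical names for this example and do not reproduce qdap's internals.

```r
# Hypothetical helper contrasting case-insensitive and case-sensitive removal.
drop_stops <- function(words, stopwords, ignore_case = TRUE) {
  if (ignore_case) {
    words <- tolower(words)          # case is stripped from the text
    stopwords <- tolower(stopwords)  # and from the stop words before matching
  }
  words[!words %in% stopwords]
}

words <- c("The", "Cat", "and", "the", "Hat")

drop_stops(words, c("the", "and"))
# "cat" "hat"
drop_stops(words, c("the", "and"), ignore_case = FALSE)
# "The" "Cat" "Hat"
```

In the case-sensitive call, "The" survives because it does not match the lowercase stop word "the" exactly.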
apostrophe.remove
logical. If TRUE, removes apostrophes from the output.
...
Further arguments passed to the strip function.
strip, bag_o_words, stopwords
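if (FALSE) {
library(qdap)  # provides rm_stopwords(), the DATA set, and the Top*Words lists

# Default: list of word vectors with Top25Words removed
rm_stopwords(DATA$state)
# Alternative stop word lists (the first requires the tm package)
rm_stopwords(DATA$state, tm::stopwords("english"))
rm_stopwords(DATA$state, Top200Words)
# Also strip punctuation (apostrophes are kept)
rm_stopwords(DATA$state, Top200Words, strip = TRUE)
# Retain sentences instead of splitting into words
rm_stopwords(DATA$state, Top200Words, separate = FALSE)
rm_stopwords(DATA$state, Top200Words, separate = FALSE, ignore.case = FALSE)
# Collapse the result into a single vector of words
rm_stopwords(DATA$state, Top200Words, unlist = TRUE)
rm_stopwords(DATA$state, Top200Words, unlist = TRUE, strip=TRUE)
rm_stop(DATA$state, Top200Words, unlist = TRUE, unique = TRUE)
# Binary operator form
c("I like it alot", "I like it too") %sw% qdapDictionaries::Top25Words
}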