
quanteda (version 0.9.7-17)

wordstem: stem words

Description

Apply a stemmer to words. This is a wrapper for wordStem, designed so that this function can be called without attaching the entire SnowballC package. wordStem uses Martin Porter's stemming algorithm and the C libstemmer library generated by Snowball.
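The wrapper idea can be sketched in a few lines of R. This is a minimal sketch, not quanteda's source: the helper name `stem_words` is hypothetical, and it assumes SnowballC is installed. Calling `SnowballC::wordStem` through `::` loads SnowballC's namespace on first use without attaching the package to the search path.

```r
# Hypothetical helper, not part of quanteda: forwards to SnowballC::wordStem
# without attaching SnowballC (requireNamespace() only loads the namespace).
stem_words <- function(x, language = "porter") {
  if (!requireNamespace("SnowballC", quietly = TRUE)) {
    stop("package 'SnowballC' is required for stemming")
  }
  SnowballC::wordStem(x, language = language)
}
```

With the default language, `stem_words(c("win", "winning", "wins"))` should return the same stems as `wordstem()` applied to a character vector, since both forward to `wordStem` with `language = "porter"`.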

Usage

wordstem(x, language = "porter")

The class-specific S3 methods for this generic (three in this version) take the same arguments: wordstem(x, language = "porter")

Arguments

x
a character vector or a set of tokenized texts whose words are to be stemmed. For tokenized texts, the tokenization must be word-based.
language
the name of a recognized language, as returned by getStemLanguages, or a two- or three-letter ISO-639 code corresponding to one of these languages (see References for the list of codes).
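For tokenized input, a rough equivalent is to stem each text's tokens separately. The sketch below calls SnowballC directly and is not quanteda's own implementation of the tokenized-texts method; the list `toks` is hypothetical example data, SnowballC is assumed to be installed, and "english" is one of the names returned by `SnowballC::getStemLanguages()`.

```r
# Hypothetical tokenized input: a named list of character vectors, one per text.
toks <- list(text1 = c("win", "winning", "wins"),
             text2 = c("running", "runner"))

# Stem each text's tokens separately. A language name from
# SnowballC::getStemLanguages() or an ISO-639 code is passed through.
stemmed <- lapply(toks, SnowballC::wordStem, language = "english")

# Each text keeps its token count after stemming.
stopifnot(identical(lengths(stemmed), lengths(toks)))
```

Keeping the list structure means each stemmed vector stays aligned with its source text, which is why the tokenization must be word-based: every element is treated as a single word.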

Value

A character vector with as many elements as the input vector, each element being the stem of the corresponding word. Elements are converted to UTF-8 encoding before stemming, and returned elements that contain non-ASCII characters are marked as UTF-8.

References

http://snowball.tartarus.org/
http://www.iso.org/iso/home/standards/language_codes.htm for the ISO-639 language codes

See Also

wordStem (package SnowballC)

Examples

# Simple example
wordstem(c("win", "winning", "wins", "won", "winner"))
