ptstem (version 0.0.7)

ptstem_words: Stem Words

Description

Stem a character vector of words using the selected algorithm.

Usage

ptstem_words(words, algorithm = "rslp", complete = T, ...)

ptstem(texts, algorithm = "rslp", n_char = 3, complete = T, ignore = NULL, ...)

Arguments

words, texts

character vector of words (for ptstem_words) or of texts (for ptstem).

algorithm

string with the name of the algorithm to be used. One of "hunspell", "rslp", "porter" or "modified-hunspell".

complete

whether to complete words or not, i.e. replace all words that share a stem with the word that appears most often with that stem.

...

other arguments passed to the algorithms.

n_char

minimum number of characters of words to be stemmed. Not used by ptstem_words.

ignore

vector of words and regexes to ignore. Plain words are wrapped in stringr::fixed() so that words like 'banana' don't get excluded when you ignore 'ana'. An element is treated as a regex when it contains at least one punctuation symbol.
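The rule for the ignore argument can be illustrated with a minimal base R sketch. This is not ptstem's implementation: the helper name is_ignored_sketch is hypothetical, and both the exact-match test for plain words and the whole-word anchoring of regexes are assumptions chosen to reproduce the behaviour described above.

```r
# Hypothetical helper, NOT part of ptstem: flags which words would be ignored.
is_ignored_sketch <- function(words, ignore) {
  # Elements with at least one punctuation character are treated as regexes.
  is_regex <- grepl("[[:punct:]]", ignore)
  fixed_words <- ignore[!is_regex]
  patterns <- ignore[is_regex]

  # Plain words must match exactly, so "ana" does not exclude "banana".
  ignored <- words %in% fixed_words

  for (p in patterns) {
    # Assumption: a regex has to match the whole word.
    ignored <- ignored | grepl(paste0("^(?:", p, ")$"), words, perl = TRUE)
  }
  ignored
}

words <- c("banana", "ana", "avenida", "gostou")
is_ignored_sketch(words, ignore = c("ana", "av.*"))
```

Under these assumptions, "ana" and "avenida" are flagged, while "banana" and "gostou" are still stemmed.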

Details

You can choose whether to complete words or not using the complete argument. By default, all algorithms complete stems. With hunspell, it is better to always use complete = TRUE, since it completes words even when complete = FALSE.

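Completion finds, for each stem, the word that appears most often in the full corpus. That is why it should not be used when you are stemming in parallel: each worker sees only part of the corpus, so the same stem can be completed to different words.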
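The completion step can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's code: the helper complete_stems_sketch is hypothetical, the stems are supplied by hand rather than computed by a stemming algorithm, and ties are broken by alphabetical order.

```r
# Hypothetical helper, NOT part of ptstem: replaces each word with the most
# frequent word that shares its stem.
complete_stems_sketch <- function(words, stems) {
  stopifnot(length(words) == length(stems))
  # Group the words by stem, then pick the most frequent word in each group.
  groups <- split(words, stems)
  most_common <- vapply(groups, function(w) {
    counts <- table(w)
    # which.max() returns the first maximum, so ties go to the
    # alphabetically first word.
    names(counts)[which.max(counts)]
  }, character(1))
  unname(most_common[stems])
}

# Stems written by hand for illustration only.
words <- c("gostou", "gosto", "gostaram", "gosto", "fruta", "frutas", "frutas")
stems <- c("gost", "gost", "gost", "gost", "frut", "frut", "frut")

# Completion over the full corpus.
full <- complete_stems_sketch(words, stems)

# Completion over a chunk only, as a parallel worker would see it.
chunk <- complete_stems_sketch(words[1:3], stems[1:3])
```

In this sketch the full corpus completes the stem "gost" to "gosto", whereas the three-word chunk, in which every word appears once, falls back to the tie-break and picks a different word. That is the inconsistency to expect when completing in parallel.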

Examples

# NOT RUN {
words <- c("gostou", "gosto", "gostaram")
ptstem_words(words, "hunspell")
ptstem_words(words)
ptstem_words(words, algorithm = "porter", complete = FALSE)

texts <- c("coma frutas pois elas fazem bem para a vida.",
           "nunca coma doces, eles fazem mal para os dentes.")
ptstem(texts, "hunspell")
ptstem(texts, n_char = 5)
ptstem(texts, "porter", n_char = 4, complete = FALSE)
ptstem(words, ignore = "av.*") # words starting with "av" are not stemmed


# }