Return various kinds of stopwords with support for different languages.
stopwords(kind = "en")
kind
A character string identifying the desired stopword list.
A character vector containing the requested stopwords. An error
is raised if no stopwords are available for the requested kind.
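The error behaviour described above can be handled with `tryCatch`. The following is a minimal sketch, assuming `stopwords()` is the function exported by the tm package; the kind `"klingon"` is a made-up value used only to trigger the error, and the empty-vector fallback is one possible choice rather than a recommended default.

```r
library(tm)  # assumption: stopwords() comes from the tm package

# "klingon" is a hypothetical kind for which no stopword list is expected to exist.
result <- tryCatch(
  stopwords("klingon"),
  error = function(e) {
    # Report the problem and fall back to an empty stopword vector.
    message("No stopword list found: ", conditionMessage(e))
    character(0)
  }
)

length(result)
```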
Available stopword lists are:
catalan
Catalan stopwords (obtained from http://latel.upf.edu/morgana/altres/pub/ca_stop.htm).
romanian
Romanian stopwords (extracted from http://snowball.tartarus.org/otherapps/romanian/romanian1.tgz).
SMART
English stopwords from the SMART information retrieval system (obtained from http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop). This list coincides with the stopword list used by the MC toolkit (http://www.cs.utexas.edu/users/dml/software/mc/).
In addition, a set of stopword lists from the Snowball stemmer project is available in different languages (obtained from http://svn.tartarus.org/snowball/trunk/website/algorithms/*/stop.txt). Supported languages are danish, dutch, english, finnish, french, german, hungarian, italian, norwegian, portuguese, russian, spanish, and swedish. Language names are case sensitive. Alternatively, their IETF language tags may be used.
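As an illustration of the two ways of naming a Snowball list, the sketch below requests the German list once by its language name and once by the IETF tag `"de"`, then compares the results. It assumes `stopwords()` is the function exported by the tm package; the variable names are illustrative only.

```r
library(tm)  # assumption: stopwords() comes from the tm package

by_name <- stopwords("german")  # lowercase language name, as listed above
by_tag  <- stopwords("de")      # IETF language tag for German

# Both calls are expected to resolve to the same stopword list.
identical(by_name, by_tag)
```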
# NOT RUN {
stopwords("en")
stopwords("SMART")
stopwords("german")
# }
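A typical use of the returned character vector is to filter tokens. The sketch below is one simple way to do this with base R subsetting, assuming `stopwords()` is the function exported by the tm package; the sample sentence and the whitespace tokenization are illustrative assumptions, not part of the function itself.

```r
library(tm)  # assumption: stopwords() comes from the tm package

text <- "The quick brown fox jumps over the lazy dog"

# Lowercase first, because matching with %in% is exact and case sensitive,
# then split on runs of whitespace.
tokens <- strsplit(tolower(text), "\\s+")[[1]]

# Keep only the tokens that do not appear in the English stopword list.
kept <- tokens[!tokens %in% stopwords("en")]
kept
```

Swapping `"en"` for `"SMART"` applies the SMART list instead, which may remove a different set of tokens.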