
quanteda (version 0.99.12)

selectFeaturesOLD: old version of selectFeatures.tokenizedTexts

Description

Calls C++ for super-fast selection or removal of features from a set of tokens.

Usage

selectFeaturesOLD(x, ...)

# S3 method for tokenizedTexts
selectFeaturesOLD(x, features, selection = c("keep", "remove"),
  valuetype = c("glob", "regex", "fixed"), case_insensitive = TRUE,
  verbose = TRUE, ...)

Arguments

x

object whose features will be selected

...

supplementary arguments passed to the underlying stringi function stri_detect_regex. (This is how case_insensitive is passed, but you may wish to pass others.)

features

one of: a character vector of features to be selected, a dfm whose features will be used for selection, or a dictionary class object whose values (not keys) will provide the features to be selected. For dfm objects, see details in the Value section below.

selection

whether to keep or remove the features

valuetype

the type of pattern matching: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching. See valuetype for details.

case_insensitive

if TRUE, ignore case when matching the feature patterns (including dictionary values)

verbose

if TRUE, print a message reporting how many features were removed
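
To make the three valuetype options and the keep/remove choice concrete, here is a minimal base R sketch that works on a plain character vector of tokens. It is not quanteda's implementation (which calls C++), and the helper name select_tokens_sketch is hypothetical. It assumes that "glob" patterns can be translated with utils::glob2rx, that "regex" patterns match anywhere inside a token, and that "fixed" means whole-token equality.

```r
# Hypothetical helper, not part of quanteda: illustrates the matching
# semantics of selection, valuetype and case_insensitive in base R.
select_tokens_sketch <- function(tokens, features,
                                 selection = c("keep", "remove"),
                                 valuetype = c("glob", "regex", "fixed"),
                                 case_insensitive = TRUE) {
  selection <- match.arg(selection)
  valuetype <- match.arg(valuetype)

  if (valuetype == "fixed") {
    # Exact, whole-token matching
    if (case_insensitive) {
      hit <- tolower(tokens) %in% tolower(features)
    } else {
      hit <- tokens %in% features
    }
  } else {
    # Assumption: glob wildcards are translated to anchored regular expressions
    if (valuetype == "glob") features <- utils::glob2rx(features)
    # A token is a hit if at least one pattern is found in it
    hit <- vapply(tokens, function(tok) {
      any(vapply(features,
                 function(p) grepl(p, tok, ignore.case = case_insensitive),
                 logical(1)))
    }, logical(1), USE.NAMES = FALSE)
  }

  if (selection == "keep") tokens[hit] else tokens[!hit]
}

toks <- c("This", "is", "some", "example", "text", "from", "me")

# Regex "ex" matches anywhere in the token
select_tokens_sketch(toks, "ex", "keep", valuetype = "regex")
# Glob "ex*" only matches tokens that start with "ex"
select_tokens_sketch(toks, "ex*", "keep", valuetype = "glob")
# Fixed matching removes whole tokens, ignoring case by default
select_tokens_sketch(toks, c("is", "this"), "remove", valuetype = "fixed")
```

Under these assumptions the regex call keeps both "example" and "text", while the glob call keeps only "example", which is the practical difference between the two pattern types.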

Examples

# NOT RUN {
toks <- tokenize(c("This is some example text from me.", "More of the example text."), 
                 remove_punct = TRUE)
selectFeaturesOLD(toks, stopwords("english"), "remove")
selectFeaturesOLD(toks, "ex", "keep", valuetype = "regex")
# }
