tokens_lookup: apply a dictionary to a tokens object

Description

Convert tokens into equivalence classes defined by the values of a dictionary object: tokens that match a dictionary value are replaced by that value's key.
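The equivalence-class idea can be illustrated in base R on a single document's tokens. This is a minimal sketch, not quanteda's implementation: `lookup_sketch()` is a hypothetical helper that assumes glob patterns, case-insensitive matching and the `exclusive = TRUE` behaviour (unmatched tokens are dropped).

```r
# Hypothetical helper (not part of quanteda): replace each token by the
# key(s) whose glob patterns it matches and drop tokens that match nothing.
lookup_sketch <- function(toks, dict) {
  keys <- character(0)
  for (tok in toks) {
    for (key in names(dict)) {
      # glob2rx() turns a glob such as "free*" into the regex "^free"
      rx <- paste(utils::glob2rx(dict[[key]]), collapse = "|")
      if (grepl(rx, tok, ignore.case = TRUE)) {
        keys <- c(keys, key)
      }
    }
  }
  keys
}

dict <- list(law = c("law*", "constitution"),
             freedom = c("free*", "libert*"))
toks <- c("Liberty", "and", "the", "laws", "of", "a", "free", "people")
lookup_sketch(toks, dict)  # "freedom" "law" "freedom"
```

Every surviving token is now one of the dictionary keys, so counting them gives key frequencies rather than word frequencies.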

Usage

tokens_lookup(x, dictionary, valuetype = c("glob", "regex", "fixed"), case_insensitive = TRUE, capkeys = FALSE, concatenator = " ", exclusive = TRUE, verbose = FALSE)

Arguments

x

the tokens object to which the dictionary or thesaurus will be applied

dictionary

the dictionary-class object that will be applied to x

valuetype

how to interpret keyword expressions: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching. See valuetype for details.
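The three matching modes can be contrasted with a small base-R sketch. `match_sketch()` is a hypothetical helper that only approximates the idea; quanteda's own pattern matching may differ in detail. Here a glob is anchored at the start of the token, a regular expression is unanchored unless it contains `^` or `$`, and a fixed value must equal the whole token.

```r
# Hypothetical helper (not part of quanteda) showing how one pattern
# matches differently under each valuetype (case-sensitive here).
match_sketch <- function(pattern, toks, valuetype = c("glob", "regex", "fixed")) {
  valuetype <- match.arg(valuetype)
  switch(valuetype,
         glob  = grepl(utils::glob2rx(pattern), toks),  # "law*" -> "^law"
         regex = grepl(pattern, toks),                  # unanchored search
         fixed = toks == pattern)                       # exact equality
}

toks <- c("law", "laws", "lawyer", "outlaw")
match_sketch("law*", toks, "glob")   # TRUE TRUE TRUE FALSE
match_sketch("law", toks, "regex")   # TRUE TRUE TRUE TRUE
match_sketch("law", toks, "fixed")   # TRUE FALSE FALSE FALSE
```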

case_insensitive

if TRUE, ignore the case of dictionary values when matching them against tokens
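One simple way to picture case-insensitive matching is to compare lower-cased forms of the token and the value. `fixed_match()` below is a hypothetical helper for fixed values only, not a quanteda function.

```r
# Hypothetical helper (not part of quanteda): fixed matching with an
# optional case-insensitive comparison.
fixed_match <- function(pattern, toks, case_insensitive = TRUE) {
  if (case_insensitive) {
    tolower(toks) == tolower(pattern)
  } else {
    toks == pattern
  }
}

toks <- c("Constitution", "constitution", "CONSTITUTION")
fixed_match("constitution", toks)                            # TRUE TRUE TRUE
fixed_match("constitution", toks, case_insensitive = FALSE)  # FALSE TRUE FALSE
```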

capkeys

if TRUE, convert dictionary keys to uppercase to distinguish them from other features

concatenator

the character that connects words in multi-word dictionary entries
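A multi-word entry such as "united states" can be matched by splitting it on the concatenator and looking for its words as a consecutive run of tokens. The sketch below assumes case-insensitive fixed matching; `find_phrase()` is a hypothetical helper that returns the start positions of each match, and quanteda's own phrase matching is more general.

```r
# Hypothetical helper (not part of quanteda): split a multi-word entry on
# the concatenator and return the positions where its words occur as a
# consecutive sequence of tokens.
find_phrase <- function(toks, entry, concatenator = " ") {
  words <- strsplit(entry, concatenator, fixed = TRUE)[[1]]
  n <- length(words)
  if (length(toks) < n) return(integer(0))
  starts <- seq_len(length(toks) - n + 1)
  hits <- vapply(starts, function(i) {
    all(tolower(toks[i:(i + n - 1)]) == tolower(words))
  }, logical(1))
  starts[hits]
}

toks <- c("the", "United", "States", "of", "America")
find_phrase(toks, "united states")          # 2
find_phrase(toks, "united_states", "_")     # 2, using "_" as the concatenator
```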

exclusive

if TRUE, remove all features not in the dictionary; otherwise, replace values in the dictionary with their keys while leaving other features unaffected
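The difference between `exclusive = TRUE` and `exclusive = FALSE`, and why `capkeys` helps in the non-exclusive case, can be sketched in base R. `apply_dict()` is a hypothetical helper, not quanteda code; it assumes single-word, case-insensitive fixed values, each belonging to only one key.

```r
# Hypothetical helper (not part of quanteda) contrasting exclusive and
# non-exclusive lookup for single-word fixed values.
apply_dict <- function(toks, dict, exclusive = TRUE, capkeys = FALSE) {
  # invert the dictionary: one element per value, holding its key
  value_to_key <- stats::setNames(rep(names(dict), lengths(dict)),
                                  tolower(unlist(dict, use.names = FALSE)))
  if (capkeys) value_to_key[] <- toupper(value_to_key)  # keep the names
  key <- unname(value_to_key[tolower(toks)])  # NA where no value matches
  if (exclusive) {
    key[!is.na(key)]                # keep only the matched keys
  } else {
    ifelse(is.na(key), toks, key)   # keys replace matches; others stay
  }
}

dict <- list(law = c("law", "constitution"), freedom = c("freedom", "liberty"))
toks <- c("liberty", "under", "the", "constitution")
apply_dict(toks, dict)                                    # "freedom" "law"
apply_dict(toks, dict, exclusive = FALSE, capkeys = TRUE) # "FREEDOM" "under" "the" "LAW"
```

With `exclusive = FALSE`, keys and ordinary words end up in the same token stream, so uppercasing the keys keeps them distinguishable from features that happen to share a key's spelling.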

verbose

print status messages if TRUE

Examples

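library(quanteda)

toks <- tokens(data_corpus_inaugural)

# glob patterns: "law*" matches "law", "laws", "lawful", ...
dict <- dictionary(list(country = "united states",
                        law = c("law*", "constitution"),
                        freedom = c("free*", "libert*")))
dfm(tokens_lookup(toks, dict, valuetype = "glob", verbose = TRUE))

# fixed patterns: only exact (case-insensitive) matches count
dict_fix <- dictionary(list(country = "united states",
                            law = c("law", "constitution"),
                            freedom = c("freedom", "liberty")))
# applyDictionary() is the deprecated predecessor of tokens_lookup()
# dfm(applyDictionary(toks, dict_fix, valuetype = "fixed"))
dfm(tokens_lookup(toks, dict_fix, valuetype = "fixed"))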
