
quanteda (version 0.9.8.5)

joinTokens: join tokens function

Description

Joins sequences of adjacent tokens that match the supplied patterns into single tokens, linked by the concatenator.

Usage

joinTokens(x, sequences, concatenator = "-", valuetype = "fixed", verbose = FALSE)

Arguments

x
a tokenized text object, such as the output of tokenize()
sequences
list of character vectors, each giving a sequence of features to concatenate
concatenator
character used for joining tokens
valuetype
how to interpret sequences: "fixed" for words as is; "regex" for regular expressions; or "glob" for "glob"-style wildcards
verbose
if TRUE, display progress
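
The three valuetype settings can be illustrated with a small base R sketch. This is not quanteda's implementation: match_token is a hypothetical helper, and the case-sensitive matching and the use of utils::glob2rx() for glob patterns are assumptions made for illustration.

```r
# Hypothetical helper (not part of quanteda): tests each token against one
# pattern under the three valuetype interpretations.
match_token <- function(tokens, pattern,
                        valuetype = c("fixed", "glob", "regex")) {
  valuetype <- match.arg(valuetype)
  switch(valuetype,
    fixed = tokens == pattern,                      # exact string comparison
    glob  = grepl(utils::glob2rx(pattern), tokens), # wildcard converted to a regex
    regex = grepl(pattern, tokens)                  # pattern used as a regex
  )
}

tokens <- c("foreign", "policy", "policies", "police")

match_token(tokens, "policy", "fixed")        # only the exact word matches
match_token(tokens, "polic*", "glob")         # every token starting with "polic"
match_token(tokens, "^polic(ie|y)", "regex")  # "policy" and "policies", not "police"
```

The glob pattern is the broadest of the three here, since it also matches "police".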

Examples

# Tokenize the inaugural address corpus, dropping punctuation
toks <- tokenize(inaugCorpus, removePunct = TRUE)

# The sequences to join, written for each valuetype
seqs_token <- list(c('foreign', 'policy'), c('United', 'States'))
seqs_glob <- list(c('foreign', 'polic*'), c('United', 'States'))
seqs_regex <- list(c('^foreign', '^polic(ie|y)'), c('^United', '^States'))

# Join with "_" as the concatenator; each call overwrites toks2
toks2 <- joinTokens(toks, seqs_token, "_", 'fixed')
toks2 <- joinTokens(toks, seqs_glob, "_", 'glob')
toks2 <- joinTokens(toks, seqs_regex, "_", 'regex')

# Inspect the result with keyword-in-context searches
kwic(toks2, 'foreign_policy', window=1) # joined
kwic(toks2, c('foreign', 'policy'), window=1) # not joined
kwic(toks2, 'United_States', window=1) # joined
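
For readers without this version of quanteda installed, the joining step can be approximated on a plain character vector. The sketch below is only an illustration in base R: join_sequence is a hypothetical function, it handles "fixed" matching only, and it assumes that matched tokens are replaced by the joined token.

```r
# Hypothetical helper (not part of quanteda): replaces each occurrence of a
# fixed token sequence with a single token joined by the concatenator.
join_sequence <- function(tokens, sequence, concatenator = "-") {
  n <- length(sequence)
  out <- character(0)
  i <- 1
  while (i <= length(tokens)) {
    last <- i + n - 1
    if (last <= length(tokens) && all(tokens[i:last] == sequence)) {
      # The sequence starts at position i: emit one joined token and skip ahead
      out <- c(out, paste(sequence, collapse = concatenator))
      i <- last + 1
    } else {
      # No match at this position: keep the token unchanged
      out <- c(out, tokens[i])
      i <- i + 1
    }
  }
  out
}

tokens <- c("our", "foreign", "policy", "toward", "the", "United", "States")
joined <- join_sequence(tokens, c("foreign", "policy"), "_")
joined <- join_sequence(joined, c("United", "States"), "_")
joined
```

Applying the function once per sequence mirrors passing a list of sequences, under the assumption that the sequences do not overlap.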
