
quanteda (version 0.99)

tokens_hash: Function to hash list-of-character tokens

Description

Creates a hashed tokens object. This function is called internally by tokens().

Usage

tokens_hash(x, types_reserved, ...)

Arguments

x

a tokenizedTexts object, i.e. a list of character vectors of tokens, one per text

types_reserved

optional pre-existing types to which tokens are mapped before any new types are added

...

additional arguments

Value

a list of the hashed tokens found in each text

Details

This function was formerly used to create a tokenizedTextsHashed object. Because all tokens objects are now hashed, it is exported only for testing and will eventually become internal.
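To make the idea of "hashed" tokens concrete, here is a minimal base-R sketch. It assumes that hashing means replacing each token with the integer index of its type in a vector of unique types, with any reserved types placed first so they keep fixed IDs. The helper hash_tokens_sketch() is hypothetical and illustrates the concept only; it is not quanteda's implementation of tokens_hash().

```r
# Hypothetical helper: illustrates the concept of hashed tokens only,
# not quanteda's actual implementation.
hash_tokens_sketch <- function(x, types_reserved = character(0)) {
  # collect every distinct token; reserved types come first and keep their IDs
  types <- unique(c(types_reserved, unlist(x, use.names = FALSE)))
  # replace each token by its integer position in the types vector
  ids <- lapply(x, match, table = types)
  attr(ids, "types") <- types
  ids
}

toks <- list(doc1 = c("the", "dog", "ate"), doc2 = c("the", "fox"))
hashed <- hash_tokens_sketch(toks)
hashed$doc1              # 1 2 3
hashed$doc2              # 1 4
attr(hashed, "types")    # "the" "dog" "ate" "fox"

# with a reserved type, "fox" is always mapped to ID 1
hashed_res <- hash_tokens_sketch(toks, types_reserved = "fox")
hashed_res$doc2          # 2 1

# recover the original tokens by indexing into the types vector
lapply(hashed, function(i) attr(hashed, "types")[i])
```

Storing each distinct string once and working with integer vectors is what makes this representation compact; the types vector is all that is needed to map the IDs back to text.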

See Also

tokenize

Examples

txt <- c(doc1 = "The quick brown fox jumped over the lazy dog.",
         doc2 = "The dog jumped and ate the fox.")
toks <- tokenize(char_tolower(txt), remove_punct = TRUE)
toksHashed <- tokens_hash(toks)
toksHashed
# returned as a list
as.list(toksHashed)
# returned as a tokenizedTexts object
as.tokenizedTexts(toksHashed)

# tokens that differ only in case are hashed as distinct types
toks <- tokens_hash(tokenize(c(one = "a b c d A B C D",
                               two = "A B C d")))
toks

