
kgrams (version 0.1.0)

query: Query k-gram frequency tables or dictionaries

Description

Return the frequency count of k-grams in a k-gram frequency table, or whether words are contained in a dictionary.

Usage

query(object, x)

# S3 method for kgram_freqs
query(object, x)

# S3 method for kgrams_dictionary
query(object, x)

Arguments

object

a kgram_freqs or dictionary class object.

x

a character vector. Its elements are k-grams if object is of class kgram_freqs, or words if object is a dictionary.

Value

an integer vector containing the k-gram counts of x if object is a kgram_freqs class object, or a logical vector if object is a dictionary. Vectorized over x.

Details

This generic behaves slightly differently when querying a dictionary for the presence of words and when querying a frequency table for k-gram counts. For words, query() looks for exact matches between the input and the dictionary entries. Queries of the Begin-Of-Sentence (BOS()) and End-Of-Sentence (EOS()) tokens always return TRUE, and queries of the Unknown-Word token (UNK()) return FALSE (see special_tokens).

On the other hand, queries of k-gram counts first perform a word-level tokenization, so that anything separated by one or more space characters in the input is considered a single word (thus, for instance, queries of strings such as "i love you", " i love you", or "i love you " all produce the same outcome). Moreover, querying for any word outside the underlying dictionary returns the counts corresponding to the Unknown-Word token (UNK()) (e.g., if the word "prcsrn" is outside the dictionary, querying "i love prcsrn" is the same as querying paste("i love", UNK())). Queries for k-grams of order k > N return NA.
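The three behaviors described above for frequency tables can be checked with a short sketch. It assumes the kgrams package is installed and reuses the toy text from the examples on this page; the variable names are illustrative only, and the results noted in the comments are what the description above leads one to expect rather than recorded output.

```r
library(kgrams)

# Toy frequency table of order N = 2, built from the text used in the Examples
f <- kgram_freqs("a a b a b b a b", N = 2)

# Extra or surrounding spaces should not matter after word-level tokenization
same_after_tokenization <- identical(query(f, "a b"), query(f, "  a   b "))

# "c" is outside the dictionary, so it should be counted as the Unknown-Word token
same_as_unk <- identical(query(f, "a c"), query(f, paste("a", UNK())))

# A 3-gram exceeds the table order N = 2, so the query should yield NA
beyond_order <- is.na(query(f, "a b a"))

# Each of these is expected to be TRUE if the behavior matches the description
print(c(same_after_tokenization, same_as_unk, beyond_order))
```

Because query() is vectorized over x, the same checks could also be written as a single call on a character vector of k-grams.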

See also the examples below.

Examples

# NOT RUN {
# Querying a k-gram frequency table
f <- kgram_freqs("a a b a b b a b", N = 2)
query(f, c("a", "b")) # query single words
query(f, c("a b")) # query a 2-gram
identical(query(f, "c"), query(f, "d"))  # TRUE, both "c" and "d" are <UNK>
identical(query(f, UNK()), query(f, "c")) # TRUE
query(f, EOS()) # 1, since text is a single sentence

# Querying a dictionary
d <- as_dictionary(c("a", "b"))
query(d, c("a", "b", "c")) # query some words
query(d, c(BOS(), EOS(), UNK())) # c(TRUE, TRUE, FALSE)
# }
