quanteda (version 0.9.7-17)

ntoken: count the number of tokens or types

Description

Return the count of tokens (total features) or types (unique features) in a text, corpus, or dfm. Here "tokens" means every word occurrence, not only the unique words, and the text is not cleaned before counting.
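
The token/type distinction can be illustrated without quanteda. The following is a minimal base-R sketch, assuming a naive tokenizer that treats each run of letters or digits and each punctuation mark as one token. The helper `naive_tokens` is hypothetical and is not quanteda's tokenizer, so counts from `ntoken` and `ntype` may differ.

```r
# Base-R sketch only: naive_tokens() is a hypothetical helper, NOT quanteda's tokenize().
txt <- c(text1 = "This is a sentence, this.", text2 = "A word. Repeated repeated.")

# Split each text into word tokens and single punctuation-mark tokens.
naive_tokens <- function(x) {
  toks <- regmatches(x, gregexpr("[[:alnum:]]+|[[:punct:]]", x))
  names(toks) <- names(x)
  toks
}

# Tokens: every occurrence counts.
n_tokens <- vapply(naive_tokens(txt), length, integer(1))

# Types: only distinct tokens count (case-sensitive).
n_types <- vapply(naive_tokens(txt), function(t) length(unique(t)), integer(1))

# Lowercasing merges "This"/"this" and "Repeated"/"repeated" into one type each.
n_types_lower <- vapply(naive_tokens(tolower(txt)),
                        function(t) length(unique(t)), integer(1))

n_tokens       # text1 = 7, text2 = 6
n_types        # text1 = 7, text2 = 5
n_types_lower  # text1 = 6, text2 = 4
```

Lowercasing leaves the token counts unchanged but reduces the type counts, because tokens that differed only in case collapse into one type.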

Usage

ntoken(x, ...)
ntype(x, ...)

Arguments

x
texts, corpus, or dfm whose tokens or types will be counted
...
additional arguments passed to tokenize
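
Options forwarded to the tokenizer change what is counted. As a rough base-R sketch of that idea, the hypothetical helper `count_tokens` below takes a `drop_punct` switch (an assumed name, not a quanteda argument) that decides whether punctuation marks count as tokens.

```r
# Base-R sketch only: count_tokens() and drop_punct are hypothetical, not part of quanteda.
txt <- c(text1 = "This is a sentence, this.", text2 = "A word. Repeated repeated.")

count_tokens <- function(x, drop_punct = FALSE) {
  # With drop_punct = TRUE only alphanumeric runs are matched, so punctuation is ignored.
  pattern <- if (drop_punct) "[[:alnum:]]+" else "[[:alnum:]]+|[[:punct:]]"
  toks <- regmatches(x, gregexpr(pattern, x))
  stats::setNames(lengths(toks), names(x))
}

with_punct    <- count_tokens(txt)                     # text1 = 7, text2 = 6
without_punct <- count_tokens(txt, drop_punct = TRUE)  # text1 = 5, text2 = 4
```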

Value

count of the total tokens or types, with one value for each text or document in x

Examples

# simple example
txt <- c(text1 = "This is a sentence, this.", text2 = "A word. Repeated repeated.")
ntoken(txt)
ntype(txt)
ntoken(toLower(txt))  # same token counts as before lowercasing
ntype(toLower(txt))   # fewer types, since case variants merge
ntoken(toLower(txt), removePunct = TRUE)
ntype(toLower(txt), removePunct = TRUE)

# with some real texts
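# removePunct is an argument for the tokenizer, so it is passed to ntoken/ntype, not to subset
ntoken(subset(inaugCorpus, Year < 1806), removePunct = TRUE)
ntype(subset(inaugCorpus, Year < 1806), removePunct = TRUE)
ntoken(dfm(subset(inaugCorpus, Year < 1800)))
ntype(dfm(subset(inaugCorpus, Year < 1800)))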