size: Get Number of Tokens.

Description

Get the number of tokens in a corpus, partition or subcorpus. If an s-attribute is provided, the token count is split up by the values of that s-attribute.

Usage

size(x, ...)
# S4 method for corpus
size(x, s_attribute = NULL, verbose = TRUE, ...)
# S4 method for character
size(x, s_attribute = NULL, verbose = TRUE, ...)
# S4 method for partition
size(x, s_attribute = NULL, ...)
# S4 method for partition_bundle
size(x)
# S4 method for DocumentTermMatrix
size(x)
# S4 method for TermDocumentMatrix
size(x)
# S4 method for features
size(x)
# S4 method for remote_corpus
size(x)
# S4 method for remote_partition
size(x)

Value

If x is a corpus (a corpus object, or a corpus specified by its ID) and argument s_attribute is NULL, an integer vector. If s_attribute is provided, a two-column data.table (first column: the s-attribute, second column: "size"). If x is a subcorpus_bundle or a partition_bundle, a data.table with columns "name" and "size".

Arguments

x: An object to get size(s) for.
...: Further arguments (used only for backwards compatibility).
s_attribute: A character vector with s-attributes (one or more).
verbose: A logical value, whether to output messages.

Details

One or more s-attributes can be provided to get the dispersion of tokens across one or more dimensions. If s_attribute is a child of the s-attribute defining a subcorpus or partition, the struc values need to be decoded for all corpus positions, which may take some time.
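To make the cost of decoding values for all corpus positions more concrete, here is a minimal base R sketch of the idea. It is not polmineR's implementation: the region matrix and the `speaker_by_cpos` lookup vector are made-up toy data, and the sketch only assumes that regions are stored as inclusive left/right corpus positions counted from zero.

```r
# Hypothetical regions of a subcorpus: one row per region,
# columns are inclusive left and right corpus positions (0-based)
regions <- matrix(c(0L, 4L, 10L, 12L), ncol = 2, byrow = TRUE)

# Total size: each region contributes right - left + 1 tokens
total <- sum(regions[, 2] - regions[, 1] + 1L)

# Hypothetical child s-attribute: one value per corpus position
# of a toy corpus with positions 0 to 12
speaker_by_cpos <- c(rep("A", 3), rep("B", 2), rep(NA, 5), rep("A", 3))

# Expand the regions to individual corpus positions ...
cpos <- unlist(lapply(
  seq_len(nrow(regions)),
  function(i) regions[i, 1]:regions[i, 2]
))

# ... then look up the value at every position (0-based cpos to
# 1-based R index) and count tokens per value
sizes <- table(speaker_by_cpos[cpos + 1L])
```

The per-position lookup is the step that scales with the number of tokens rather than the number of regions, which is why splitting by a child s-attribute can be noticeably slower.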

The size() method for features objects returns a named list with the size of the corpus of interest ("coi"), i.e. the number of tokens in the window, and the size of the reference corpus ("ref"), i.e. the number of tokens that are not matched by the query and that are outside the window.
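A possible way to inspect the two sizes is sketched below. It assumes that the object returned by cooccurrences() inherits from the features class, so that the features method of size() is dispatched. That assumption is not stated on this page, and the query term "oil" is chosen purely for illustration.

```r
library(polmineR)
use("polmineR")

# Assumption: cooccurrences() returns an object that inherits from
# the features class, so size() dispatches to the features method
oil <- cooccurrences("REUTERS", query = "oil")

sizes <- size(oil)
sizes[["coi"]] # number of tokens in the window
sizes[["ref"]] # number of tokens in the reference corpus
```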

Examples

library(polmineR) # attach the package so that use() and size() are available
use("polmineR")
use(pkg = "RcppCWB", corpus = "REUTERS")

# for corpus object
corpus("REUTERS") %>% size()
corpus("REUTERS") %>% size(s_attribute = "id")
corpus("GERMAPARLMINI") %>% size(s_attribute = c("date", "party"))

# for corpus specified by ID
size("GERMAPARLMINI")
size("GERMAPARLMINI", s_attribute = "date")
size("GERMAPARLMINI", s_attribute = c("date", "party"))

# for partition object
P <- partition("GERMAPARLMINI", date = "2009-11-11")
size(P, s_attribute = "speaker")
size(P, s_attribute = "party")
size(P, s_attribute = c("speaker", "party"))

# for subcorpus
sc <- corpus("GERMAPARLMINI") %>% subset(date == "2009-11-11")
size(sc, s_attribute = "speaker")
size(sc, s_attribute = "party")
size(sc, s_attribute = c("speaker", "party"))

# for subcorpus_bundle
subcorpora <- corpus("GERMAPARLMINI") %>% split(s_attribute = "date")
size(subcorpora)