The regions class is a minimal representation of regions. It does not include
the "strucs" (region IDs) that are used internally to obtain values of
s-attributes, nor does it record which combination of conditions on
s-attributes was used to obtain the regions. This is left to the subcorpus
class. Whereas the subcorpus class assumes that a set of regions is a
meaningful sub-unit of a corpus, the regions class focuses on the individual
sequences of tokens defined by a structural attribute (such as paragraphs,
sentences, named entities). Information on regions is maintained in the cpos
slot of the regions S4 class: a two-column matrix with begin and end corpus
positions (first and second column, respectively). All other slots are
inherited from the corpus class.
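The layout of such a two-column matrix can be sketched in base R. This is only an illustration under stated assumptions: the corpus positions are made up, and the column names cpos_left and cpos_right are chosen here for readability rather than taken from the class definition.

```r
# Hypothetical corpus positions for three regions (e.g. three sentences).
# Column names are illustrative only.
cpos <- matrix(
  c( 0L,  4L,
     5L, 11L,
    12L, 14L),
  ncol = 2L, byrow = TRUE,
  dimnames = list(NULL, c("cpos_left", "cpos_right"))
)

# Number of tokens per region: begin and end positions are both inclusive.
region_sizes <- cpos[, 2] - cpos[, 1] + 1L

# Expand every region into the full sequence of corpus positions it covers.
token_cpos <- unlist(
  lapply(seq_len(nrow(cpos)), function(i) cpos[i, 1]:cpos[i, 2])
)
```

Each row stands for one region, so operations on individual token sequences reduce to row-wise operations on the matrix.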
The understanding of "regions" follows the terminology used by the CWB
developers. As the CQP Interface and Query Language Manual puts it: "Matching
pairs of XML start and end tags are encoded as token regions, identified by
the corpus positions of the first token (immediately following the start tag)
and the last token (immediately preceding the end tag) of the region." (p. 6)
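How matching tags translate into token regions can be illustrated with a small base R sketch. The token stream and the tag name s are invented for the example, and the code only mimics the idea described in the manual; it is not the CWB encoding procedure itself. Corpus positions are zero-based, and tags do not occupy a corpus position.

```r
# Hypothetical input stream: tokens interleaved with <s> start and end tags.
stream <- c("<s>", "Hello", "world", "</s>", "<s>", "Bye", "</s>")

# Tags are not tokens and therefore do not consume a corpus position.
is_tag <- grepl("^</?s>$", stream)

# Running count of tokens seen so far at each element of the stream.
n_tokens <- cumsum(!is_tag)

# First token after a start tag: its zero-based position equals the number
# of tokens that precede the tag.
cpos_left <- n_tokens[stream == "<s>"]

# Last token before an end tag: the most recent token seen, zero-based.
cpos_right <- n_tokens[stream == "</s>"] - 1L

# One row per region: begin and end corpus positions.
regions <- cbind(cpos_left, cpos_right)
```

With this input the first region spans the tokens "Hello" and "world", and the second region consists of the single token "Bye".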
The as.regions method coerces objects to a regions object.
The as.data.table method returns the matrix with corpus positions in the slot
cpos as a data.table.