The `+` operator for a corpus object will combine two corpus objects, resolving any non-matching `docvars()` by making them into `NA` values for the corpus lacking that field. Corpus-level metadata is concatenated, except for `source` and `notes`, which are stamped with information pertaining to the creation of the new joined corpus.
A `c()` method is also defined for corpus class objects, and provides an easy way to combine more than two corpus objects in a single call.
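The combining behaviour described above can be sketched as follows. This is a minimal illustration, assuming a reasonably recent quanteda release is installed; the document names, texts, and the `year` and `author` variables are made up for the example. Document names are kept unique across the corpora, which newer versions appear to require. How corpus-level metadata such as `source` and `notes` is handled may differ between versions and is not shown here.

```r
library(quanteda)

# Three small corpora with different document variables (all values are illustrative)
corp1 <- corpus(c(a1 = "First text.", a2 = "Second text."),
                docvars = data.frame(year = c(2001L, 2002L)))
corp2 <- corpus(c(b1 = "Third text."),
                docvars = data.frame(author = "Smith"))
corp3 <- corpus(c(c1 = "Fourth text."),
                docvars = data.frame(year = 2004L))

# `+` joins two corpora; a docvar missing from one corpus is filled with NA
combined <- corp1 + corp2
docvars(combined)

# c() accepts any number of corpus objects
all_three <- c(corp1, corp2, corp3)
ndoc(all_three)
```

Inspecting `docvars(combined)` should show `author` as `NA` for the documents from `corp1` and `year` as `NA` for the document from `corp2`.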
There are some issues that need to be addressed in future revisions of quanteda concerning the use of factors to store document variables and metadata. Currently most or all of these are not recorded as factors, because we use `stringsAsFactors=FALSE` in the `data.frame()` calls that create and store the document-level information. We do this so that the texts are always stored as character vectors and never as factors.
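The effect of `stringsAsFactors` can be seen with base R alone. The sketch below does not reproduce quanteda's internal code; the data frame and its `texts` and `party` columns are hypothetical and only illustrate why the setting keeps texts as character vectors, and how a document variable could be converted to a factor afterwards if one is wanted.

```r
texts <- c("First document.", "Second document.")

# With stringsAsFactors = FALSE every string column stays a character vector
df_chr <- data.frame(texts = texts, party = c("A", "B"),
                     stringsAsFactors = FALSE)
is.character(df_chr$texts)   # TRUE
is.character(df_chr$party)   # TRUE

# With stringsAsFactors = TRUE the texts themselves would become a factor
df_fct <- data.frame(texts = texts, party = c("A", "B"),
                     stringsAsFactors = TRUE)
is.factor(df_fct$texts)      # TRUE

# A single document variable can still be converted explicitly
df_chr$party <- factor(df_chr$party)
is.factor(df_chr$party)      # TRUE
```

Since R 4.0.0 the default for `stringsAsFactors` in `data.frame()` is `FALSE`, so passing it explicitly mainly matters for older R versions.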