texts: get corpus texts

Description

Get or replace the texts in a quanteda corpus object, optionally concatenating them by group. Also works for plain character vectors, provided groups is a factor.

Usage

texts(x, groups = NULL, ...)
texts(x) <- value

Arguments

x

A quanteda corpus object, or a character vector when groups is a factor

groups

character vector containing the names of document variables in a corpus, or a factor equal in length to the number of documents, used for aggregating the texts through concatenation. If x is of type character, then groups must be a factor.

...

unused

value

character vector of the new texts

Value

For texts, a character vector of the texts in the corpus. For texts <-, the corpus with the updated texts.
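
The aggregation "through concatenation" described for groups can be illustrated with a minimal base R sketch. It does not use quanteda: the vector docs and the factor grp are hypothetical stand-ins, and the single-space separator is an assumption rather than documented behaviour.

```r
# Hypothetical stand-ins for a small set of documents and a grouping factor
docs <- c(d1 = "First speech.", d2 = "Second speech.", d3 = "Another speaker.")
grp  <- factor(c("A", "A", "B"))

# Concatenate the texts within each level of the factor.
# The single-space separator is an assumption, not taken from quanteda.
grouped <- vapply(split(docs, grp), paste, character(1), collapse = " ")

# One element per factor level, named by level
grouped
nchar(grouped)
```

The result has one text per level of the factor, which is why the factor must be equal in length to the number of documents.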

Examples

nchar(texts(subset(inaugCorpus, Year < 1806)))

# grouping on a document variable
nchar(texts(subset(inaugCorpus, Year < 1806), groups = "President"))

# grouping a character vector using a factor
nchar(inaugTexts[1:5])
nchar(texts(inaugTexts[1:5], groups = as.factor(inaugCorpus[1:5, "President"])))

BritCorpus <- corpus(c("We must prioritise honour in our neighbourhood.", 
                       "Aluminium is a valourous metal."))
texts(BritCorpus) <- 
    stringi::stri_replace_all_regex(texts(BritCorpus),
                                   c("ise", "([nlb])our", "nium"),
                                   c("ize", "$1or", "num"),
                                   vectorize_all = FALSE)
texts(BritCorpus)
texts(BritCorpus)[2] <- "New text number 2."
texts(BritCorpus)
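
The last assignment above combines a replacement function with subsetting. As a rough sketch of how R evaluates that form in general, the toy example below uses a plain list and a hypothetical accessor pair, toy_texts and toy_texts<-, which are not part of quanteda.

```r
# Hypothetical toy container: a list holding a character vector of texts
toy <- list(texts = c("one", "two", "three"))

# Hypothetical accessor and replacement function (not quanteda functions)
toy_texts <- function(x) x$texts
`toy_texts<-` <- function(x, value) {
  x$texts <- value
  x
}

# Replace a single element through the accessor
toy_texts(toy)[2] <- "New text number 2."

# R evaluates the line above roughly as this explicit form
toy2 <- list(texts = c("one", "two", "three"))
toy2 <- `toy_texts<-`(toy2, value = `[<-`(toy_texts(toy2), 2, "New text number 2."))

# Both routes give the same object
identical(toy, toy2)
```

The replacement function always receives the full updated vector as value, so it only needs to handle whole-vector replacement.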