texts: get or assign corpus texts

Description

Get or replace the texts in a corpus object, optionally aggregating them by group. Also works for plain character vectors, provided groups is a factor.

Usage

texts(x, groups = NULL, ...)
texts(x) <- value
# S3 method for corpus
as.character(x, ...)

Arguments

x

a quanteda corpus or character object

groups

character vector containing the names of document variables in a corpus, or a factor equal in length to the number of documents, used to aggregate the texts by concatenation. If x is a character vector, groups must be a factor.

...

unused

value

character vector of the new texts
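The aggregation performed by groups can be illustrated in base R. This is a minimal sketch, not quanteda's implementation: the helper name group_texts is hypothetical, and joining the texts of a group with a single space is an assumption about the separator.

```r
# Hypothetical helper illustrating grouping by concatenation (not quanteda code).
# Assumption: texts that share a group level are joined with a single space.
group_texts <- function(x, groups, sep = " ") {
  if (!is.factor(groups)) stop("groups must be a factor")
  if (length(groups) != length(x)) stop("groups must be the same length as x")
  # split() keeps the original order of the texts within each factor level
  pieces <- split(x, groups)
  # collapse each group's texts into a single string, named by level
  vapply(pieces, paste, character(1), collapse = sep)
}

txts <- c(d1 = "Fellow citizens.", d2 = "I thank you.", d3 = "Friends and countrymen.")
pres <- factor(c("Washington", "Washington", "Adams"))
group_texts(txts, pres)
```

For the three texts above, the result is a named character vector with one element per factor level; factor() sorts the levels alphabetically by default, so Adams comes before Washington.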

Value

For texts, a character vector of the texts in the corpus. For texts <-, the corpus with its texts replaced by value.
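The texts(x) <- value form relies on R's replacement-function convention: the assignment is evaluated as x <- `texts<-`(x, value = value), and an indexed assignment such as texts(x)[2] <- value extracts the texts, modifies one element, and passes the whole vector back. The toy_corpus class and its methods below are hypothetical stand-ins, not quanteda's corpus object; run the sketch in a fresh session, since its texts() generic would mask quanteda's.

```r
# Hypothetical stand-in for a corpus: a list holding a character vector of texts.
toy_corpus <- function(txts) structure(list(texts = txts), class = "toy_corpus")

# Getter generic and method
texts <- function(x, ...) UseMethod("texts")
texts.toy_corpus <- function(x, ...) x$texts

# Replacement generic and method; R requires the final argument to be named `value`
`texts<-` <- function(x, value) UseMethod("texts<-")
`texts<-.toy_corpus` <- function(x, value) {
  if (length(value) != length(x$texts)) stop("value must have one text per document")
  x$texts <- as.character(value)
  x
}

tc <- toy_corpus(c("First text.", "Second text."))
# Roughly evaluated as:
#   tc <- `texts<-`(tc, value = `[<-`(texts(tc), 2, value = "New text number 2."))
texts(tc)[2] <- "New text number 2."
texts(tc)
```

The length check in the replacement method is a design choice of this sketch, not documented behaviour of texts<-.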

Details

as.character(x), where x is a corpus, is equivalent to calling texts(x).
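This equivalence is what an S3 as.character method that simply delegates to the getter produces. The sketch below uses a hypothetical toy_corpus class and is not quanteda's source code; as with any such sketch, its texts() generic masks quanteda's if both are loaded.

```r
# Hypothetical toy corpus class (not quanteda's corpus object)
toy_corpus <- function(txts) structure(list(texts = txts), class = "toy_corpus")

texts <- function(x, ...) UseMethod("texts")
texts.toy_corpus <- function(x, ...) x$texts

# as.character() is an internal generic, so this method is dispatched
# for any object whose class attribute includes "toy_corpus"
as.character.toy_corpus <- function(x, ...) texts(x)

tc <- toy_corpus(c(doc1 = "Hello world.", doc2 = "Goodbye."))
identical(as.character(tc), texts(tc))  # TRUE
```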

Examples

nchar(texts(corpus_subset(data_corpus_inaugural, Year < 1806)))

# grouping on a document variable
nchar(texts(corpus_subset(data_corpus_inaugural, Year < 1806), groups = "President"))

# grouping a character vector using a factor
nchar(data_char_inaugural[1:5])
nchar(texts(data_char_inaugural[1:5], 
            groups = as.factor(data_corpus_inaugural[1:5, "President"])))

# replacing texts: convert British spellings to American ones
BritCorpus <- corpus(c("We must prioritise honour in our neighbourhood.", 
                       "Aluminium is a valourous metal."))
texts(BritCorpus) <- 
    stringi::stri_replace_all_regex(texts(BritCorpus),
                                   c("ise", "([nlb])our", "nium"),
                                   c("ize", "$1or", "num"),
                                   vectorize_all = FALSE)
texts(BritCorpus)

# replacing a single text by position
texts(BritCorpus)[2] <- "New text number 2."
texts(BritCorpus)
