quanteda (version 0.9.9-50)

texts: get or assign corpus texts

Description

Get or replace the texts in a corpus, optionally concatenating them by group. Also works on a plain character vector, provided groups is a factor.

Usage

texts(x, groups = NULL, spacer = "  ")

texts(x) <- value

# S3 method for corpus
as.character(x, ...)

Arguments

x
a corpus or character object
groups
either: a character vector containing the names of document variables to be used for grouping; or a factor (or object that can be coerced into a factor) equal in length to the number of documents, used for aggregating the texts through concatenation
spacer
the string inserted between texts when they are concatenated using groups. (Default is two spaces.)
value
character vector of the new texts
...
unused
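
To make the roles of groups and spacer concrete, here is a minimal base R sketch of concatenating texts within the levels of a factor. The helper group_texts is hypothetical and is not quanteda's implementation; it only illustrates the behaviour described for these arguments.

```r
# Hypothetical helper (not part of quanteda): join the texts that share a
# group level into one string, separated by `spacer`.
group_texts <- function(txts, groups, spacer = "  ") {
  groups <- as.factor(groups)
  stopifnot(length(txts) == length(groups))
  # split() partitions the texts by factor level;
  # paste(collapse = spacer) joins each partition into a single string.
  vapply(split(txts, groups), paste, character(1), collapse = spacer)
}

txts <- c(d1 = "First text.", d2 = "Second text.", d3 = "Third text.")
grouped <- group_texts(txts, groups = c("A", "B", "A"))
grouped
```

The result has one element per group level, named by that level, so documents d1 and d3 end up in a single string under "A".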

Value

For texts, a character vector of the texts in the corpus.

For texts <-, the corpus with the updated texts.

as.character(x) is equivalent to texts(x)

Details

as.character(x) where x is a corpus is equivalent to calling texts(x)
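
The form texts(x) <- value relies on R's replacement-function mechanism, which is also what allows a single text to be replaced by indexing the left-hand side. The sketch below shows the mechanism on a toy object in base R. The names toy and get_texts are illustrative assumptions, and the length check is a choice made for this sketch, not something the quanteda documentation states.

```r
# Toy stand-in for a corpus: a list holding a named character vector.
toy <- list(texts = c(doc1 = "Old text one.", doc2 = "Old text two."))

# Hypothetical getter and matching replacement function.
get_texts <- function(x) x$texts
`get_texts<-` <- function(x, value) {
  stopifnot(is.character(value), length(value) == length(x$texts))
  x$texts[] <- value  # overwrite the values, keep the document names
  x
}

# R expands this to: fetch the texts, modify element 2, then pass the
# whole modified vector to `get_texts<-` and reassign `toy`.
get_texts(toy)[2] <- "New text number 2."
get_texts(toy)
```

Because the replacement function receives the complete modified vector, only one replacement function is needed to support both whole-vector and single-element assignment.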

Examples

nchar(texts(corpus_subset(data_corpus_inaugural, Year < 1806)))

# grouping on a document variable
nchar(texts(corpus_subset(data_corpus_inaugural, Year < 1806), groups = "President"))

# grouping a character vector using a factor
nchar(data_char_ukimmig2010[1:5])
nchar(texts(data_corpus_inaugural[1:5], 
            groups = as.factor(data_corpus_inaugural[1:5, "President"])))

BritCorpus <- corpus(c("We must prioritise honour in our neighbourhood.", 
                       "Aluminium is a valourous metal."))
texts(BritCorpus) <- 
    stringi::stri_replace_all_regex(texts(BritCorpus),
                                   c("ise", "([nlb])our", "nium"),
                                   c("ize", "$1or", "num"),
                                   vectorize_all = FALSE)
texts(BritCorpus)
texts(BritCorpus)[2] <- "New text number 2."
texts(BritCorpus)
