quanteda (version 4.1.0)

docvars: Get or set document-level variables

Description

Get or set variables associated with a document in a corpus, tokens or dfm object.

Usage

docvars(x, field = NULL)

docvars(x, field = NULL) <- value

# S3 method for corpus
$(x, name)

# S3 method for corpus
$(x, name) <- value

# S3 method for tokens
$(x, name)

# S3 method for tokens
$(x, name) <- value

# S3 method for dfm
$(x, name)

# S3 method for dfm
$(x, name) <- value

Value

docvars returns a data.frame of the document-level variables, dropping the second dimension to form a vector if a single docvar is returned.

docvars<- assigns value to the named field.
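The two return shapes described above can be illustrated with a minimal sketch. The three-document corpus and its `year` and `author` variables are made up for illustration and are not part of the package's data.

```r
library(quanteda)

# Hypothetical toy corpus with two document-level variables
corp <- corpus(
  c(d1 = "First text.", d2 = "Second text.", d3 = "Third text."),
  docvars = data.frame(year = c(2001, 2002, 2003),
                       author = c("A", "B", "C"))
)

# No field given: a data.frame with one row per document
dv_all <- docvars(corp)
is.data.frame(dv_all)

# A single field: the second dimension is dropped, leaving a vector
dv_year <- docvars(corp, "year")
is.numeric(dv_year)

# The replacement form overwrites (or creates) the named field
docvars(corp, "year") <- c(1901, 1902, 1903)
docvars(corp, "year")
```

The replacement value here supplies one element per document, matching the toy corpus of three documents.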

Arguments

x

corpus, tokens, or dfm object whose document-level variables will be read or set

field

string containing the document-level variable name

value

a vector of document variable values to be assigned to field (or to name when using $)

name

a literal character string specifying a single docvars name

Accessing or assigning docvars using the $ operator

As of quanteda v2, it is possible to access and assign a docvar using the $ operator. See Examples.
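As a small sketch of how the $ form relates to docvars(), the snippet below uses a made-up two-document corpus; the `year` and `decade` variables are assumptions for illustration only. It also relies on document variables being carried along when a corpus is tokenized and the tokens are turned into a dfm.

```r
library(quanteda)

# Hypothetical toy corpus with a single docvar
corp <- corpus(
  c(d1 = "A short text.", d2 = "Another short text."),
  docvars = data.frame(year = c(2001, 2002))
)

# Reading with $ gives the same values as docvars(x, field)
identical(corp$year, docvars(corp, "year"))

# Assigning with $ creates a new docvar when the name does not exist yet
corp$decade <- floor(corp$year / 10) * 10

# The same operator works on tokens and dfm objects built from the corpus
toks <- tokens(corp)
dfmat <- dfm(toks)
toks$decade
dfmat$decade
```

The $ form takes a literal name, so when the variable name is held in a string, docvars(x, field) is the form to use.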

Examples

# retrieving docvars from a corpus
head(docvars(data_corpus_inaugural))
tail(docvars(data_corpus_inaugural, "President"), 10)
head(data_corpus_inaugural$President)

# assigning document variables to a corpus
corp <- data_corpus_inaugural
docvars(corp, "President") <- paste("prez", 1:ndoc(corp), sep = "")
head(docvars(corp))
corp$fullname <- paste(data_corpus_inaugural$FirstName,
                       data_corpus_inaugural$President)
tail(corp$fullname)


# accessing or assigning docvars for a corpus using "$"
data_corpus_inaugural$Year
data_corpus_inaugural$century <- floor(data_corpus_inaugural$Year / 100)
data_corpus_inaugural$century

# accessing or assigning docvars for tokens using "$"
toks <- tokens(corpus_subset(data_corpus_inaugural, Year <= 1805))
toks$Year
toks$Year <- 1991:1995
toks$Year
toks$nonexistent <- TRUE
docvars(toks)

# accessing or assigning docvars for a dfm using "$"
dfmat <- dfm(toks)
dfmat$Year
dfmat$Year <- 1991:1995
dfmat$Year
dfmat$nonexistent <- TRUE
docvars(dfmat)