quanteda (version 0.9.9-3)

corpuszip: construct a compressed corpus object

Description

Construct a compressed version of a corpus.
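This page does not say how corpuszip compresses the texts. As a rough, hypothetical illustration of the general idea only (not quanteda's internal code), the base R functions memCompress() and memDecompress() can pack a named character vector into one compressed raw vector and restore it without loss. The sample texts and the "\u0001" separator are assumptions chosen for this sketch.

```r
# Hypothetical illustration only: this is NOT how quanteda implements corpuszip.
texts <- c(doc1 = paste(rep("the quick brown fox", 50), collapse = " "),
           doc2 = paste(rep("jumps over the lazy dog", 50), collapse = " "))

# Join the texts with a separator assumed not to occur in them, then compress.
joined <- paste(texts, collapse = "\u0001")
packed <- memCompress(charToRaw(joined), type = "gzip")

# Restore: decompress, convert back to a string, split on the separator,
# and reattach the original document names.
restored <- strsplit(rawToChar(memDecompress(packed, type = "gzip")),
                     "\u0001", fixed = TRUE)[[1]]
names(restored) <- names(texts)

length(packed) < nchar(joined, type = "bytes")  # TRUE for this repetitive text
identical(restored, texts)                      # TRUE: the round trip is lossless
```

Highly repetitive text compresses well, which is why the saving is large here; real corpora will usually shrink less.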

Usage

corpuszip(x, docnames = NULL, docvars = NULL, text_field = "text", metacorpus = NULL, ...)

Arguments

x
a valid corpus source object
docnames
names to be assigned to the texts; defaults to the names of the character vector (if any), otherwise assigns "text1", "text2", etc.
docvars
a data frame of attributes associated with each text (one row per text).
text_field
the name (character) or numeric index of the column in the source data.frame to be read in as text; this column must be a character vector. All other variables in the data.frame are imported as docvars. This argument is used only for data.frame objects (including those created by readtext).
metacorpus
a named list of additional (character) information to be added to the corpus as corpus-level metadata. Special fields recognized by summary.corpus are:
  • source: a description of the source of the texts, used for referencing;
  • citation: information on how to cite the corpus; and
  • notes: any additional information about who created the texts, warnings, to-do lists, etc.
...
not used directly
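The text_field and docnames descriptions together imply a simple rule for data.frame sources: one column (chosen by name or position) supplies the texts, every other column becomes a docvar, and documents without names are called "text1", "text2", and so on. The sketch below restates that rule in base R under those assumptions; split_source() is a hypothetical helper, not a quanteda function, and it applies the docnames default as described above for character vectors.

```r
# Hypothetical helper restating the documented argument rules;
# split_source() is illustrative and is not part of quanteda.
split_source <- function(df, text_field = "text", docnames = NULL) {
  # text_field may be a column name or a numeric column index
  idx <- if (is.numeric(text_field)) text_field else match(text_field, names(df))
  if (is.na(idx)) stop("text_field not found in the data.frame")
  texts <- df[[idx]]
  if (!is.character(texts)) stop("the text field must be a character vector")
  # every other column becomes a document variable
  docvars <- df[, -idx, drop = FALSE]
  # documented default: existing names, otherwise "text1", "text2", ...
  if (is.null(docnames)) {
    docnames <- if (!is.null(names(texts))) names(texts) else paste0("text", seq_along(texts))
  }
  names(texts) <- docnames
  list(texts = texts, docvars = docvars)
}

df <- data.frame(text = c("First document.", "Second document."),
                 party = c("A", "B"), year = c(2010, 2015),
                 stringsAsFactors = FALSE)
res <- split_source(df, text_field = "text")
names(res$texts)    # "text1" "text2"
names(res$docvars)  # "party" "year"
```

Passing text_field = 1 selects the same column by position and gives the same result.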

Examples

library(quanteda)

# create a compressed corpus from a character vector of texts
corpuszip(data_char_inaugural)

# build a standard and a compressed corpus from the same texts and
# document variables, then compare their sizes in memory
cop <- corpus(data_char_ukimmig2010, 
              docvars = data.frame(party = names(data_char_ukimmig2010)))
cop_zip <- corpuszip(data_char_ukimmig2010, 
                     docvars = data.frame(party = names(data_char_ukimmig2010)))
object.size(cop)
object.size(cop_zip)
