summary.character: summarize a corpus or a vector of texts

Description

Displays information about a corpus or vector of texts. For a corpus, this includes attributes and metadata such as the number of texts, date of creation, and source. For texts, prints to the console a description of the texts, including the number of types, tokens, and sentences.
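
As a rough illustration of what the three reported counts mean, the base-R sketch below counts tokens, types, and sentences for a single string. It is only an approximation under simple assumptions (punctuation marks count as tokens, and a sentence ends at a run of `.`, `!` or `?`); it is not the tokenizer the package uses, and the variable names are invented for this example.

```r
# Naive illustration only; the package's tokenizer handles many more cases.
txt <- "Testing this text.  Second sentence."

# Tokens: pad punctuation with spaces, then split on whitespace (crude).
padded <- gsub("([[:punct:]])", " \\1 ", txt)
tokens <- unlist(strsplit(padded, "[[:space:]]+"))
tokens <- tokens[nzchar(tokens)]

n_tokens <- length(tokens)            # every occurrence counts
n_types  <- length(unique(tokens))    # distinct tokens only

# Sentences: count runs of sentence-ending punctuation (assumption).
n_sentences <- lengths(regmatches(txt, gregexpr("[.!?]+", txt)))

print(c(tokens = n_tokens, types = n_types, sentences = n_sentences))
```

Types can never exceed tokens, which is why the ratio of the two is sometimes used as a crude measure of lexical diversity.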

Usage

# S3 method for character
summary(object, n = 100, verbose = TRUE,
  toLower = FALSE, ...)
# S3 method for corpus
summary(object, n = 100, verbose = TRUE,
  showmeta = FALSE, toLower = FALSE, ...)

Arguments

object

corpus or texts to be summarized

n

maximum number of texts to describe, default=100

verbose

set to FALSE to turn off printed output, for instance if you simply want to assign the output to a data.frame

toLower

convert texts to lower case before counting types

...

additional arguments passed through to tokenize

showmeta

for a corpus, set to TRUE to include document-level meta-data
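
The effect of toLower on the type count can be sketched in base R without the package. The word vector below is made up for illustration and stands in for already-tokenized text; the point is only that lower-casing merges forms such as "The" and "the" into one type while leaving the token count unchanged.

```r
# Hypothetical, already-tokenized text (not produced by the package).
words <- c("The", "cat", "saw", "the", "other", "cat")

n_tokens        <- length(words)                    # unaffected by case
types_cased     <- length(unique(words))            # "The" and "the" differ
types_lowercase <- length(unique(tolower(words)))   # "The" and "the" merge

print(c(tokens = n_tokens, cased = types_cased, lower = types_lowercase))
```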

Examples

# NOT RUN {
# summarize texts
summary(c("Testing this text.  Second sentence.", "And this one."))
summary(data_char_ukimmig2010)
myTextSummaryDF <- summary(data_char_ukimmig2010, verbose = FALSE)
head(myTextSummaryDF)
# summarize corpus information
summary(data_corpus_inaugural)
summary(data_corpus_inaugural, n = 10)
mycorpus <- corpus(data_char_ukimmig2010,
                   docvars = data.frame(party = names(data_char_ukimmig2010)))
summary(mycorpus, showmeta = TRUE)  # show the meta-data
mysummary <- summary(mycorpus, verbose = FALSE)  # (quietly) assign the results
mysummary$Types / mysummary$Tokens             # crude type-token ratio
# }
