Displays information about a corpus or a vector of texts. For a corpus, this includes attributes and metadata such as the number of texts, creation date, and source. For texts, prints to the console a description of the texts, including the number of types, tokens, and sentences.
# S3 method for character
summary(object, n = 100, verbose = TRUE,
        toLower = FALSE, ...)

# S3 method for corpus
summary(object, n = 100, verbose = TRUE,
        showmeta = FALSE, toLower = FALSE, ...)
object: corpus or texts to be summarized
n: maximum number of texts to describe, default=100
verbose: set to FALSE to turn off printed output, for instance if you simply want to assign the output to a data.frame
toLower: convert texts to lower case before counting types
...: additional arguments passed through to tokenize
showmeta: for a corpus, set to TRUE to include document-level meta-data
# NOT RUN {
# summarize texts
summary(c("Testing this text. Second sentence.", "And this one."))
summary(data_char_ukimmig2010)
myTextSummaryDF <- summary(data_char_ukimmig2010, verbose = FALSE)
head(myTextSummaryDF)
# summarize corpus information
summary(data_corpus_inaugural)
summary(data_corpus_inaugural, n=10)
mycorpus <- corpus(data_char_ukimmig2010,
                   docvars = data.frame(party = names(data_char_ukimmig2010)))
summary(mycorpus, showmeta=TRUE) # show the meta-data
mysummary <- summary(mycorpus, verbose=FALSE) # (quietly) assign the results
mysummary$Types / mysummary$Tokens # crude type-token ratio
# }
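The effect of toLower on the type count can be illustrated with a rough base-R sketch. This is not quanteda's tokenizer: the helper count_types_tokens() and its to_lower argument are hypothetical names introduced here, and the split rule (anything that is not a letter, digit, or apostrophe) is an assumption made for the example. Counts reported by summary() may differ, for instance if punctuation is tokenized.

```r
# Hypothetical helper for illustration only; not part of quanteda.
# Splits on runs of characters that are not letters, digits, or apostrophes,
# then counts tokens (all words) and types (distinct words).
count_types_tokens <- function(txt, to_lower = FALSE) {
  if (to_lower) txt <- tolower(txt)
  toks <- unlist(strsplit(txt, "[^[:alnum:]']+"))
  toks <- toks[nzchar(toks)]  # drop empty strings left by leading separators
  c(Types = length(unique(toks)), Tokens = length(toks))
}

txt <- "The cat saw the other cat."

# "The" and "the" are counted as separate types
count_types_tokens(txt)

# after lower-casing, "The" and "the" collapse into a single type
count_types_tokens(txt, to_lower = TRUE)
```

Lower-casing never changes the token count, only the number of distinct types, which is why it also changes a crude type-token ratio such as the one computed in the examples.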