
quanteda (version 0.9.9-3)

corpus_sample: randomly sample documents from a corpus

Description

Takes a random sample of documents of the specified size from a corpus, with or without replacement. It works just as sample does, applied to the documents and their associated document-level variables.
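To make "works just as sample does" concrete, here is a minimal base R sketch. It assumes a toy stand-in for a corpus: a plain data frame with one row per document and its document-level variables as extra columns. The data frame, its columns, and the helper `sample_docs()` are hypothetical and are not the quanteda corpus class or its internals. The point is that sampling picks whole rows, so every sampled text keeps its own docvars.

```r
# Hypothetical toy "corpus": one row per document, docvars as extra columns.
# This only mimics the idea; it is NOT quanteda's corpus object.
docs <- data.frame(
  doc_id    = c("d1", "d2", "d3", "d4", "d5"),
  text      = c("a b c", "d e", "f g h i", "j", "k l"),
  President = c("Washington", "Adams", "Jefferson", "Madison", "Monroe"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: draw row indices with base R's sampler, then subset
# whole rows so each document stays aligned with its document-level variables.
sample_docs <- function(x, size = nrow(x), replace = FALSE, prob = NULL) {
  idx <- sample.int(nrow(x), size = size, replace = replace, prob = prob)
  x[idx, , drop = FALSE]
}

set.seed(1)
s <- sample_docs(docs, 3)
print(s)
```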

Usage

corpus_sample(x, size = ndoc(x), replace = FALSE, prob = NULL, ...)

Arguments

x
a corpus object whose documents will be sampled
size
a positive number, the number of documents to select
replace
logical; should sampling be with replacement?
prob
a vector of probability weights, one per document, for obtaining the documents being sampled
...
unused
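Since the arguments size, replace, and prob mirror those of base R's sample, their effect can be sketched with base R's `sample.int()` on document indices. This assumes a hypothetical corpus of 5 documents. It shows base R behaviour only; whether corpus_sample reports errors in exactly the same way is an assumption here.

```r
n_docs <- 5  # hypothetical number of documents in a corpus

# replace = FALSE: each document can be drawn at most once, so asking for
# more documents than exist is an error in base R's sampler.
too_many <- tryCatch(
  sample.int(n_docs, size = 10, replace = FALSE),
  error = function(e) conditionMessage(e)
)
print(too_many)

# replace = TRUE: documents may be drawn repeatedly, so size may exceed n_docs.
set.seed(42)
boot <- sample.int(n_docs, size = 10, replace = TRUE)
print(boot)

# prob: one weight per document (weights need not sum to 1).
# Document 5 has weight 0, so it can never be selected.
w <- c(2, 1, 1, 1, 0)
set.seed(7)
weighted <- sample.int(n_docs, size = 4, replace = FALSE, prob = w)
print(weighted)
```

In the last draw only four documents have positive weight, so sampling four without replacement necessarily returns those four in some order. Only the order depends on the weights.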

Value

A corpus object containing size documents drawn from the corpus x. The returned corpus keeps all of the metadata of the original corpus, and the selected documents keep their document-level variables.
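Two consequences of these defaults can be illustrated with base R. The sketch again uses a hypothetical data frame standing in for a corpus, not quanteda's own object. With the defaults (size equal to the number of documents and no replacement), every document is drawn exactly once, so the result is a reordering of the input. With replace = TRUE and a size larger than the number of documents, some documents must appear more than once, and each copy carries the variables of the document it came from. How quanteda names duplicated documents is not covered here.

```r
# Hypothetical stand-in for a corpus with one document-level variable.
docs <- data.frame(
  doc_id = c("d1", "d2", "d3", "d4"),
  Year   = c(1789, 1793, 1797, 1801),
  stringsAsFactors = FALSE
)

set.seed(123)
# Mirrors the defaults size = ndoc(x), replace = FALSE:
# all documents are drawn once, only their order changes.
shuffled <- docs[sample.int(nrow(docs)), , drop = FALSE]
print(shuffled)

# 8 draws from 4 documents with replacement: duplicates are unavoidable,
# and each copy keeps the Year of its source document.
boot <- docs[sample.int(nrow(docs), size = 8, replace = TRUE), , drop = FALSE]
print(boot)
```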

See Also

sample

Examples

library(quanteda)
# sampling from a corpus
summary(corpus_sample(data_corpus_inaugural, 5))
summary(corpus_sample(data_corpus_inaugural, 10, replace = TRUE))
