corpus_sample

a corpus object whose documents will be sampled

a positive number giving the number of documents to select. When used
with groups, either a single number to select from each group, or a vector
equal in length to the number of groups giving the number to select from
each group. Setting a size larger than the number of documents makes it
possible to oversample groups.

size

Should sampling be with replacement?

replace

A vector of probability weights for obtaining the elements of the
vector being sampled. May not be applied when <code>by</code> is used.

prob
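
As a minimal sketch of weighted sampling, the example below passes one weight per document through <code>prob</code>. The four-document corpus and the weights are made up for illustration, and the call assumes the <code>size</code> and <code>prob</code> arguments behave as described above.

```r
library(quanteda)

# Hypothetical toy corpus with four named documents
corp <- corpus(c(d1 = "First text.", d2 = "Second text.",
                 d3 = "Third text.", d4 = "Fourth text."))

set.seed(123)
# One weight per document; d1 is favoured (weights are illustrative only)
weighted <- corpus_sample(corp, size = 2, prob = c(0.7, 0.1, 0.1, 0.1))
ndoc(weighted)
```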

a grouping variable for sampling. Useful for resampling
sub-document units such as sentences, for instance by specifying <code>by = "document"</code>.
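
The idea of resampling within groups can be sketched in base R. This is not quanteda's implementation, only an illustration of the semantics under stated assumptions: the sentence ids and group labels are hypothetical, and each group is resampled with replacement to its original size.

```r
# Hypothetical sentence ids and the document each one came from
sentences <- c("d1.1", "d1.2", "d1.3", "d2.1", "d2.2")
groups <- c("d1", "d1", "d1", "d2", "d2")

set.seed(1)
# Split by group, then resample each group with replacement,
# keeping every group at its original size
resampled <- unlist(
  lapply(split(sentences, groups), function(ids) {
    ids[sample.int(length(ids), size = length(ids), replace = TRUE)]
  }),
  use.names = FALSE
)
resampled
```

Because the draw happens inside each group, a sentence from one document can never be replaced by a sentence from another.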

Take a random sample of documents of the specified size from a corpus, with
or without replacement. Works just as <code>sample()</code> works for the
documents and their associated document-level variables.
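
A short sketch of basic use, assuming quanteda is installed. The toy corpus is hypothetical; only the <code>size</code> and <code>replace</code> arguments described on this page are used.

```r
library(quanteda)

# Hypothetical toy corpus with four named documents
corp <- corpus(c(d1 = "First text.", d2 = "Second text.",
                 d3 = "Third text.", d4 = "Fourth text."))

set.seed(42)
# Draw two documents without replacement
samp <- corpus_sample(corp, size = 2)
ndoc(samp)

# Draw six documents with replacement, more than the corpus holds
boot <- corpus_sample(corp, size = 6, replace = TRUE)
ndoc(boot)
```

Setting the seed first makes the draw reproducible, as with <code>sample()</code>.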

corpus

A fast, flexible, and comprehensive framework for
quantitative text analysis in R.  Provides functionality for corpus management,
creating and manipulating tokens and ngrams, exploring keywords in context,
forming and manipulating sparse matrices
of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and
distances, applying content dictionaries, applying supervised and unsupervised machine learning,
visually representing text and text analyses, and more.

Kenneth Benoit

quanteda

Quantitative Analysis of Textual Data

Kohei Watanabe

Haiyan Wang

Paul Nulty

Adam Obeng

Stefan Müller

Akitaka Matsuo

Jiong Wei Lua

Jouni Kuha

William Lowe

Christian Müller

Lori Young

Stuart Soroka

Ian Fellows

European Research Council 

corpus_sample function

corpus_sample: Randomly sample documents from a corpus

Description

Usage

Arguments

Value

Examples