corpus_trim

char_trim

<a rd-options="" href="/link/corpus?package=quanteda&version=2.1.2" data-mini-rdoc="quanteda::corpus">corpus</a> or character object whose sentences will be selected.

units of trimming: <code>"sentences"</code>, <code>"paragraphs"</code>, or
<code>"documents"</code>

what

minimum and maximum lengths in word tokens
(excluding punctuation)

min_ntoken, max_ntoken

a stringi regular expression whose match (at the
sentence level) will be used to exclude sentences

exclude_pattern

Removes sentences from a corpus or a character vector when their length in
word tokens falls outside a specified range, or when they match an exclusion pattern.

character

corpus

A fast, flexible, and comprehensive framework for
quantitative text analysis in R.  Provides functionality for corpus management,
creating and manipulating tokens and ngrams, exploring keywords in context,
forming and manipulating sparse matrices
of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and
distances, applying content dictionaries, applying supervised and unsupervised machine learning,
visually representing text and text analyses, and more.

Kenneth Benoit

quanteda

Quantitative Analysis of Textual Data

Kohei Watanabe

Haiyan Wang

Paul Nulty

Adam Obeng

Stefan Müller

Akitaka Matsuo

Jiong Wei Lua

Jouni Kuha

William Lowe

Christian Müller

Lori Young

Stuart Soroka

Ian Fellows

European Research Council 

corpus_trim function

corpus_trim: Remove sentences based on their token lengths or a pattern match

Description

Usage

Arguments

Value

Examples
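
A minimal sketch of how the two functions might be called, using only the arguments listed on this page (what, min_ntoken, max_ntoken, exclude_pattern). The sample texts and variable names are made up for illustration, and the threshold values are arbitrary assumptions rather than recommended settings.

```r
library(quanteda)

# Hypothetical sample texts, invented for this illustration
txt <- c("PAGE 1. This is a single sentence.  Short sentence. Three word sentence.",
         "PAGE 2. Very short! Shorter.",
         "Very long sentence, with multiple parts, separated by commas.  PAGE 3.")

# Character input: drop sentences with fewer than 3 word tokens
trimmed_chr <- char_trim(txt, what = "sentences", min_ntoken = 3)

# Character input: drop sentences that start with "PAGE <digits>"
no_page_chr <- char_trim(txt, what = "sentences", exclude_pattern = "^PAGE \\d+")

# Corpus input: keep only sentences of 3 to 20 word tokens (arbitrary bounds)
corp <- corpus(txt)
trimmed_corp <- corpus_trim(corp, what = "sentences",
                            min_ntoken = 3, max_ntoken = 20)

print(trimmed_chr)
print(no_page_chr)
print(ndoc(trimmed_corp))
```

Because exclude_pattern is matched at the sentence level, the anchor ^ in the pattern above refers to the start of each sentence rather than the start of the whole document.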