corpus_trim

char_trim

x: corpus or character object whose sentences will be selected.

what: units of trimming, <code>"sentences"</code>, <code>"paragraphs"</code>, or <code>"documents"</code>.

min_ntoken, max_ntoken: minimum and maximum lengths in word tokens (excluding punctuation).

exclude_pattern: a stringi regular expression whose match (at the sentence level) will be used to exclude sentences.
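The argument semantics above can be illustrated with a small, language-neutral sketch, written in Python rather than R. It is not quanteda's implementation: the helper names `count_word_tokens` and `keep_segment` are hypothetical, word tokens are approximated as whitespace-separated tokens containing at least one letter or digit, both length bounds are assumed to be inclusive, and Python's `re` module stands in for a stringi (ICU) regular expression.

```python
import re
from typing import Optional


def count_word_tokens(segment: str) -> int:
    """Hypothetical helper: count whitespace-separated tokens that contain at
    least one letter or digit, so punctuation-only tokens such as "-" or "?"
    are excluded (an approximation of "excluding punctuation")."""
    return sum(1 for tok in segment.split() if any(ch.isalnum() for ch in tok))


def keep_segment(segment: str,
                 min_ntoken: int = 1,
                 max_ntoken: Optional[int] = None,
                 exclude_pattern: Optional[str] = None) -> bool:
    """Hypothetical predicate: return True if a sentence (or other unit)
    should be kept. Bounds are assumed inclusive."""
    n = count_word_tokens(segment)
    if n < min_ntoken:
        return False
    if max_ntoken is not None and n > max_ntoken:
        return False
    # Python's re is a stand-in here for a stringi (ICU) regular expression.
    if exclude_pattern is not None and re.search(exclude_pattern, segment):
        return False
    return True
```

For example, `keep_segment("Short sentence.", min_ntoken=3)` returns `False` because only two word tokens are counted, while `keep_segment("PAGE 12.", exclude_pattern=r"^PAGE \d+")` returns `False` because the pattern matches.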

Removes sentences from a corpus or a character vector based on their length in word tokens (shorter than a specified minimum or longer than a specified maximum) or on a pattern match.
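As a rough illustration of the trimming described above, the following Python sketch splits each string of a character vector into sentences, drops those with fewer than `min_ntoken` word tokens, and rejoins the rest. The function names are hypothetical, the regex splitter is only a naive stand-in for proper sentence segmentation, and returning an empty string for a document with no surviving sentences is an assumption of this sketch, not documented behaviour.

```python
import re
from typing import List


def count_word_tokens(text: str) -> int:
    # Tokens containing at least one letter or digit; punctuation-only tokens are skipped.
    return sum(1 for tok in text.split() if any(ch.isalnum() for ch in tok))


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after ".", "!" or "?" followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def trim_short_sentences(texts: List[str], min_ntoken: int) -> List[str]:
    """Hypothetical helper: keep only sentences with at least min_ntoken word
    tokens; a document with no surviving sentences becomes ""."""
    trimmed = []
    for text in texts:
        kept = [s for s in split_sentences(text) if count_word_tokens(s) >= min_ntoken]
        trimmed.append(" ".join(kept))
    return trimmed
```

With `min_ntoken=3`, the input `["Hi there. This sentence is long enough. Ok!"]` is reduced to `["This sentence is long enough."]`.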


quanteda: Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and ngrams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.

Authors: Kenneth Benoit, Kohei Watanabe, Paul Nulty, Adam Obeng, Haiyan Wang, Benjamin Lauderdale, Will Lowe


corpus_trim: remove sentences based on their token lengths or a pattern match
