hyphen,kRp.text-method

hyphen

hyphen_df,kRp.text-method

hyphen_c,kRp.text-method

Either an object of class <code><a rd-options="koRpus:kRp.text-class" href="/link/kRp.text?package=koRpus&version=0.13-4&to=koRpus%3AkRp.text-class" data-mini-rdoc="koRpus:kRp.text-class::kRp.text">kRp.text</a></code>,
or a character vector with words to be hyphenated.

words

Either an object of class <code><a rd-options="sylly:kRp.hyph.pat-class" href="/link/kRp.hyph.pat?package=koRpus&version=0.13-4&to=sylly%3AkRp.hyph.pat-class" data-mini-rdoc="sylly:kRp.hyph.pat-class::kRp.hyph.pat">kRp.hyph.pat</a></code>, or
a valid character string naming the language of the patterns to be used. See details.

hyph.pattern

Integer, the minimum number of letters a word must have to be considered for hyphenation. <code>hyphen</code> will
not split words after the first or before the last letter, so values smaller than 4 are not useful.

min.length

Logical, whether hyphens already present in words should be removed before pattern matching.

rm.hyph

A character vector with word classes which should be ignored. The default value
<code>"nonpunct"</code> has special meaning and will cause the result of
<code>kRp.POS.tags(lang, tags=c("punct","sentc"), list.classes=TRUE)</code> to be used.
Relevant only if <code>words</code> is a valid koRpus object.

corp.rm.class

A character vector with POS tags which should be ignored. Relevant only if <code>words</code>
is a valid koRpus object.

corp.rm.tag

Logical. If <code>FALSE</code>, short status messages will be shown.

quiet

Logical. <code>hyphen()</code> can cache results to speed up the process. If this option is set to <code>TRUE</code>, the
current cache will be queried and new tokens will also be added. Caches are language-specific and reside in an environment,
i.e., they are discarded at the end of a session. If you want to save them for later use, see the option <code>hyph.cache.file</code>
in <code><a rd-options="koRpus:set.kRp.env" href="/link/set.kRp.env?package=koRpus&version=0.13-4&to=koRpus%3Aset.kRp.env" data-mini-rdoc="koRpus:set.kRp.env::set.kRp.env">set.kRp.env</a></code>.

cache
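The caching behaviour described above could be combined with a persistent cache file roughly as follows. This is a minimal sketch, assuming the English language support package <code>koRpus.lang.en</code> is installed and that <code>set.kRp.env()</code> accepts a file path for <code>hyph.cache.file</code>; the file name and the sample sentence are hypothetical.

```r
# Hypothetical cache location; any writable path should work.
cache_file <- file.path(tempdir(), "hyphen_cache_en.RData")

# Only run the example if the English language package can be loaded.
if (require("koRpus.lang.en", quietly = TRUE)) {
  # Assumption: the option takes a single character string naming the file.
  set.kRp.env(hyph.cache.file = cache_file)

  tokenized <- tokenize(
    "Caching avoids repeated pattern matching.",
    format = "obj",
    lang = "en"
  )

  # cache = TRUE queries the language-specific cache and adds new tokens to it.
  hyphen(tokenized, cache = TRUE, quiet = TRUE)
}
```

Without <code>hyph.cache.file</code>, the cache only lives in the session environment and is gone once R is closed.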

A character string defining the class of the object to be returned. Defaults to <code>"kRp.hyphen"</code>, but can also be
set to <code>"data.frame"</code> or <code>"numeric"</code>, returning only the central <code>data.frame</code> or the numeric vector of counted syllables,
respectively. For the latter two options, you can alternatively use the shortcut methods <code>hyphen_df</code> or <code>hyphen_c</code>.
Ignored if <code>as.feature=TRUE</code>.

as
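The two shortcut methods can be sketched as below. The example assumes that <code>koRpus.lang.en</code> is installed; the sample sentence and the variable names are illustrative only.

```r
# Placeholders so the objects exist even if the language package is missing.
syll_df <- NULL
syll_n <- NULL

if (require("koRpus.lang.en", quietly = TRUE)) {
  tokenized <- tokenize(
    "Readability formulas count syllables.",
    format = "obj",
    lang = "en"
  )

  # Shortcut for hyphen(tokenized, as = "data.frame"): only the central data.frame.
  syll_df <- hyphen_df(tokenized, quiet = TRUE)

  # Shortcut for hyphen(tokenized, as = "numeric"): only the syllable counts.
  syll_n <- hyphen_c(tokenized, quiet = TRUE)
}
```

The numeric form is convenient when only syllable counts are needed, for instance as input to a readability index.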

Logical, whether the output should be just the analysis results or the input object with
the results added as a feature. Use <code><a rd-options="koRpus:corpusHyphen" href="/link/corpusHyphen?package=koRpus&version=0.13-4&to=koRpus%3AcorpusHyphen" data-mini-rdoc="koRpus:corpusHyphen::corpusHyphen">corpusHyphen</a></code> to get the results from such an aggregated object.
If set to <code>TRUE</code>, <code>as="kRp.hyphen"</code> is set automatically, overriding any other setting of <code>as</code> with a warning.

as.feature
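Storing the results as a feature and reading them back might look like the following sketch. It assumes that <code>koRpus.lang.en</code> is installed; the sample sentence and the variable names are illustrative only.

```r
# Placeholder so the object exists even if the language package is missing.
hyph_feature <- NULL

if (require("koRpus.lang.en", quietly = TRUE)) {
  tokenized <- tokenize(
    "Features travel with the tagged text object.",
    format = "obj",
    lang = "en"
  )

  # Returns the input object with the hyphenation results attached as a feature.
  tokenized <- hyphen(tokenized, as.feature = TRUE, quiet = TRUE)

  # Extract the hyphenation results from the aggregated object again.
  hyph_feature <- corpusHyphen(tokenized)
}
```

Keeping the results inside the text object avoids passing a separate hyphenation object along with the tagged text.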

These methods implement word hyphenation, based on Liang's algorithm.
For details, please refer to the documentation for the generic
<code><a rd-options="sylly:hyphen" href="/link/hyphen?package=koRpus&version=0.13-4&to=sylly%3Ahyphen" data-mini-rdoc="sylly:hyphen::hyphen">hyphen</a></code> method in the <code>sylly</code> package.

hyphenation
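A basic call could look like the sketch below, covering both accepted input types. It assumes that the English language support package <code>koRpus.lang.en</code> is installed; the sample sentence, the word list, and the variable names are illustrative only.

```r
# Placeholders so the objects exist even if the language package is missing.
hyph_text <- NULL
hyph_words <- NULL

# Only run the example if the English language package can be loaded.
if (require("koRpus.lang.en", quietly = TRUE)) {
  # Tokenize a short string directly (format = "obj") instead of reading a file.
  tokenized <- tokenize(
    "Automatic hyphenation splits words into syllables.",
    format = "obj",
    lang = "en"
  )

  # The language is stored in the tagged object, so hyph.pattern can be omitted.
  hyph_text <- hyphen(tokenized, quiet = TRUE)

  # For a plain character vector the pattern language is named explicitly.
  hyph_words <- hyphen(
    c("computer", "hyphenation"),
    hyph.pattern = "en",
    quiet = TRUE
  )
}
```

Both calls return an object of class <code>kRp.hyphen</code> by default.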

A set of tools to analyze texts. Includes, amongst others, functions for
automatic language detection, hyphenation, several indices of lexical diversity
(e.g., type token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch,
SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also
provided, to enable frequency analyses (supports Celex and Leipzig Corpora
Collection file formats) and measures like tf-idf. Note: For full functionality
a local installation of TreeTagger is recommended. It is also recommended to
not load this package directly, but by loading one of the available language
support packages from the 'l10n' repository
<https://undocumeantit.github.io/repos/l10n/>. 'koRpus' also includes a plugin
for the R GUI and IDE RKWard, providing graphical dialogs for its basic
features. The respective R package 'rkward' cannot be installed directly from a
repository, as it is a part of RKWard. To make full use of this feature, please
install RKWard from <https://rkward.kde.org> (plugins are detected
automatically). Due to some restrictions on CRAN, the full package sources are
only available from the project homepage. To ask for help, report bugs, request
features, or discuss the development of the package, please subscribe to the
koRpus-dev mailing list (<https://korpusml.reaktanz.de>).

Meik Michalke

koRpus

Text Analysis with Emphasis on POS Tagging, Readability and
Lexical Diversity

Earl Brown

Alberto Mirisola

Alexandre Brulet

Laura Hauser

hyphen,kRp.text-method: Automatic hyphenation
