koRpus (version 0.04-40)

hyphen: Automatic hyphenation

Description

This function implements word hyphenation, based on Liang's algorithm.
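The idea behind Liang's algorithm can be sketched in a few lines of base R. This is a minimal illustration, not the code used by hyphen(): the function names parse_pattern and toy_hyphenate, the small pattern list, and the left_min/right_min limits are all assumptions made for the example. Each pattern carries digit weights in the gaps between its letters; the word is wrapped in dots, every matching pattern contributes its weights, the maximum weight per gap wins, and odd values mark permissible break points.

```r
# Split a Liang-style pattern such as "hy3ph" into its letters and the
# digit weights that sit in the gaps before, between, and after the letters.
parse_pattern <- function(pattern) {
  chars <- strsplit(pattern, "")[[1]]
  letters_only <- character(0)
  weights <- 0L
  for (ch in chars) {
    if (grepl("^[0-9]$", ch)) {
      # A digit sets the weight of the gap just before the next letter.
      weights[length(weights)] <- as.integer(ch)
    } else {
      letters_only <- c(letters_only, ch)
      weights <- c(weights, 0L)
    }
  }
  list(letters = paste(letters_only, collapse = ""), weights = weights)
}

# Hypothetical helper: hyphenate one word with a given pattern list.
toy_hyphenate <- function(word, patterns, left_min = 2L, right_min = 3L) {
  wrapped <- paste0(".", tolower(word), ".")  # dots mark the word boundaries
  len <- nchar(wrapped)
  gaps <- integer(len + 1L)                   # gaps[j] is the gap before character j
  for (p in lapply(patterns, parse_pattern)) {
    n <- nchar(p$letters)
    if (n > len) next
    for (i in seq_len(len - n + 1L)) {
      if (substr(wrapped, i, i + n - 1L) == p$letters) {
        idx <- i:(i + n)
        gaps[idx] <- pmax(gaps[idx], p$weights)  # highest weight wins
      }
    }
  }
  n_word <- nchar(word)
  # Odd weights allow a break; shift indices from the wrapped string to the word.
  break_after <- which(gaps %% 2L == 1L) - 2L
  break_after <- break_after[break_after >= left_min &
                             break_after <= n_word - right_min]
  starts <- c(1L, break_after + 1L)
  ends <- c(break_after, n_word)
  paste(substring(word, starts, ends), collapse = "-")
}

# A handful of illustrative patterns, not a real language pattern set.
toy_patterns <- c("hy3ph", "he2n", "hena4", "hen5at", "1na",
                  "n2at", "1tio", "2io", "o2n")

toy_hyphenate("hyphenation", toy_patterns)
# "hy-phen-ation"
```

With a full pattern set for a language, the same maximum-weight rule is applied over many thousands of patterns, which is why the choice of hyph.pattern matters.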

Usage

hyphen(words, hyph.pattern = NULL, min.length = 3,
    rm.hyph = TRUE, corp.rm.class = "nonpunct",
    corp.rm.tag = c(), quiet = FALSE, cache = TRUE)

Arguments

words
Either an object of class kRp.tagged-class, kRp.txt.freq-class or
hyph.pattern
Either an object of class kRp.hyph.pat-class, or a valid character string naming the language of the patterns to be used. See details.
min.length
Integer, the minimum number of letters a word must have to be considered for hyphenation.
rm.hyph
Logical, whether hyphens already present in words should be removed before pattern matching.
corp.rm.class
A character vector with word classes which should be ignored. The default value "nonpunct" has special meaning and will cause the result of kRp.POS.tags(lang, c("punct","sentc"), list.classes=TRUE) to be used. Relevant on
corp.rm.tag
A character vector with POS tags which should be ignored. Relevant only if words is a valid koRpus object.
quiet
Logical. If FALSE, short status messages will be shown.
cache
Logical. hyphen() can cache results to speed up the process. If this option is set to TRUE, the current cache will be queried and new tokens will also be added. Caches are language-specific and reside in an environment, i.e.,
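
The caching behaviour described for the cache argument can be pictured as one environment per language, keyed by token. The following is only a sketch of that idea in base R; the names hyph_cache, get_lang_cache and cached_hyphenate are hypothetical, and the real internals of hyphen() may differ.

```r
# Hypothetical top-level cache: holds one environment per language.
hyph_cache <- new.env()

# Return the cache environment for a language, creating it on first use.
get_lang_cache <- function(lang) {
  if (!exists(lang, envir = hyph_cache, inherits = FALSE)) {
    assign(lang, new.env(), envir = hyph_cache)
  }
  get(lang, envir = hyph_cache, inherits = FALSE)
}

# Look a word up in the language-specific cache; on a miss, compute the
# result with hyphenate_fun and store it. With cache = FALSE the cache is
# neither queried nor updated.
cached_hyphenate <- function(word, lang, hyphenate_fun, cache = TRUE) {
  if (!cache) {
    return(hyphenate_fun(word))
  }
  lang_cache <- get_lang_cache(lang)
  if (exists(word, envir = lang_cache, inherits = FALSE)) {
    return(get(word, envir = lang_cache, inherits = FALSE))
  }
  result <- hyphenate_fun(word)
  assign(word, result, envir = lang_cache)
  result
}
```

Because each language has its own environment, a token cached for "en" is never returned for "de", even if the spelling is identical.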

Value

Details

For this to work, the function must be told which pattern set to use to find the right hyphenation points. If words is already a tagged object, its language definition may be used. Otherwise, in addition to the words to be processed, you must specify hyph.pattern. You have two options: If you want to use one of the built-in language patterns, just set it to the corresponding language abbreviation. As of this version, valid choices are:
  • "de": German (new spelling, since 1996)
  • "de.old": German (old spelling, 1901-1996)
  • "en": English (UK)
  • "en.us": English (US)
  • "es": Spanish
  • "fr": French
  • "it": Italian
  • "ru": Russian
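
When hyph.pattern is assembled programmatically, it can help to check the abbreviation against the choices listed above before calling hyphen(). The following base R sketch shows one way to do that; valid_hyph_langs and check_hyph_lang are hypothetical names introduced for this example and are not part of koRpus.

```r
# The built-in pattern languages listed above, with their descriptions.
valid_hyph_langs <- c(
  "de"     = "German (new spelling, since 1996)",
  "de.old" = "German (old spelling, 1901-1996)",
  "en"     = "English (UK)",
  "en.us"  = "English (US)",
  "es"     = "Spanish",
  "fr"     = "French",
  "it"     = "Italian",
  "ru"     = "Russian"
)

# Hypothetical helper: return the abbreviation unchanged if it is one of
# the listed choices, otherwise stop with a message naming the valid ones.
check_hyph_lang <- function(lang) {
  if (!is.character(lang) || length(lang) != 1L ||
      !lang %in% names(valid_hyph_langs)) {
    stop("Unknown pattern language; valid choices are: ",
         paste(names(valid_hyph_langs), collapse = ", "))
  }
  lang
}

check_hyph_lang("en.us")        # passes the check and returns "en.us"
valid_hyph_langs[["de.old"]]    # look up the description for a choice
```

The checked value could then be passed on as the hyph.pattern argument.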

References

Liang, F.M. (1983). Word Hy-phen-a-tion by Com-put-er. Dissertation, Stanford University, Dept. of Computer Science.

[1] http://tug.ctan.org/tex-archive/language/hyph-utf8/tex/generic/hyph-utf8/patterns/

[2] http://www.ctan.org/tex-archive/macros/latex/base/lppl.txt

See Also

read.hyph.pat, manage.hyph.pat