koRpus (version 0.13-4)

hyphen,kRp.text-method: Automatic hyphenation

Description

These methods implement word hyphenation, based on Liang's algorithm. For details, please refer to the documentation for the generic hyphen method in the sylly package.

Usage

# S4 method for kRp.text
hyphen(
  words,
  hyph.pattern = NULL,
  min.length = 4,
  rm.hyph = TRUE,
  corp.rm.class = "nonpunct",
  corp.rm.tag = c(),
  quiet = FALSE,
  cache = TRUE,
  as = "kRp.hyphen",
  as.feature = FALSE
)

# S4 method for kRp.text
hyphen_df(
  words,
  hyph.pattern = NULL,
  min.length = 4,
  rm.hyph = TRUE,
  quiet = FALSE,
  cache = TRUE
)

# S4 method for kRp.text
hyphen_c(
  words,
  hyph.pattern = NULL,
  min.length = 4,
  rm.hyph = TRUE,
  quiet = FALSE,
  cache = TRUE
)

Arguments

words

Either an object of class kRp.text, or a character vector with words to be hyphenated.

hyph.pattern

Either an object of class kRp.hyph.pat, or a valid character string naming the language of the patterns to be used. For details, see the documentation for the generic hyphen method in the sylly package.

min.length

Integer, the minimum number of letters a word must have to be considered for hyphenation. hyphen will not split words after the first or before the last letter, so values smaller than 4 are not useful.

rm.hyph

Logical, whether hyphens already present in words should be removed before pattern matching.

corp.rm.class

A character vector with word classes which should be ignored. The default value "nonpunct" has special meaning and will cause the result of kRp.POS.tags(lang, tags=c("punct","sentc"), list.classes=TRUE) to be used. Relevant only if words is a valid koRpus object.

corp.rm.tag

A character vector with POS tags which should be ignored. Relevant only if words is a valid koRpus object.

quiet

Logical. If FALSE, short status messages will be shown.

cache

Logical. hyphen() can cache results to speed up the process. If this option is set to TRUE, the current cache will be queried and new tokens will also be added to it. Caches are language-specific and reside in an environment, i.e., they are cleaned at the end of a session. If you want to save these for later use, see the option hyph.cache.file in set.kRp.env.

as

A character string defining the class of the object to be returned. Defaults to "kRp.hyphen", but can also be set to "data.frame" or "numeric", returning only the central data.frame or the numeric vector of counted syllables, respectively. For the latter two options, you can alternatively use the shortcut methods hyphen_df or hyphen_c. Ignored if as.feature=TRUE.

as.feature

Logical, whether the output should be just the analysis results or the input object with the results added as a feature. Use corpusHyphen to get the results from such an aggregated object. If set to TRUE, as="kRp.hyphen" is set automatically, overriding any other setting of as with a warning.
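
The reasoning behind the min.length default can be checked with a little arithmetic. The following base R sketch is only an illustration and is not how koRpus or sylly implement hyphenation. The helper candidate_breaks is hypothetical. It lists the break positions (counted as the number of letters before the break) that remain once breaks after the first letter and before the last letter are ruled out.

```r
# Hypothetical helper, not part of koRpus or sylly.
# A word of n letters has n - 1 gaps between letters. Excluding the gap
# after the first letter and the gap before the last letter leaves the
# positions 2 .. (n - 2), which is an empty range for n < 4.
candidate_breaks <- function(word) {
  n <- nchar(word)
  if (n < 4) {
    return(integer(0))
  }
  seq.int(2L, n - 2L)
}

# Number of eligible break positions for words of two to six letters
break_counts <- vapply(
  c("an", "the", "word", "hyphen"),
  function(w) length(candidate_breaks(w)),
  integer(1)
)
print(break_counts)
```

Words with fewer than four letters have no eligible position at all, which is why lowering min.length below 4 cannot change the result.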

Value

An object of class kRp.text, kRp.hyphen, data.frame or a numeric vector, depending on the values of the as and as.feature arguments.

References

Liang, F.M. (1983). Word Hy-phen-a-tion by Com-put-er. Dissertation, Stanford University, Dept. of Computer Science.

[1] http://tug.ctan.org/tex-archive/language/hyph-utf8/tex/generic/hyph-utf8/patterns/

[2] http://www.ctan.org/tex-archive/macros/latex/base/lppl.txt

See Also

read.hyph.pat, manage.hyph.pat

Examples

# NOT RUN {
hyphen(tagged.text)
# }
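
The example above presupposes an existing tagged.text object. The following is a slightly fuller sketch of how the different return types might be requested. It assumes that the koRpus and koRpus.lang.en packages are installed and that English hyphenation patterns become available through them. The sample sentence and the variable names are made up for illustration, and the code is skipped if the packages are missing.

```r
# Sketch under assumptions: koRpus and koRpus.lang.en must be installed;
# the sample sentence and all variable names are illustrative only.
sample_text <- "Automatic hyphenation relies on language-specific patterns."

have_pkgs <- requireNamespace("koRpus", quietly = TRUE) &&
  requireNamespace("koRpus.lang.en", quietly = TRUE)

if (have_pkgs) {
  library(koRpus)
  library(koRpus.lang.en)

  # Build a kRp.text object from a character string
  tagged.text <- tokenize(sample_text, format = "obj", lang = "en")

  # Default: returns an object of class kRp.hyphen
  hyph_obj <- hyphen(tagged.text, quiet = TRUE)

  # Shortcut methods: the central data.frame, or the numeric vector
  # of counted syllables
  hyph_df <- hyphen_df(tagged.text, quiet = TRUE)
  hyph_num <- hyphen_c(tagged.text, quiet = TRUE)

  # Store the results as a feature of the input object and fetch
  # them again with corpusHyphen()
  tagged.text <- hyphen(tagged.text, as.feature = TRUE, quiet = TRUE)
  hyph_feature <- corpusHyphen(tagged.text)
}
```

With as.feature = TRUE the return value is the input object itself, so the hyphenation results travel with the text and can be retrieved later.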