These methods implement word hyphenation based on Liang's algorithm.
For details, please refer to the documentation for the generic hyphen method in the sylly package.
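To give a rough idea of what pattern-based hyphenation in the style of Liang's algorithm does, here is a minimal, self-contained sketch. It is not the code used by sylly or koRpus: the function names parse_pattern and hyphenate_sketch are hypothetical, and the nine patterns are a small toy set of the kind commonly used to illustrate the algorithm, whereas real pattern files are far larger. Digits in a pattern score the gap they sit in; the highest score per gap wins, and odd scores permit a split.

```r
# Split a Liang-style pattern such as "hy3ph" into its letters and the
# score found before each letter (plus one trailing slot after the last).
parse_pattern <- function(pattern) {
  chars <- strsplit(pattern, "")[[1]]
  letters_only <- character(0)
  values <- 0L
  for (ch in chars) {
    if (grepl("[0-9]", ch)) {
      values[length(values)] <- as.integer(ch)
    } else {
      letters_only <- c(letters_only, ch)
      values <- c(values, 0L)
    }
  }
  list(letters = paste(letters_only, collapse = ""), values = values)
}

# Hypothetical helper: hyphenate one word with a given set of patterns.
hyphenate_sketch <- function(word, patterns, min_length = 4) {
  if (nchar(word) < min_length) return(word)
  padded <- paste0(".", tolower(word), ".")  # dots mark the word boundaries
  n <- nchar(padded)
  scores <- integer(n + 1)  # scores[k]: gap before character k of padded
  for (pattern in patterns) {
    p <- parse_pattern(pattern)
    len <- nchar(p$letters)
    if (len > n) next
    for (start in seq_len(n - len + 1)) {
      if (substr(padded, start, start + len - 1) == p$letters) {
        idx <- start:(start + len)
        scores[idx] <- pmax(scores[idx], p$values)  # highest score wins
      }
    }
  }
  chars <- strsplit(tolower(word), "")[[1]]
  m <- length(chars)
  out <- chars[1]
  for (i in seq_len(m - 1)) {
    # never split after the first or before the last letter
    allowed <- i >= 2 && i <= m - 2
    if (allowed && scores[i + 2] %% 2 == 1) out <- c(out, "-")
    out <- c(out, chars[i + 1])
  }
  paste(out, collapse = "")
}

# Toy pattern set (assumption for illustration only).
demo_patterns <- c("hy3ph", "he2n", "hena4", "hen5at", "1na",
                   "n2at", "1tio", "2io", "o2n")
result <- hyphenate_sketch("hyphenation", demo_patterns)
result                                   # "hy-phen-ation"
length(strsplit(result, "-")[[1]])       # number of parts, i.e. 3
```

The sketch also shows why values of min.length below 4 are not useful: with no split allowed after the first or before the last letter, a three-letter word has no eligible gap.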
# S4 method for kRp.text
hyphen(
words,
hyph.pattern = NULL,
min.length = 4,
rm.hyph = TRUE,
corp.rm.class = "nonpunct",
corp.rm.tag = c(),
quiet = FALSE,
cache = TRUE,
as = "kRp.hyphen",
as.feature = FALSE
)
# S4 method for kRp.text
hyphen_df(
words,
hyph.pattern = NULL,
min.length = 4,
rm.hyph = TRUE,
quiet = FALSE,
cache = TRUE
)
# S4 method for kRp.text
hyphen_c(
words,
hyph.pattern = NULL,
min.length = 4,
rm.hyph = TRUE,
quiet = FALSE,
cache = TRUE
)
words: Either an object of class kRp.text, or a character vector with words to be hyphenated.
hyph.pattern: Either an object of class kRp.hyph.pat, or a valid character string naming the language of the patterns to be used. See details.
min.length: Integer, the minimum number of letters a word must have to be considered for hyphenation. hyphen will not split words after the first or before the last letter, so values smaller than 4 are not useful.
rm.hyph: Logical, whether hyphens that already appear in words should be removed before pattern matching.
corp.rm.class: A character vector with word classes which should be ignored. The default value "nonpunct" has special meaning and will cause the result of kRp.POS.tags(lang, tags=c("punct","sentc"), list.classes=TRUE) to be used. Relevant only if words is a valid koRpus object.
corp.rm.tag: A character vector with POS tags which should be ignored. Relevant only if words is a valid koRpus object.
quiet: Logical. If FALSE, short status messages will be shown.
cache: Logical. hyphen() can cache results to speed up the process. If this option is set to TRUE, the current cache will be queried and new tokens will also be added to it. Caches are language-specific and reside in an environment, i.e., they are discarded at the end of a session. If you want to save them for later use, see the option hyph.cache.file in set.kRp.env.
as: A character string defining the class of the object to be returned. Defaults to "kRp.hyphen", but can also be set to "data.frame" or "numeric", returning only the central data.frame or the numeric vector of counted syllables, respectively. For the latter two options, you can alternatively use the shortcut methods hyphen_df or hyphen_c. Ignored if as.feature=TRUE.
as.feature: Logical, whether the output should be just the analysis results or the input object with the results added as a feature. Use corpusHyphen to get the results from such an aggregated object. If set to TRUE, as="kRp.hyphen" is set automatically, overriding any other setting of as with a warning.
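As a hedged illustration of the as.feature argument described above, the following sketch stores the hyphenation results inside the text object and reads them back with corpusHyphen. It assumes that koRpus and the English language support package koRpus.lang.en are installed; the sample sentence is made up.

```r
library(koRpus)
library(koRpus.lang.en)  # assumption: English language support is installed

# Tokenize a short sample text into a kRp.text object.
tagged.text <- tokenize(
  "Caching speeds up repeated hyphenation.",
  format = "obj",
  lang = "en"
)

# Return the input object with the hyphenation results added as a feature.
tagged.text <- hyphen(tagged.text, as.feature = TRUE, quiet = TRUE)

# Extract the results from the aggregated object.
hyph.obj <- corpusHyphen(tagged.text)
class(hyph.obj)
```

Keeping the results attached to the text object avoids having to track separate objects for each analysis step.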
An object of class kRp.text, kRp.hyphen, data.frame, or a numeric vector, depending on the values of the as and as.feature arguments.
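The three plain return types can be compared side by side with a short sketch like the one below. It assumes that koRpus and koRpus.lang.en are installed and uses a made-up sample sentence; hyphen_df and hyphen_c are the shortcut methods for as="data.frame" and as="numeric".

```r
library(koRpus)
library(koRpus.lang.en)  # assumption: English language support is installed

# Tokenize a short sample text into a kRp.text object.
tagged.text <- tokenize(
  "Hyphenation patterns make syllable counting considerably easier.",
  format = "obj",
  lang = "en"
)

hyph.obj <- hyphen(tagged.text, quiet = TRUE)     # object of class kRp.hyphen
hyph.df  <- hyphen_df(tagged.text, quiet = TRUE)  # only the central data.frame
hyph.num <- hyphen_c(tagged.text, quiet = TRUE)   # only the syllable counts

class(hyph.obj)
is.data.frame(hyph.df)
is.numeric(hyph.num)
```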
Liang, F.M. (1983). Word Hy-phen-a-tion by Com-put-er. Dissertation, Stanford University, Dept. of Computer Science.
[1] http://tug.ctan.org/tex-archive/language/hyph-utf8/tex/generic/hyph-utf8/patterns/
[2] http://www.ctan.org/tex-archive/macros/latex/base/lppl.txt
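# NOT RUN {
# tagged.text is assumed to be an existing kRp.text object,
# e.g. one created with tokenize() or treetag()
hyphen(tagged.text)
# }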