These methods implement word hyphenation, based on Liang's algorithm.
hyphen(words, ...)
# S4 method for character
hyphen(
words,
hyph.pattern = NULL,
min.length = 4,
rm.hyph = TRUE,
quiet = FALSE,
cache = TRUE,
as = "kRp.hyphen"
)
hyphen_df(words, ...)
# S4 method for character
hyphen_df(
words,
hyph.pattern = NULL,
min.length = 4,
rm.hyph = TRUE,
quiet = FALSE,
cache = TRUE
)
hyphen_c(words, ...)
# S4 method for character
hyphen_c(
words,
hyph.pattern = NULL,
min.length = 4,
rm.hyph = TRUE,
quiet = FALSE,
cache = TRUE
)
words
Either a character vector with words/tokens to be hyphenated, or any tagged text object generated with the koRpus package.
...
Only used for the method generic.
hyph.pattern
Either an object of class kRp.hyph.pat, or a valid character string naming the language of the patterns to be used (must already be loaded, see details).
min.length
Integer, number of letters a word must have to be considered for hyphenation. hyphen will not split words after the first or before the last letter, so values smaller than 4 are not useful.
rm.hyph
Logical, whether hyphens appearing in words should be removed before pattern matching.
quiet
Logical. If FALSE, short status messages will be shown.
cache
Logical. hyphen() can cache results to speed up the process. If this option is set to TRUE, the current cache will be queried and new tokens will be added to it. Caches are language-specific and reside in an environment, i.e., they are discarded at the end of a session. If you want to save them for later use, see the option hyph.cache.file in set.sylly.env.
as
A character string defining the class of the object to be returned. Defaults to "kRp.hyphen", but can also be set to "data.frame" or "numeric", returning only the central data.frame or the numeric vector of counted syllables, respectively. For the latter two options, you can alternatively use the shortcut methods hyphen_df or hyphen_c.
An object of class kRp.hyphen, data.frame or a numeric vector, depending on the value of the as argument.
For this to work, the function must be told which pattern set it should use to find the right hyphenation spots. The most straightforward way to add support for a particular language during a session is to load an appropriate language package (e.g., the package sylly.en for English or sylly.de for German). See available.sylly.lang and install.sylly.lang for more information on how to get language support packages.
Once such a package has been loaded, you can simply use the language abbreviation as the value for the hyph.pattern argument (like "en" for the English pattern set). If words is an object that was tokenized and tagged with the koRpus package, its language definition can be used instead, i.e., you don't need to specify hyph.pattern because hyphen will pick the language automatically.
If you would rather use your own pattern set, hyph.pattern can alternatively be an object of class kRp.hyph.pat.
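To make the role of a pattern set more concrete, here is a minimal base R sketch of Liang-style pattern matching. It is not the implementation used by this package: the helper names (parse_pattern, toy_hyphenate) and the tiny pattern set are assumptions made for illustration only, and real pattern sets contain thousands of entries. Digits in a pattern score the gap at that position, the highest score per gap wins, and odd scores mark allowed break points.

```r
# Toy pattern set, often used to illustrate the word "hyphenation" (illustrative only)
toy_patterns <- c("hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n")

# Split a pattern into its letters and the scores of the gaps around them.
# scores[i] is the digit in front of letter i; the last entry follows the final letter.
parse_pattern <- function(p) {
  chars <- strsplit(p, "")[[1]]
  is_digit <- chars %in% as.character(0:9)
  scores <- integer(sum(!is_digit) + 1)
  pos <- 1
  for (k in seq_along(chars)) {
    if (is_digit[k]) {
      scores[pos] <- as.integer(chars[k])
    } else {
      pos <- pos + 1
    }
  }
  list(text = paste(chars[!is_digit], collapse = ""), scores = scores)
}

toy_hyphenate <- function(word, patterns, sep = "-") {
  m <- nchar(word)
  if (m < 4) return(word)               # too short to split at all
  w <- paste0(".", tolower(word), ".")  # dots mark the word boundaries
  n <- nchar(w)
  gap <- integer(n + 1)                 # gap[i]: score in front of character i of w
  for (p in lapply(patterns, parse_pattern)) {
    len <- nchar(p$text)
    if (len > n) next
    for (start in seq_len(n - len + 1)) {
      if (substr(w, start, start + len - 1) == p$text) {
        idx <- start:(start + len)
        gap[idx] <- pmax(gap[idx], p$scores)  # keep the highest score per gap
      }
    }
  }
  j <- 2:(m - 2)                        # never split after the first or before the last letter
  breaks <- j[gap[j + 2] %% 2 == 1]     # odd score: a break is allowed after letter j
  pieces <- substring(word, c(1, breaks + 1), c(breaks, m))
  paste(pieces, collapse = sep)
}

toy_hyphenate("hyphenation", toy_patterns)  # "hy-phen-ation"
```

With only nine patterns the sketch handles little beyond this one word; loading a language package supplies the full pattern set that hyphen needs.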
Liang, F.M. (1983). Word Hy-phen-a-tion by Com-put-er. Dissertation, Stanford University, Dept. of Computer Science.
read.hyph.pat, manage.hyph.pat, available.sylly.lang, and install.sylly.lang
# NOT RUN {
library(sylly.en)
sampleText <- c("This", "is", "a", "rather", "stupid", "demonstration")
hyphen(sampleText, hyph.pattern="en")
hyphen_df(sampleText, hyph.pattern="en")
hyphen_c(sampleText, hyph.pattern="en")
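# going by the description of the "as" argument, the two shortcut calls
# above should correspond to these calls of hyphen() itself
hyphen(sampleText, hyph.pattern="en", as="data.frame")
hyphen(sampleText, hyph.pattern="en", as="numeric")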
# using a koRpus object
hyphen(tagged.text)
# }