The package uses initialized engines (workers) for word segmentation,
and you can initialize multiple engines at the same time. You can also
reset a worker's public settings using $, for example
WorkerName$symbol = T. Some private settings are fixed when an engine
is initialized, and you can read them with WorkerName$PrivateVarible.
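As a rough illustration of the paragraph above, the sketch below creates two independent engines and resets one public setting. It assumes the jiebaR package is installed (the block is skipped otherwise); the worker names and the sample sentence are made up for the example.

```r
# Sketch only: runs when jiebaR is installed, otherwise does nothing.
if (requireNamespace("jiebaR", quietly = TRUE)) {
  library(jiebaR)

  mix_worker <- worker()              # default engine type is "mix"
  hmm_worker <- worker(type = "hmm")  # a second, independent engine

  # Reset a public setting on one engine: keep symbols in the output.
  mix_worker$symbol <- TRUE

  words <- segment("这是一段测试文本!", mix_worker)
  print(words)
}
```

Each worker keeps its own settings, so changing symbol on mix_worker should leave hmm_worker untouched.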
The Maximum probability segmentation model uses a trie to construct a
directed acyclic graph (DAG) of candidate words and a dynamic
programming algorithm to choose the most probable path through it. It
is the core segmentation algorithm. The dict and user arguments should
be provided when initializing a jiebaR worker.
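The dynamic programming idea can be sketched in base R as follows. This is a toy under stated assumptions, not jiebaR's implementation: the dictionary and its probabilities are invented, a named vector lookup stands in for the trie, and unknown single characters get an arbitrary small fallback probability.

```r
# Toy dictionary: word -> probability (hypothetical values, not jiebaR's).
toy_dict <- c(a = 0.05, b = 0.05, c = 0.05, d = 0.05,
              ab = 0.2, cd = 0.3, abc = 0.1)

max_prob_segment <- function(text, dict) {
  if (!nzchar(text)) return(character(0))
  chars <- strsplit(text, "")[[1]]
  n <- length(chars)
  # best[i]: best log-probability of segmenting chars[i..n]; best[n + 1] = 0.
  best <- c(rep(-Inf, n), 0)
  next_pos <- integer(n)
  for (i in n:1) {
    for (j in i:n) {
      word <- paste(chars[i:j], collapse = "")
      # Edges of the DAG: dictionary words, plus a fallback for unknown
      # single characters so that at least one path always exists.
      p <- if (word %in% names(dict)) dict[[word]] else if (j == i) 1e-8 else NA
      if (!is.na(p) && log(p) + best[j + 1] > best[i]) {
        best[i] <- log(p) + best[j + 1]
        next_pos[i] <- j + 1
      }
    }
  }
  # Follow the recorded best edges from the start to recover the words.
  words <- character(0)
  i <- 1
  while (i <= n) {
    j <- next_pos[i]
    words <- c(words, paste(chars[i:(j - 1)], collapse = ""))
    i <- j
  }
  words
}

max_prob_segment("abcd", toy_dict)  # "ab" "cd"
```

Working from the end of the text backwards means each position is solved once, which is what makes the search over all paths in the DAG tractable.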
The Hidden Markov Model (HMM) segmenter uses an HMM to determine the
state set and the observation set of words. The default HMM model is
based on the People's Daily corpus. The hmm argument should be provided
when initializing a jiebaR worker.
The MixSegment model uses both the Maximum probability segmentation
model and the Hidden Markov Model to construct the segmentation. The
dict, hmm and user arguments should be provided when initializing a
jiebaR worker.
The QuerySegment model uses MixSegment to construct the segmentation
and then enumerates all the possible long words in the dictionary. The
dict, hmm and qmax arguments should be provided when initializing a
jiebaR worker.
The FullSegment model enumerates all the possible words in the dictionary.
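A minimal base R sketch of this kind of enumeration is shown below. It is an assumption-laden toy: the word list is invented, and a plain vector lookup is used where a real engine would use a trie.

```r
# List every substring of the text that appears in a word list.
full_segment <- function(text, dict_words) {
  chars <- strsplit(text, "")[[1]]
  n <- length(chars)
  found <- character(0)
  for (i in seq_len(n)) {
    for (j in i:n) {
      word <- paste(chars[i:j], collapse = "")
      if (word %in% dict_words) found <- c(found, word)
    }
  }
  found
}

full_segment("abcd", c("a", "ab", "abc", "cd", "d"))
# "a" "ab" "abc" "cd" "d"
```

Unlike a single best segmentation, the result may contain overlapping words, which can be useful for search indexing.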
The Speech Tagging worker uses the MixSegment model to cut words and
then tags each word using labels compatible with ictclas. The dict,
hmm and user arguments should be provided when initializing a jiebaR
worker.
The Keyword Extraction worker uses the MixSegment model to cut words
and the TF-IDF algorithm to find the keywords. The dict, hmm, idf,
stop_word and topn arguments should be provided when initializing a
jiebaR worker.
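The TF-IDF step can be illustrated with a short base R sketch that works on words that are already segmented. The function name, the IDF values and the mean-IDF fallback for unknown words are assumptions made for this example and are not taken from jiebaR.

```r
# Score already-segmented words by term frequency times IDF.
tfidf_keywords <- function(words, idf, stop_words = character(0), topn = 5) {
  words <- words[!(words %in% stop_words)]
  if (length(words) == 0) return(numeric(0))
  tf <- table(words)                      # term frequency per distinct word
  terms <- names(tf)
  # Words missing from the IDF table fall back to the mean IDF
  # (an assumption of this sketch).
  weights <- ifelse(terms %in% names(idf), idf[terms], mean(idf))
  scores <- as.numeric(tf) * weights
  names(scores) <- terms
  head(sort(scores, decreasing = TRUE), topn)
}

toy_words <- c("data", "model", "data", "the", "model", "data", "train")
toy_idf <- c(data = 1.5, model = 3, train = 5, the = 0.1)  # hypothetical
kw <- tfidf_keywords(toy_words, toy_idf, stop_words = "the", topn = 2)
kw  # model = 6, train = 5
```

Here topn limits how many keywords are returned, and stop words are dropped before any scoring takes place.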
The Simhash worker uses the keyword extraction worker to find the
keywords and then uses the simhash algorithm to compute a simhash
value. The dict, hmm, idf and stop_word arguments should be provided
when initializing a jiebaR worker.
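To make the simhash step concrete, here is a small base R sketch that folds weighted keywords into a 16-bit fingerprint. It is illustrative only: toy_hash_bits is a simple stand-in hash invented for this example (not the hash jiebaR uses), and the keyword weights are made up.

```r
# Toy polynomial string hash, returned as a vector of 0/1 bits.
toy_hash_bits <- function(word, nbits = 16) {
  h <- 0
  for (code in utf8ToInt(word)) h <- (h * 31 + code) %% 2^31
  as.integer(intToBits(as.integer(h)))[seq_len(nbits)]
}

# weights: named numeric vector, keyword -> weight (e.g. a TF-IDF score).
simhash_bits <- function(weights, nbits = 16) {
  total <- numeric(nbits)
  for (word in names(weights)) {
    bits <- toy_hash_bits(word, nbits)
    # A set bit adds the keyword weight, a clear bit subtracts it.
    total <- total + weights[[word]] * (2 * bits - 1)
  }
  as.integer(total > 0)
}

# Number of differing bits between two fingerprints.
hamming <- function(a, b) sum(a != b)

fp1 <- simhash_bits(c(model = 6, train = 5, data = 4.5))
fp2 <- simhash_bits(c(model = 6, train = 5, corpus = 4))
hamming(fp1, fp2)
```

Texts that share most of their heavily weighted keywords tend to produce fingerprints with a small Hamming distance, which is why simhash is used for near-duplicate detection.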