
jiebaR (version 0.11)

segment: Chinese text segmentation function

Description

The function segments text using an initialized engine (a worker). You can initialize multiple engines simultaneously with worker(). Public settings of a worker can be read and modified with $, for example WorkerName$symbol = TRUE. Some private settings are fixed when the engine is initialized; you can read them with WorkerName$PrivateVarible.
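A minimal sketch of this workflow, assuming the jiebaR package is installed with its bundled default dictionaries. The sentence is an arbitrary example, and no output is shown because it depends on the dictionaries in use.

```r
library(jiebaR)

# Create a worker; type = "mix" is the default segmentation model
wk <- worker()

# Public settings are read and modified with $
print(wk$symbol)   # whether punctuation symbols are kept in the output
wk$symbol <- TRUE  # keep symbols from now on

# Private settings are fixed at initialization and can only be inspected
str(wk$PrivateVarible)

# Segment a sentence with the configured worker
words <- segment("江州市长江大桥参加了长江大桥的通车仪式。", wk)
print(words)
```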

Usage

segment(code, jiebar, mod = NULL)

Arguments

code

A Chinese sentence or the path of a text file.

jiebar

A jiebaR worker, created with worker().

mod

Changes the default result (segmentation) type for this call; the value can be "mix", "hmm", "query", "full" or "mp".
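The sketch below, assuming mod behaves as described in this argument list, reuses a single default worker with different models instead of creating one worker per model. The sentence is an arbitrary example; results are printed rather than predicted.

```r
library(jiebaR)

wk <- worker()  # default model: "mix"
sentence <- "江州市长江大桥参加了长江大桥的通车仪式"

# Same worker, different models selected per call through `mod`
mix_words   <- segment(sentence, wk)                  # worker default
mp_words    <- segment(sentence, wk, mod = "mp")
hmm_words   <- segment(sentence, wk, mod = "hmm")
query_words <- segment(sentence, wk, mod = "query")

print(list(mix = mix_words, mp = mp_words,
           hmm = hmm_words, query = query_words))
```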

Details

Four of these models are described below:

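The maximum probability segmentation model uses a trie to build a directed acyclic graph (DAG) of all candidate words in the sentence, then applies dynamic programming to find the segmentation path with the highest total probability. It is the core segmentation algorithm. The dict and user arguments should be provided when initializing the jiebaR worker.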
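To make the DAG plus dynamic programming idea concrete, here is a minimal pure-R sketch. It is not jiebaR's implementation (which is written in C++): the function name mp_segment is hypothetical, a named vector of made-up word frequencies stands in for both the trie and the dictionary file, and characters missing from the dictionary get frequency 1, an assumption borrowed from the Python jieba.

```r
# Sketch of maximum probability segmentation (not jiebaR's C++ code).
# Assumption: `freq` is a named numeric vector of word frequencies; a plain
# name lookup replaces the trie used by the real implementation.
mp_segment <- function(sentence, freq) {
  chars <- strsplit(sentence, "")[[1]]
  n <- length(chars)
  log_total <- log(sum(freq))
  max_len <- max(nchar(names(freq)))
  word_at <- function(i, j) paste(chars[i:j], collapse = "")

  # DAG: dag[[i]] lists every end position j such that chars[i..j] is a
  # dictionary word; a single character is always kept as a fallback.
  dag <- vector("list", n)
  for (i in seq_len(n)) {
    ends <- i
    for (j in i:min(n, i + max_len - 1)) {
      if (!is.na(freq[word_at(i, j)])) ends <- c(ends, j)
    }
    dag[[i]] <- unique(ends)
  }

  # Dynamic programming from right to left: best[i] is the highest log
  # probability of any segmentation of chars[i..n]; best[n + 1] = 0.
  best <- numeric(n + 1)
  next_end <- integer(n)
  for (i in n:1) {
    scores <- sapply(dag[[i]], function(j) {
      f <- freq[word_at(i, j)]
      if (is.na(f)) f <- 1  # unknown character: frequency 1 (assumption)
      log(f) - log_total + best[j + 1]
    })
    k <- which.max(scores)
    best[i] <- scores[k]
    next_end[i] <- dag[[i]][k]
  }

  # Follow the best path from the first character to recover the words.
  words <- character(0)
  i <- 1
  while (i <= n) {
    words <- c(words, word_at(i, next_end[i]))
    i <- next_end[i] + 1
  }
  words
}

# Toy dictionary with made-up frequencies
toy_freq <- c("南京" = 20, "南京市" = 10, "市长" = 15, "长江" = 25,
              "大桥" = 20, "长江大桥" = 10, "江" = 5)
print(mp_segment("南京市长江大桥", toy_freq))
```

With these made-up frequencies (in a UTF-8 R session) the sketch returns "南京市" "长江大桥": keeping 长江大桥 as one word scores higher than splitting it into 长江 and 大桥, and higher than the competing 市长 reading.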

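The hidden Markov model (HMM) segmentation treats the characters of a sentence as the observation set and word-position labels as the hidden state set, and determines word boundaries from the most likely state sequence. The default HMM model is based on the People's Daily corpus. The hmm argument should be provided when initializing the jiebaR worker.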
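A minimal Viterbi sketch of the HMM idea. It assumes the common B/M/E/S labeling (begin, middle and end of a word, single-character word) as the state set; the function name hmm_segment is hypothetical, and every probability is a toy value chosen for illustration, not the People's Daily model shipped with jiebaR.

```r
# Viterbi sketch of HMM segmentation (not jiebaR's C++ code).
# Assumption: states B/M/E/S = begin, middle, end of a word, single-char word.
hmm_segment <- function(sentence, start_p, trans_p, emit_p, floor_p = 1e-8) {
  states <- c("B", "M", "E", "S")
  chars <- strsplit(sentence, "")[[1]]
  n <- length(chars)
  # Emission probability, with a small floor for unseen characters
  emit <- function(s, ch) {
    p <- emit_p[[s]][ch]
    if (is.na(p)) floor_p else p
  }

  # v[t, s]: best log probability of a state path that ends in s at position t
  v <- matrix(-Inf, n, 4, dimnames = list(NULL, states))
  back <- matrix(NA_character_, n, 4, dimnames = list(NULL, states))
  for (s in states) v[1, s] <- log(start_p[[s]]) + log(emit(s, chars[1]))
  if (n > 1) {
    for (t in 2:n) {
      for (s in states) {
        cand <- v[t - 1, ] + log(trans_p[, s])  # arrive in s from each state
        k <- which.max(cand)
        v[t, s] <- cand[k] + log(emit(s, chars[t]))
        back[t, s] <- states[k]
      }
    }
  }

  # A sentence must end on E or S; then trace the back pointers.
  path <- character(n)
  path[n] <- if (v[n, "E"] >= v[n, "S"]) "E" else "S"
  if (n > 1) for (t in n:2) path[t - 1] <- back[t, path[t]]

  # Close a word after every E or S label.
  words <- character(0)
  current <- ""
  for (t in seq_len(n)) {
    current <- paste0(current, chars[t])
    if (path[t] %in% c("E", "S")) {
      words <- c(words, current)
      current <- ""
    }
  }
  words
}

# Toy model with made-up probabilities (rows: from-state, columns: to-state)
states <- c("B", "M", "E", "S")
start_p <- c(B = 0.6, M = 0, E = 0, S = 0.4)
trans_p <- matrix(0, 4, 4, dimnames = list(states, states))
trans_p["B", c("M", "E")] <- c(0.3, 0.7)
trans_p["M", c("M", "E")] <- c(0.3, 0.7)
trans_p["E", c("B", "S")] <- c(0.5, 0.5)
trans_p["S", c("B", "S")] <- c(0.5, 0.5)
emit_p <- list(B = c("北" = 0.8, "京" = 0.1, "人" = 0.1),
               M = c("北" = 0.1, "京" = 0.1, "人" = 0.1),
               E = c("北" = 0.1, "京" = 0.8, "人" = 0.1),
               S = c("北" = 0.1, "京" = 0.1, "人" = 0.8))
print(hmm_segment("北京人", start_p, trans_p, emit_p))
```

With these toy numbers the most likely label sequence is B E S, so (in a UTF-8 R session) the sketch returns "北京" "人".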

The MixSegment model combines the maximum probability segmentation model and the hidden Markov model to produce the segmentation. The dict, hmm and user arguments should be provided when initializing the jiebaR worker.
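Creating a MixSegment worker, assuming the default file paths exported by jiebaR (DICTPATH, HMMPATH and USERPATH). They are written out here only to show which files this model uses; omitting them gives the same defaults.

```r
library(jiebaR)

# Mix model: maximum probability segmentation combined with the HMM
mixseg <- worker(type = "mix",
                 dict = DICTPATH,  # main dictionary
                 hmm  = HMMPATH,   # HMM model file
                 user = USERPATH)  # user dictionary

words <- segment("江州市长江大桥参加了长江大桥的通车仪式", mixseg)
print(words)
```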

The QuerySegment model first segments the text with MixSegment and then enumerates the possible dictionary words contained in the long words it produces. The dict, hmm and qmax arguments should be provided when initializing the jiebaR worker.
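A sketch comparing the query model with the mix model on the same sentence, assuming the default dictionaries. The value qmax = 20 is chosen only for illustration, and the two results are printed side by side rather than predicted.

```r
library(jiebaR)

sentence <- "江州市长江大桥参加了长江大桥的通车仪式"

mixseg   <- worker(type = "mix")
queryseg <- worker(type = "query", qmax = 20)  # qmax: max query word length

mix_words   <- segment(sentence, mixseg)
query_words <- segment(sentence, queryseg)

# Compare the two results
print(mix_words)
print(query_words)
```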

The operator <= is a shorthand for this function: WorkerName <= code is equivalent to segment(code, WorkerName).
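A quick check of the operator form against the function form, assuming a default worker; the sentence is an arbitrary example. Note that `<=` binds more tightly than `<-`, so no parentheses are needed.

```r
library(jiebaR)

wk <- worker()
sentence <- "江州市长江大桥参加了长江大桥的通车仪式"

# The operator form and the function form should give the same result
by_operator <- wk <= sentence
by_function <- segment(sentence, wk)
print(identical(by_operator, by_function))
```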

See Also

<=.segment, worker