createDict

trainvec

The corpus vector to read.

dicfile

The path of the output file. Default is NULL.

wordsplit

Character string containing the regular expression used to split words.

natruesplit

Character string containing the regular expression used to split the nature from each word.

Read a corpus vector and generate the dictionary data frame.

Provides interfaces and useful tools for Chinese word segmentation. Implements a segmentation algorithm based on a Hidden Markov Model (HMM) in native R code. Methods for the HHMM-based Chinese lexical analyzer are as described in Hua-Ping Zhang et al. (2003) <doi:10.3115/1119250.1119280>.

createDict: Create a dictionary file from corpus.

Description

Usage

Arguments

Value

Examples
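The following is a minimal base R sketch of the kind of splitting that the wordsplit and natruesplit arguments describe. It does not call createDict itself. The sample corpus, the two pattern values (`" +"` and `"/"`), the helper variable names, and the output column names are all assumptions made for illustration. The real function may use different defaults and return a differently shaped data frame.

```r
# Hypothetical tagged corpus: tokens are separated by spaces, and each
# token has the form word/nature. Both conventions are assumed here.
corpus <- c("我/r 爱/v 北京/ns", "北京/ns 欢迎/v 你/r")

word_pattern   <- " +"  # assumed regular expression separating tokens
nature_pattern <- "/"   # assumed regular expression separating word and nature

# Step 1: split every corpus line into word/nature tokens.
tokens <- unlist(strsplit(corpus, word_pattern))

# Step 2: split each token into its word and its nature.
parts <- strsplit(tokens, nature_pattern)
pairs <- data.frame(
  word   = vapply(parts, `[`, character(1), 1),
  nature = vapply(parts, `[`, character(1), 2),
  count  = 1L,
  stringsAsFactors = FALSE
)

# Step 3: count how often each word/nature pair occurs.
dict <- aggregate(count ~ word + nature, data = pairs, FUN = sum)
print(dict)
```

The result has one row per distinct word/nature pair with its frequency in the sample corpus, which is one plausible shape for a dictionary data frame.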