A function to segment Chinese text into words.
segmentCN(strwords, analyzer = c("default", "hmm", "jiebaR", "fmm", "coreNLP"), nature = FALSE, nosymbol = TRUE, returnType = c("vector", "tm"), ...)
strwords: A character vector of Chinese sentences.
analyzer: One of 'default', 'hmm', 'jiebaR', 'fmm' and 'coreNLP'. Default is 'hmm'.
nature: Whether to recognise the nature (part of speech) of each word.
nosymbol: Whether to remove symbols from the sentence. Default is TRUE, meaning no symbols are kept.
returnType: Default is 'vector', which returns a character vector of words. Choose 'tm' to return a single string with the words separated by spaces, so that it can be passed directly to Corpus.
...: Other arguments.
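The 'fmm' analyzer is listed above without a description. Assuming it refers to forward maximum matching, the common dictionary-based segmentation method, the following sketch shows how that technique works in general. It is not Rwordseg's implementation; the function `fmm_segment` and the toy dictionary are hypothetical.

```r
# Generic forward maximum matching (FMM): a sketch, not Rwordseg's code.
# `dict` is a hypothetical character vector of known words.
fmm_segment <- function(sentence, dict, max_len = max(nchar(dict))) {
  chars <- strsplit(sentence, "")[[1]]  # split into single characters
  n <- length(chars)
  words <- character(0)
  i <- 1
  while (i <= n) {
    # Try the longest candidate starting at position i first and shrink it
    # until it is found in the dictionary; one character is the fallback.
    len <- min(max_len, n - i + 1)
    while (len > 1) {
      candidate <- paste(chars[i:(i + len - 1)], collapse = "")
      if (candidate %in% dict) break
      len <- len - 1
    }
    words <- c(words, paste(chars[i:(i + len - 1)], collapse = ""))
    i <- i + len  # continue after the matched word
  }
  words
}

dict <- c("中国", "人民", "中国人", "银行")
fmm_segment("中国人民银行", dict)
# returns c("中国人", "民", "银行") in a UTF-8 locale
```

The result illustrates the well-known weakness of greedy longest matching: it takes "中国人" first and leaves "民" stranded, whereas the intended reading is "中国 / 人民 / 银行".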
A vector of segmented words, or a list of such vectors (one element per sentence) if the input contains several sentences.
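To illustrate the two return shapes without calling segmentCN, this base-R sketch starts from a made-up list of already segmented sentences (one word vector per sentence) and collapses each element into the kind of space-separated string that returnType = 'tm' is described as producing. The helper `as_tm_strings` and the sample data are hypothetical, not part of Rwordseg.

```r
# Hypothetical segmented output for two sentences: one word vector each.
segmented <- list(c("我", "爱", "北京"), c("自然", "语言", "处理"))

# Collapse word vectors into space-separated strings, mirroring the
# documented 'tm' return type. Handles both a single vector and a list.
as_tm_strings <- function(words) {
  if (is.list(words)) {
    vapply(words, paste, character(1), collapse = " ")
  } else {
    paste(words, collapse = " ")
  }
}

tm_docs <- as_tm_strings(segmented)
tm_docs                              # c("我 爱 北京", "自然 语言 处理")

# Splitting on the single space recovers the original word vectors.
strsplit(tm_docs, " ", fixed = TRUE)
```

Strings in this form could then be handed to the tm package, for example with `tm::Corpus(tm::VectorSource(tm_docs))`.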
# NOT RUN {
segmentCN("hello world!")
# }