This function removes the specified words from a segmentation result.
Usage
filter_segment(input, filter_words, unit = 50)
Arguments
input
a character vector, such as the result of a segmentation.
filter_words
a string vector of words to be removed.
unit
the number of words to combine into each regular expression; the default is 50.
A long list of words forms a large regular expression, which may or may not be
accepted: the POSIX standard only requires support for expressions of up to 256
bytes. The words are therefore split into groups of unit words, and each group
is matched with its own regular expression.
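The grouping described under unit can be sketched in base R. This is a hypothetical helper, filter_by_units, and not the package's actual implementation. It assumes that each filter word should match a whole element of input literally, so regular-expression metacharacters in the words are escaped before the words are joined into one alternation per group.

```r
# Hypothetical sketch: remove whole-element matches of filter_words from input,
# building one regular expression per group of `unit` words so that no single
# pattern grows too long.
filter_by_units <- function(input, filter_words, unit = 50) {
  stopifnot(is.character(input), is.character(filter_words), unit >= 1)
  if (length(filter_words) == 0) return(input)

  # Escape regex metacharacters so each word is matched literally (assumption).
  escaped <- gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", filter_words)

  # Split the escaped words into consecutive groups of at most `unit` words.
  groups <- split(escaped, ceiling(seq_along(escaped) / unit))

  for (g in groups) {
    # Anchored alternation: an element is dropped only if it equals a word.
    pattern <- paste0("^(", paste(g, collapse = "|"), ")$")
    input <- input[!grepl(pattern, input)]
  }
  input
}

# Example: with unit = 1, each filter word gets its own pattern.
filter_by_units(c("abc", "def", " ", ".", "a.c"), c("abc", "."), unit = 1)
```

Because "." is escaped, the second pattern removes only the element "." and leaves "a.c" in place. A smaller unit keeps every pattern short at the cost of more passes over input.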