One-hot encode a text into a list of word indexes in a vocabulary of size n.
text_one_hot(
  input_text,
  n,
  filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n",
  lower = TRUE,
  split = " ",
  text = NULL
)
Returns a list of integers in [1, n]. Each integer encodes a word (uniqueness is not guaranteed).
input_text: Input text (string).
n: Size of vocabulary (integer).
filters: Sequence of characters to filter out, such as punctuation. The default includes basic punctuation, tabs, and newlines.
lower: Whether to convert the input to lowercase.
split: Sentence split marker (string).
text: Kept for compatibility only; use input_text instead.
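The arguments above describe a small pipeline: lowercase the text, replace filtered characters, split into words, and map each word to an index. The base-R sketch below illustrates that pipeline and shows one way non-unique indexes can arise, namely when words are assigned to indexes by hashing. It is not the keras implementation: `toy_text_one_hot()` and `toy_hash()` are hypothetical names introduced here, and the hash is a toy chosen only so the example runs without any packages.

```r
# Toy stand-in for text_one_hot(); NOT the keras implementation.
# toy_hash() is a hypothetical, deterministic word hash for illustration only.
toy_hash <- function(word) {
  h <- 0
  for (code in utf8ToInt(word)) {
    h <- (h * 31 + code) %% 1000003  # keep the running value small
  }
  h
}

toy_text_one_hot <- function(input_text, n,
                             filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n",
                             lower = TRUE, split = " ") {
  if (lower) input_text <- tolower(input_text)
  # Replace every filtered character with the split marker.
  chars <- strsplit(input_text, "")[[1]]
  drop <- strsplit(filters, "")[[1]]
  chars[chars %in% drop] <- split
  # Split into words and discard empty pieces left by repeated markers.
  words <- strsplit(paste(chars, collapse = ""), split, fixed = TRUE)[[1]]
  words <- words[nzchar(words)]
  # Map each word to an index in [1, n]; distinct words may share an index.
  vapply(words, function(w) as.integer(toy_hash(w) %% n + 1), integer(1),
         USE.NAMES = FALSE)
}

ids <- toy_text_one_hot("The cat sat. The cat ran!", n = 10)
print(ids)
```

Repeated words always receive the same index, so the first and fourth entries of `ids` match. When n is smaller than the number of distinct words, at least two different words must share an index, which is why uniqueness cannot be guaranteed.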
Other text preprocessing: make_sampling_table(), pad_sequences(), skipgrams(), text_hashing_trick(), text_to_word_sequence()