One-hot encode a text into a list of word indexes in a vocabulary of size n.
text_one_hot(
text,
n,
filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n",
lower = TRUE,
split = " "
)
text: Input text (string).
n: Size of vocabulary (integer).
filters: Sequence of characters to filter out, such as punctuation. Default includes basic punctuation, tabs, and newlines.
lower: Whether to convert the input to lowercase.
split: Sentence split marker (string).
List of integers in [1, n]. Each integer encodes a word (uniqueness is not guaranteed).
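The following base R sketch illustrates why uniqueness is not guaranteed: each word is mapped to an index by a hash folded into the range 1..n, so distinct words can share an index. The helper name `sketch_one_hot` is hypothetical, and the character-code hash is a toy stand-in chosen for readability; it is not the hash used by `text_one_hot()`, so the indices will differ from the real function's output.

```r
# Hypothetical helper for illustration only; not the keras implementation.
sketch_one_hot <- function(text, n,
                           filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n",
                           lower = TRUE,
                           split = " ") {
  if (lower) text <- tolower(text)

  # Replace every filtered character with the split marker.
  for (ch in strsplit(filters, "")[[1]]) {
    text <- gsub(ch, split, text, fixed = TRUE)
  }

  # Split into words and drop empty tokens left by repeated separators.
  words <- strsplit(text, split, fixed = TRUE)[[1]]
  words <- words[nzchar(words)]

  # Toy hash (an assumption): sum of character codes, folded into 1..n.
  vapply(words, function(w) {
    as.integer(sum(utf8ToInt(w)) %% n) + 1L
  }, integer(1), USE.NAMES = FALSE)
}

sketch_one_hot("The cat sat on the mat.", n = 50)
# 22 13 29 22 22 23
```

With this toy hash, "the" and "on" both land on index 22, which is the kind of collision the note above refers to. A larger `n` makes collisions less likely but does not rule them out.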
Other text preprocessing: make_sampling_table(), pad_sequences(), skipgrams(), text_hashing_trick(), text_to_word_sequence()