keras (version 2.0.9)

text_hashing_trick: Converts a text to a sequence of indexes in a fixed-size hashing space.

Description

Converts a text to a sequence of indexes in a fixed-size hashing space.

Usage

text_hashing_trick(text, n, hash_function = NULL,
  filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n", lower = TRUE,
  split = " ")

Arguments

text

Input text (string).

n

Dimension of the hashing space.

hash_function

If NULL, Python's built-in hash function is used. Can also be 'md5' or any function that takes a string as input and returns an integer. Note that Python's hash is not a stable hashing function: its results are not consistent across different runs, whereas 'md5' is stable.

filters

Sequence of characters to filter out.

lower

Whether to convert the input to lowercase.

split

Separator on which the text is split into words (string).
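
The filters, lower, and split arguments describe a preprocessing step that runs before any hashing takes place. The base-R sketch below illustrates that step under two assumptions: each filtered character is replaced by the split string, and empty tokens are dropped. The helper name tokenize_sketch is hypothetical and is not part of the keras package.

```r
# Hypothetical helper for illustration only; not a keras function.
tokenize_sketch <- function(text,
                            filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n",
                            lower = TRUE,
                            split = " ") {
  if (lower) text <- tolower(text)

  # Break the input and the filter set into single characters.
  chars <- strsplit(text, "")[[1]]
  filter_chars <- strsplit(filters, "")[[1]]

  # Assumption: filtered characters are replaced by the split string,
  # so punctuation acts as a word boundary.
  chars[chars %in% filter_chars] <- split
  cleaned <- paste(chars, collapse = "")

  # Split on the separator and drop empty tokens.
  tokens <- strsplit(cleaned, split, fixed = TRUE)[[1]]
  tokens[nzchar(tokens)]
}

tokens <- tokenize_sketch("Hello, World! Hello.")
print(tokens)  # "hello" "world" "hello"
```

With lower = FALSE the case of each word is preserved, so "Hello" and "hello" would be treated as different words by a later hashing step.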

Value

A list of integer word indices (uniqueness not guaranteed).

Details

Two or more words may be assigned to the same index, because the hashing function can produce collisions.
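
The collision behaviour can be illustrated with a small base-R sketch. It is not the keras implementation: toy_hash and hashing_trick_sketch are hypothetical names, the toy hash stands in for Python's hash or 'md5', and the index formula hash %% (n - 1) + 1 is an assumption based on the Python Keras implementation (it leaves index 0 unused). The sketch also returns an integer vector rather than a list.

```r
# Toy deterministic hash (an assumption for illustration; keras itself uses
# Python's hash or md5). Polynomial rolling hash over Unicode code points.
toy_hash <- function(word) {
  h <- 0
  for (code in utf8ToInt(word)) {
    h <- (h * 31 + code) %% 1000003
  }
  h
}

# Hypothetical sketch of the hashing trick; not the keras function.
hashing_trick_sketch <- function(text, n, hash_function = toy_hash) {
  # Simplified tokenization: lowercase, split on single spaces.
  words <- strsplit(tolower(text), " ", fixed = TRUE)[[1]]
  words <- words[nzchar(words)]

  # Assumed index formula: indices fall in 1..(n - 1).
  vapply(
    words,
    function(w) as.integer(hash_function(w) %% (n - 1) + 1),
    integer(1),
    USE.NAMES = FALSE
  )
}

# The same word always maps to the same index within one call.
idx <- hashing_trick_sketch("the cat sat on the mat", n = 50)
print(idx[1] == idx[5])  # both positions hold "the"

# With n = 3 only two indices are available, so three distinct words
# must share at least one index.
small <- hashing_trick_sketch("red green blue", n = 3)
print(length(unique(small)) < 3)
```

Choosing n well above the expected vocabulary size makes collisions less frequent, at the cost of a larger index space.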

See Also

Other text preprocessing: make_sampling_table, pad_sequences, skipgrams, text_one_hot, text_to_word_sequence