Trains a word2vec model on a String column of an H2O data frame.
h2o.word2vec(training_frame = NULL, model_id = NULL, min_word_freq = 5,
word_model = c("SkipGram"), norm_model = c("HSM"), vec_size = 100,
window_size = 5, sent_sample_rate = 0.001, init_learning_rate = 0.025,
epochs = 5, pre_trained = NULL)
training_frame: Id of the training data frame. (Not required, to allow initial validation of model parameters.)
model_id: Destination id for this model; auto-generated if not specified.
min_word_freq: Words that appear fewer than this many times are discarded. Defaults to 5.
word_model: Word model to use (Skip-Gram). Must be one of: "SkipGram". Defaults to SkipGram.
norm_model: Normalization model to use (Hierarchical Softmax). Must be one of: "HSM". Defaults to HSM.
vec_size: Size of the word vectors. Defaults to 100.
window_size: Maximum skip length between words. Defaults to 5.
sent_sample_rate: Threshold for occurrence of words. Words that appear with higher frequency in the training data are randomly down-sampled; useful range is (0, 1e-5). Defaults to 0.001.
init_learning_rate: Starting learning rate. Defaults to 0.025.
epochs: Number of training iterations to run. Defaults to 5.
pre_trained: Id of a data frame that contains a pre-trained (external) word2vec model.
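
The following is a minimal usage sketch, not an official example. It assumes a local H2O cluster can be started with h2o.init() and that the installed h2o package also provides h2o.ascharacter, h2o.tokenize, h2o.findSynonyms and h2o.transform. The toy corpus and the parameter values (min_word_freq = 1, vec_size = 10, window_size = 3) are made up for illustration; a corpus this small will not produce meaningful vectors.

```r
library(h2o)

# Start (or connect to) a local H2O cluster; assumes Java is available.
h2o.init()

# Hypothetical toy corpus, one document per row.
docs <- data.frame(
  text = c("the quick brown fox jumps over the lazy dog",
           "the lazy dog sleeps while the quick fox runs"),
  stringsAsFactors = FALSE
)
docs_hf <- as.h2o(docs)

# Make sure the column is of type String, then split it into words.
# The result is a single String column with one token per row.
text_col <- h2o.ascharacter(docs_hf$text)
words <- h2o.tokenize(text_col, " ")

# Train on the tokenized String column. min_word_freq is lowered to 1
# only because the toy corpus is tiny; the other values are arbitrary.
w2v <- h2o.word2vec(training_frame = words,
                    min_word_freq = 1,
                    vec_size = 10,
                    window_size = 3,
                    epochs = 5)

# Look up the words closest to "fox" in the learned vector space.
print(h2o.findSynonyms(w2v, "fox", count = 3))

# Map the tokens back to vectors, averaging over each document.
doc_vecs <- h2o.transform(w2v, words, aggregate_method = "AVERAGE")
print(dim(doc_vecs))
```

In practice the training frame would come from a much larger text column, and min_word_freq would usually stay at or near its default so that rare words are dropped.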