h2o.word2vec

training_frame

Id of the training data frame. Optional, which allows initial validation of model parameters.

model_id

Destination id for this model; auto-generated if not specified.

min_word_freq

Minimum word frequency: words that appear fewer than this many times are discarded. Defaults to 5.

word_model

The word model to use (Skip-Gram). Must be one of: "SkipGram". Defaults to SkipGram.

norm_model

The normalization model to use (Hierarchical Softmax). Must be one of: "HSM". Defaults to HSM.

vec_size

Size of the word vectors. Defaults to 100.

window_size

Maximum skip length between words. Defaults to 5.

sent_sample_rate

Threshold for the occurrence of words. Words that appear with higher frequency in the training data
are randomly down-sampled; useful range is (0, 1e-5). Defaults to 0.001.

init_learning_rate

Starting learning rate. Defaults to 0.025.

epochs

Number of training iterations to run. Defaults to 5.

pre_trained

Id of a data frame that contains a pre-trained (external) word2vec model.

Trains a word2vec model on a String column of an H2O data frame.
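As a minimal sketch of how the arguments fit together, the example below trains a small model on a made-up three-sentence corpus. It assumes a local H2O cluster can be started with `h2o.init()`, and the corpus, the object names, and the chosen values of `min_word_freq`, `vec_size`, and `epochs` are illustrative only. With so little text the resulting vectors are not meaningful; the point is the call sequence.

```r
library(h2o)
h2o.init()  # start or connect to a local H2O cluster

# Toy corpus, invented for illustration: one sentence per row.
sentences <- data.frame(
  text = c("the quick brown fox jumps over the lazy dog",
           "the lazy dog sleeps while the quick fox runs",
           "a quick fox and a lazy dog are friends"),
  stringsAsFactors = FALSE
)
sentences_hf <- as.h2o(sentences)

# word2vec trains on a String column, so make sure the column is not an enum,
# then split each sentence into one word per row.
words <- h2o.tokenize(h2o.ascharacter(sentences_hf$text), " ")

# Small settings chosen only so that the toy corpus is not filtered away.
w2v <- h2o.word2vec(
  training_frame = words,
  min_word_freq  = 1,   # keep every word; the default of 5 would drop most of them
  vec_size       = 10,  # short vectors are enough for a demonstration
  epochs         = 5
)

# Look up the nearest neighbours of a word in the embedding space.
h2o.findSynonyms(w2v, "fox", count = 3)

# Average the word vectors of each sentence into one vector per sentence.
sentence_vecs <- h2o.transform(w2v, words, aggregate_method = "AVERAGE")
dim(sentence_vecs)
```

On a real corpus the defaults (`min_word_freq = 5`, `vec_size = 100`) are a more sensible starting point than the values used here.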

