DEPRECATED. This function trains a GloVe word-embeddings model via fully asynchronous and parallel AdaGrad.
glove(tcm, vocabulary_size = nrow(tcm), word_vectors_size, x_max, num_iters,
shuffle_seed = NA_integer_, learning_rate = 0.05,
convergence_threshold = -1, grain_size = 100000L, alpha = 0.75, ...)
tcm: an object representing the term-co-occurrence matrix used in training. At the moment only dgTMatrix objects, or objects coercible to a dgTMatrix, are supported. Future releases will add support for out-of-core learning and streaming a TCM from disk.
vocabulary_size: number of words in the term-co-occurrence matrix.
word_vectors_size: desired dimension for the word vectors.
x_max: maximum number of co-occurrences to use in the weighting function. See the GloVe paper for details: http://nlp.stanford.edu/pubs/glove.pdf
num_iters: number of AdaGrad epochs.
shuffle_seed: integer seed used to shuffle the input before each SGD iteration; use NA_integer_ (the default) to turn shuffling off. Note that this parameter only controls shuffling: the result will still be non-deterministic because of the Hogwild-style asynchronous SGD. Shuffling is generally a good idea for stochastic gradient descent, but in my experience it does not improve convergence in this particular case. Please report if you find that shuffling improves your score.
learning_rate: learning rate for SGD. I do not recommend modifying this parameter, since AdaGrad quickly adjusts it to an optimal value.
convergence_threshold: defines the early-stopping strategy. Fitting stops when one of the two following conditions is satisfied: (a) all iterations have been used, or (b) cost_previous_iter / cost_current_iter - 1 < convergence_threshold. Note that the default of -1 effectively disables condition (b), since the ratio of two positive costs minus 1 cannot fall below -1 (see the convergence sketch after the argument list).
grain_size: I do not recommend adjusting this parameter. This is the grain_size for RcppParallel::parallelReduce; for details, see http://rcppcore.github.io/RcppParallel/#grain-size
alpha: the alpha in the weighting function formula: \(f(x) = (x / x_{max})^{\alpha}\) if \(x < x_{max}\), and \(f(x) = 1\) otherwise (see the weighting-function sketch after the argument list).
...: arguments passed to other methods (not used at the moment).
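
As an illustration of condition (b) above, here is a minimal sketch of the stopping rule in plain R (a hypothetical helper, not part of the package):

    # Returns TRUE when the relative improvement in cost between two epochs
    # drops below the threshold. With the default threshold of -1 this is
    # never TRUE for positive costs, so all num_iters epochs are used.
    has_converged <- function(cost_previous_iter, cost_current_iter,
                              convergence_threshold = -1) {
      cost_previous_iter / cost_current_iter - 1 < convergence_threshold
    }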
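
The weighting function controlled by x_max and alpha can be written in plain R as the following sketch (illustrative only; the defaults shown are the values used in the GloVe paper, and x_max has no default in glove() itself):

    # GloVe weighting: down-weights rare co-occurrences and caps the
    # contribution of very frequent co-occurrences at 1.
    glove_weight <- function(x, x_max = 100, alpha = 0.75) {
      ifelse(x < x_max, (x / x_max)^alpha, 1)
    }

    glove_weight(c(1, 50, 100, 500), x_max = 100)
    # counts below x_max are scaled smoothly; counts at or above it map to 1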
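
A minimal call illustrating the signature above (hypothetical toy data; a real TCM would be built from a corpus, and since this function is deprecated the replacement API should be preferred for new code):

    library(Matrix)
    library(text2vec)  # assumed here to be the package providing glove()

    # tiny 3 x 3 co-occurrence matrix coerced to the supported dgTMatrix class
    tcm <- as(sparseMatrix(i = c(1, 1, 2), j = c(2, 3, 3),
                           x = c(5, 2, 1), dims = c(3, 3)), "dgTMatrix")

    # train 2-dimensional word vectors for 5 AdaGrad epochs
    fit <- glove(tcm = tcm, word_vectors_size = 2, x_max = 10, num_iters = 5)
    # fit holds the trained word vectors (return structure not documented here)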