The calculations are done with the text2vec package.
Usage

glove(
text,
tokenizer = text2vec::space_tokenizer,
dim = 10L,
window = 5L,
min_count = 5L,
n_iter = 10L,
x_max = 10L,
stopwords = character(),
convergence_tol = -1,
threads = 1,
composition = c("tibble", "data.frame", "matrix"),
verbose = FALSE
)
Arguments

text: Character string.
tokenizer: Function to perform tokenization. Defaults to text2vec::space_tokenizer.
dim: Integer, number of dimensions of the resulting word vectors. Defaults to 10.
window: Integer, skip length between words. Defaults to 5.
min_count: Integer, number of times a token must appear to be considered in the model. Defaults to 5.
n_iter: Integer, number of training iterations. Defaults to 10.
x_max: Integer, maximum number of co-occurrences to use in the weighting function. Defaults to 10.
stopwords: Character, a vector of stop words to exclude from training. Defaults to character().
convergence_tol: Numeric, defines the early stopping strategy. Fitting stops when one of two conditions is satisfied: (a) all n_iter iterations have been run, or (b) cost_previous_iter / cost_current_iter - 1 < convergence_tol. Defaults to -1, which disables early stopping. A worked sketch of this rule follows the argument list.
threads: Number of CPU threads to use. Defaults to 1.
composition: Character, either "tibble", "matrix", or "data.frame" for the format of the resulting word vectors. Defaults to "tibble".
verbose: Logical, controls whether progress is reported as operations are executed. Defaults to FALSE.
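To make the convergence_tol rule concrete, here is a minimal sketch of the stopping test in plain R. The cost values are made up for illustration; they are not exposed by glove() itself.

# Hypothetical cost values from two successive training iterations:
cost_previous_iter <- 0.0520
cost_current_iter <- 0.0515
convergence_tol <- 0.01

cost_previous_iter / cost_current_iter - 1
#> [1] 0.009708738

# 0.0097 < 0.01, so fitting would stop at this iteration. With the
# default convergence_tol = -1 the ratio can never drop below the
# tolerance (costs are positive), so all n_iter iterations always run.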
Value

A tibble, data.frame, or matrix containing the tokens in the first column and the word vectors in the remaining columns.
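As a hedged sketch of working with this return value: assuming `vectors` holds the tibble returned by a glove() call with dim = 10, and relying on the token column being the first column as described above, a conventional embedding matrix with one row per token can be built like this.

vectors <- glove(fairy_tales, dim = 10L)  # tibble: tokens + 10 vector columns
emb <- as.matrix(vectors[, -1])           # drop the token column
rownames(emb) <- vectors[[1]]             # index rows by token
dim(emb)                                  # number of tokens x 10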
References

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. "GloVe: Global Vectors for Word Representation." In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Examples

glove(fairy_tales, x_max = 5)
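A slightly fuller call, sketched with the arguments documented above; the parameter values are illustrative rather than recommended settings.

glove(
  fairy_tales,
  dim = 25L,               # 25-dimensional word vectors
  min_count = 2L,          # keep tokens that appear at least twice
  n_iter = 20L,            # train for more iterations than the default
  convergence_tol = 0.01,  # allow early stopping
  composition = "matrix"   # return a plain matrix
)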