word2vec (version 0.4.0)

predict.word2vec: Predict functionalities for a word2vec model

Description

Get either

  • the embedding of words

  • the nearest words which are similar to either a word or a word vector

Usage

# S3 method for word2vec
predict(
  object,
  newdata,
  type = c("nearest", "embedding"),
  top_n = 10L,
  encoding = "UTF-8",
  ...
)

Value

Depending on the type, you get a different result back:

  • for type nearest: a list of data.frames with columns term, similarity and rank, indicating which words are closest to the provided newdata words or word vectors. If newdata is a single vector instead of a matrix, a data.frame is returned rather than a list

  • for type embedding: a matrix of word vectors of the words provided in newdata
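
To make the shape of the type nearest result concrete, the following base R sketch mimics it on a toy embedding matrix. It is only an illustration: nearest_sketch is a hypothetical helper, not part of the word2vec package, the four terms and their 3-dimensional vectors are made up, and the use of cosine similarity is an assumption rather than a statement about how predict computes its similarity column.

```r
# Toy embedding matrix: four made-up terms in a 3-dimensional space
# (illustrative values only, not taken from any trained model)
emb <- rbind(
  bus    = c(1.0, 0.2, 0.0),
  tram   = c(0.9, 0.3, 0.1),
  toilet = c(0.0, 1.0, 0.4),
  shower = c(0.1, 0.9, 0.5)
)

# Hypothetical helper (NOT a word2vec function): rank the rows of `emb`
# by cosine similarity to a query vector and keep the top_n best matches
nearest_sketch <- function(emb, vector, top_n = 10L) {
  sims <- as.vector(emb %*% vector) /
    (sqrt(rowSums(emb^2)) * sqrt(sum(vector^2)))
  ord <- head(order(sims, decreasing = TRUE), top_n)
  data.frame(
    term       = rownames(emb)[ord],
    similarity = sims[ord],
    rank       = seq_along(ord),
    stringsAsFactors = FALSE
  )
}

# A single query vector gives one data.frame
single <- nearest_sketch(emb, c(1, 0.25, 0.05), top_n = 2L)

# A matrix of query vectors gives a list with one data.frame per row
queries <- rbind(q1 = c(1, 0.25, 0.05), q2 = c(0, 1, 0.45))
many <- lapply(seq_len(nrow(queries)), function(i) {
  nearest_sketch(emb, queries[i, ], top_n = 2L)
})
names(many) <- rownames(queries)
```

Here single is a data.frame with columns term, similarity and rank, while many is a named list holding one such data.frame per query row, which mirrors the data.frame versus list distinction described above.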

Arguments

object

a word2vec model as returned by word2vec or read.word2vec

newdata

for type 'embedding', newdata should be a character vector of words
for type 'nearest', newdata should be a character vector of words or a matrix in the embedding space

type

either 'embedding' or 'nearest'. Defaults to 'nearest'.

top_n

show only the top n nearest neighbours. Defaults to 10.

encoding

set the encoding of the text elements to the specified encoding. Defaults to 'UTF-8'.

...

not used

See Also

word2vec, read.word2vec

Examples

path  <- system.file(package = "word2vec", "models", "example.bin")
model <- read.word2vec(path)
emb <- predict(model, c("bus", "toilet", "unknownword"), type = "embedding")
emb
nn  <- predict(model, c("bus", "toilet"), type = "nearest", top_n = 5)
nn

# Do some calculations with the vectors and find similar terms to these
emb <- as.matrix(model)
vector <- emb["buurt", ] - emb["rustige", ] + emb["restaurants", ]
predict(model, vector, type = "nearest", top_n = 10)

vector <- emb["gastvrouw", ] - emb["gastvrij", ]
predict(model, vector, type = "nearest", top_n = 5)

vectors <- emb[c("gastheer", "gastvrouw"), ]
vectors <- rbind(vectors, avg = colMeans(vectors))
predict(model, vectors, type = "nearest", top_n = 10)
