⚠️ There's a newer version (2025.3) of this package.

PsychWordVec

Word Embedding Research Framework for Psychological Science.

An integrative toolbox for word embedding research that provides:

  1. A collection of pre-trained static word vectors in the .RData compressed format;
  2. A series of functions to process, analyze, and visualize word vectors;
  3. A range of tests to examine conceptual associations, including the Word Embedding Association Test (Caliskan et al., 2017) and the Relative Norm Distance (Garg et al., 2018), with permutation tests of significance;
  4. A set of training methods to locally train (static) word vectors from text corpora, including Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014), and FastText (Bojanowski et al., 2017);
  5. A group of functions to download pre-trained language models (e.g., GPT, BERT) and extract contextualized (dynamic) word vectors (based on the R package "text").

⚠️ All users should update the package to version ≥ 0.3.2. Older versions may run slowly and have other problems.

Author

Han-Wu-Shuang (Bruce) Bao 包寒吴霜

Install

install.packages('PsychWordVec')

Monthly Downloads

425

Version

2023.9

License

GPL-3

Maintainer

Han-Wu-Shuang Bao

Last Published

March 30th, 2025

Functions in PsychWordVec (2023.9)

plot_wordvec_tSNE

Visualize word vectors with dimensionality reduced using t-SNE.
plot_wordvec

Visualize word vectors.
plot_similarity

Visualize cosine similarity of word pairs.
pair_similarity

Compute a matrix of cosine similarity/distance of word pairs.
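The core computation behind a pairwise similarity matrix can be sketched in base R. Scale every word vector to unit length, then take all row-by-row dot products. This is a minimal sketch and not PsychWordVec's implementation. The helper `pair_cosine` and the toy matrix are hypothetical, and treating distance as 1 minus similarity is an assumed convention.

```r
# Hypothetical toy embedding matrix: rows are words, columns are dimensions
# (made-up numbers, not real word vectors)
embed <- matrix(
  c(1, 0, 0,
    1, 1, 0,
    0, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cat", "dog", "car"), NULL)
)

# Hypothetical helper: cosine similarity (or distance) for every pair of rows
pair_cosine <- function(m, distance = FALSE) {
  unit <- m / sqrt(rowSums(m^2))  # scale each row to unit length
  sim <- tcrossprod(unit)         # unit %*% t(unit): all pairwise dot products
  if (distance) 1 - sim else sim  # distance taken as 1 - similarity (assumption)
}

round(pair_cosine(embed), 3)
```

Normalizing once and using a single matrix product avoids looping over word pairs.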
sum_wordvec

Calculate the sum vector of multiple words.
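A sum vector is the element-wise sum of the selected words' vectors. The base-R sketch below shows only that arithmetic. The helper `sum_vectors` and the toy data are hypothetical. Whether sum_wordvec normalizes vectors before or after summing is not covered here.

```r
# Hypothetical toy embedding matrix (rows = words, made-up numbers)
embed <- matrix(
  c(1, 0, 0,
    1, 1, 0,
    0, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cat", "dog", "car"), NULL)
)

# Hypothetical helper: element-wise sum of the vectors of the given words
sum_vectors <- function(m, words) {
  absent <- setdiff(words, rownames(m))
  if (length(absent) > 0) stop("words not found: ", paste(absent, collapse = ", "))
  colSums(m[words, , drop = FALSE])
}

sum_vectors(embed, c("cat", "dog"))  # 2 1 0
```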
tab_similarity

Tabulate cosine similarity/distance of word pairs.
normalize

Normalize all word vectors to the unit length 1.
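Normalizing to unit length divides each word vector by its Euclidean norm. After normalization, the dot product of two vectors equals their cosine similarity. A minimal base-R sketch follows. The helper `normalize_rows` and the toy data are hypothetical, not the package's code.

```r
# Hypothetical toy embedding matrix (rows = words)
embed <- matrix(
  c(3, 4, 0,
    1, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("w1", "w2"), NULL)
)

# Hypothetical helper: rescale every row (word vector) to Euclidean length 1
normalize_rows <- function(m) {
  norms <- sqrt(rowSums(m^2))
  if (any(norms == 0)) stop("cannot normalize an all-zero vector")
  m / norms  # norms recycles down the columns, so row i is divided by norms[i]
}

normalize_rows(embed)["w1", ]  # 0.6 0.8 0.0
```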
reexports

Objects exported from other packages
plot_network

Visualize a (partial correlation) network graph of words.
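The edges of a partial correlation network can be computed in a few lines of base R. The sketch below treats each word as a variable and each embedding dimension as an observation. It inverts the correlation matrix and applies the standard formula pcor_ij = -p_ij / sqrt(p_ii * p_jj). This is one plausible construction under stated assumptions. Whether plot_network uses this unregularized estimate, or how it builds the correlations, is not shown here. The data and the helper `partial_cor` are hypothetical.

```r
set.seed(42)
# Hypothetical data: 4 words x 50 dimensions of random numbers
embed <- matrix(rnorm(4 * 50), nrow = 4,
                dimnames = list(paste0("word", 1:4), NULL))

# Hypothetical helper: partial correlations among words, treating words as
# variables and embedding dimensions as observations
partial_cor <- function(m) {
  r <- cor(t(m))                            # word-by-word correlation matrix
  p <- solve(r)                             # precision (inverse correlation) matrix
  pc <- -p / sqrt(outer(diag(p), diag(p)))  # pcor_ij = -p_ij / sqrt(p_ii * p_jj)
  diag(pc) <- 1
  pc
}

round(partial_cor(embed), 2)
```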
orth_procrustes

Orthogonal Procrustes rotation for matrix alignment.
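Orthogonal Procrustes finds the orthogonal matrix R that minimizes the Frobenius norm of A %*% R - B. The standard solution takes the SVD t(A) %*% B = U D V' and sets R = U V'. The base-R sketch below recovers a known rotation from toy data. The helper `procrustes_rotation` is hypothetical, and orth_procrustes's own argument order and conventions may differ.

```r
set.seed(1)
# Hypothetical "source" embedding: 20 words x 3 dimensions
A <- matrix(rnorm(20 * 3), nrow = 20)

# A known rotation (90 degrees in the plane of the first two dimensions)
Q <- matrix(c(0, -1, 0,
              1,  0, 0,
              0,  0, 1), nrow = 3, byrow = TRUE)
B <- A %*% Q  # "target" embedding: the same space, rotated

# Hypothetical helper: orthogonal R minimizing ||A %*% R - B|| (Frobenius norm)
procrustes_rotation <- function(A, B) {
  s <- svd(crossprod(A, B))  # crossprod(A, B) = t(A) %*% B = U D V'
  s$u %*% t(s$v)             # R = U V'
}

R <- procrustes_rotation(A, B)
max(abs(A %*% R - B))  # close to 0: the rotation is recovered
```

Because R is constrained to be orthogonal, alignment preserves lengths and angles. Cosine similarities within each space are therefore unchanged.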
tokenize

Tokenize raw text for training word embeddings.
text_init

Install the required Python modules in a new conda environment and initialize it; this step is necessary for all text_* functions designed for contextualized word embeddings.
text_model_download

Download pre-trained language models from HuggingFace.
train_wordvec

Train static word embeddings using the Word2Vec, GloVe, or FastText algorithm.
test_WEAT

Word Embedding Association Test (WEAT) and Single-Category WEAT.
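Following Caliskan et al. (2017), WEAT builds on a per-word association s(w, A, B): the mean cosine similarity of w to attribute set A minus its mean similarity to attribute set B. The effect size is the difference in mean s between target sets X and Y, divided by the standard deviation of s over all targets. A permutation p-value compares the observed statistic with every re-partition of the targets. The base-R sketch below is illustrative only. It uses the sample SD and exact enumeration, the vectors and helper names are hypothetical, and test_WEAT's arguments and details (including Single-Category WEAT) are not reproduced.

```r
# Hypothetical 2-dimensional vectors (rows = words); target set X sits near
# attribute set A, target set Y near attribute set B
X <- rbind(c(0.9, 0.1), c(0.8, 0.2), c(1.0, 0.0))
Y <- rbind(c(0.1, 0.9), c(0.2, 0.8), c(0.0, 1.0))
A <- rbind(c(1.0, 0.1), c(0.9, 0.0))
B <- rbind(c(0.1, 1.0), c(0.0, 0.9))

cos_sim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# s(w, A, B): mean similarity of w to A minus mean similarity of w to B
assoc <- function(w, A, B) {
  mean(apply(A, 1, cos_sim, b = w)) - mean(apply(B, 1, cos_sim, b = w))
}

# Hypothetical helper: WEAT effect size and exact one-sided permutation p-value
weat_sketch <- function(X, Y, A, B) {
  s <- apply(rbind(X, Y), 1, assoc, A = A, B = B)
  nx <- nrow(X)
  in_x <- seq_len(nx)
  # Test statistic: summed association of the "X" group minus the rest
  stat <- function(idx) sum(s[idx]) - sum(s[-idx])
  observed <- stat(in_x)
  # Exact permutation null: every way of choosing nx targets as "X"
  null <- apply(combn(length(s), nx), 2, stat)
  list(
    effect_size = (mean(s[in_x]) - mean(s[-in_x])) / sd(s),  # sample SD
    p_value = mean(null >= observed)                          # one-sided
  )
}

res <- weat_sketch(X, Y, A, B)
res$p_value  # 0.05: the observed split is the most extreme of the 20 possible
```

With only three words per target set there are 20 possible splits. The smallest attainable p-value is therefore 1/20. Real analyses use larger word sets, and sampling random permutations may be needed when enumeration is too costly.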
test_RND

Relative Norm Distance (RND) analysis.
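As described by Garg et al. (2018), the relative norm distance sums ||w - v1|| - ||w - v2|| over target words w. Here v1 and v2 are the average vectors of two groups of words. A negative value means the targets lie closer to group 1. The base-R sketch below assumes unit-normalized word vectors. Whether test_RND re-normalizes the group averages, and how it reports significance, is not shown. The data and the helper `rnd_sketch` are hypothetical.

```r
# Hypothetical 2-dimensional vectors (rows = words)
group1  <- rbind(c(1.0, 0.0), c(0.9, 0.1))
group2  <- rbind(c(0.0, 1.0), c(0.1, 0.9))
targets <- rbind(c(1.0, 0.05), c(0.8, 0.2))

# Hypothetical helper: relative norm distance, the sum over target words of
# ||w - v1|| - ||w - v2||, with v1 and v2 the mean unit-length group vectors
rnd_sketch <- function(targets, group1, group2) {
  unit <- function(m) m / sqrt(rowSums(m^2))
  w  <- unit(targets)
  v1 <- colMeans(unit(group1))
  v2 <- colMeans(unit(group2))
  dist_to <- function(v) sqrt(rowSums(sweep(w, 2, v)^2))  # Euclidean distance
  sum(dist_to(v1) - dist_to(v2))
}

rnd_sketch(targets, group1, group2)  # negative: targets are nearer to group 1
```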
text_unmask

<Deprecated> Fill in the blank mask(s) in a query (sentence).
text_model_remove

Remove downloaded models from the local .cache folder.
text_to_vec

Extract contextualized word embeddings from transformers (pre-trained language models).
data_transform

Transform plain-text word vectors into a wordvec (data.table) or embed (matrix) object and save it in a compressed ".RData" file.
data_wordvec_load

Load word vector data (wordvec or embed) from an ".RData" file.
dict_reliability

Reliability analysis and PCA of a dictionary.
dict_expand

Expand a dictionary from the most similar words.
data_wordvec_subset

Extract a subset of word vector data (with S3 methods).
demodata

Demo data (pre-trained using word2vec on Google News; 8,000-word vocabulary, 300 dimensions).
as_embed

Word vector data classes: wordvec and embed.
cosine_similarity

Cosine similarity/distance between two vectors.
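Cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms. A minimal base-R sketch follows. The helper `cosine_sim` is hypothetical, and treating distance as 1 minus similarity is an assumed convention that may differ from the package's.

```r
# Hypothetical helper: cosine similarity of two numeric vectors; the distance
# is taken as 1 - similarity (an assumed convention)
cosine_sim <- function(v1, v2, distance = FALSE) {
  stopifnot(length(v1) == length(v2))
  sim <- sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))
  if (distance) 1 - sim else sim
}

cosine_sim(c(1, 0), c(1, 1))                   # 0.7071068, i.e. 1 / sqrt(2)
cosine_sim(c(1, 0), c(0, 1), distance = TRUE)  # 1 (orthogonal vectors)
```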
get_wordvec

Extract word vector(s).
most_similar

Find the Top-N most similar words.
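Finding the top-N most similar words amounts to three steps: score every word in the vocabulary against the query, drop the query itself, and keep the highest scores. The base-R sketch below assumes ranking by cosine similarity. The helper `top_n_similar`, its `topn` argument, and the toy vectors are hypothetical, not most_similar's actual interface.

```r
# Hypothetical toy embedding matrix (rows = words, made-up numbers)
embed <- matrix(
  c(1.0, 0.0, 0.0,
    0.9, 0.1, 0.0,
    0.7, 0.7, 0.0,
    0.0, 0.0, 1.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("happy", "glad", "pleased", "table"), NULL)
)

# Hypothetical helper: the topn words with the highest cosine similarity
top_n_similar <- function(m, word, topn = 2) {
  if (!word %in% rownames(m)) stop("word not found: ", word)
  unit <- m / sqrt(rowSums(m^2))           # unit-length rows
  sims <- drop(unit %*% unit[word, ])      # cosine with every word (named vector)
  sims <- sims[names(sims) != word]        # exclude the query word itself
  head(sort(sims, decreasing = TRUE), topn)
}

top_n_similar(embed, "happy")  # glad, then pleased
```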