⚠️ A newer version (2025.11) of this package is available.

PsychWordVec

Word Embedding Research Framework for Psychological Science.

An integrative toolbox of word embedding research that provides:

A collection of pre-trained static word vectors in the .RData compressed format;
A series of functions to process, analyze, and visualize word vectors;
A range of tests to examine conceptual associations, including the Word Embedding Association Test (Caliskan et al., 2017) and the Relative Norm Distance (Garg et al., 2018), with permutation tests of significance;
A set of training methods to locally train (static) word vectors from text corpora, including Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014), and FastText (Bojanowski et al., 2017);
A group of functions to download pre-trained language models (e.g., GPT, BERT) and extract contextualized (dynamic) word vectors (based on the R package text).
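
To make the association tests listed above more concrete, here is a minimal base R sketch of the WEAT effect size as defined in Caliskan et al. (2017): the difference in mean association of two target word sets (X, Y) with two attribute word sets (A, B), divided by the standard deviation of the associations over all target words. The helper names (cos_sim, assoc, weat_effect_size) and the toy vectors are assumptions made for illustration. This is not the package's own implementation, which may differ in details such as the standardization, and the permutation test is left out.

```r
# Cosine similarity between two numeric vectors
cos_sim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# s(w, A, B): mean similarity of word vector w to the rows of A
# minus its mean similarity to the rows of B
assoc <- function(w, A, B) {
  sim_A <- apply(A, 1, function(a) cos_sim(w, a))
  sim_B <- apply(B, 1, function(b) cos_sim(w, b))
  mean(sim_A) - mean(sim_B)
}

# WEAT effect size: standardized difference between the mean
# associations of target sets X and Y (one word vector per row)
weat_effect_size <- function(X, Y, A, B) {
  s_X <- apply(X, 1, function(x) assoc(x, A, B))
  s_Y <- apply(Y, 1, function(y) assoc(y, A, B))
  (mean(s_X) - mean(s_Y)) / sd(c(s_X, s_Y))
}

# Toy 2-dimensional "embeddings" (made-up values, not real word vectors)
X <- rbind(c(1.0, 0.1), c(0.9, 0.2))  # target set 1
Y <- rbind(c(0.1, 1.0), c(0.2, 0.9))  # target set 2
A <- rbind(c(1.0, 0.0), c(0.8, 0.1))  # attribute set 1
B <- rbind(c(0.0, 1.0), c(0.1, 0.8))  # attribute set 2

d <- weat_effect_size(X, Y, A, B)
d  # positive: X is closer to A, and Y is closer to B
```

Swapping X and Y flips the sign of the effect size, which is a quick sanity check on any implementation.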

⚠️ All users should update the package to version ≥ 0.3.2. Older versions may be slow and have other problems.
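
One way to check a version string against this minimum uses base R's package_version(). The helper name needs_update is hypothetical, and the version strings passed to it are only examples.

```r
# Hypothetical helper: TRUE if a version string is older than the minimum
needs_update <- function(installed, minimum = "0.3.2") {
  package_version(installed) < package_version(minimum)
}

needs_update("0.3.0")   # an example of an older version string
needs_update("2023.9")  # the version documented on this page

# Against the locally installed package (requires PsychWordVec to be installed):
# needs_update(as.character(utils::packageVersion("PsychWordVec")))
```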

Author

Han-Wu-Shuang (Bruce) Bao 包寒吴霜

Install

install.packages('PsychWordVec')

Monthly Downloads

451

Version

2023.9

License

GPL-3

Maintainer

Han-Wu-Shuang Bao

Last Published

November 30th, 2025

Functions in PsychWordVec (2023.9)

plot_wordvec_tSNE

Visualize word vectors after reducing their dimensionality with t-SNE.

Visualize word vectors.

plot_similarity

Visualize cosine similarity of word pairs.

pair_similarity

Compute a matrix of cosine similarity/distance of word pairs.

Calculate the sum vector of multiple words.

Tabulate cosine similarity/distance of word pairs.

Normalize all word vectors to the unit length 1.

Objects exported from other packages

Visualize a (partial correlation) network graph of words.

orth_procrustes

Orthogonal Procrustes rotation for matrix alignment.

Tokenize raw text for training word embeddings.

Install required Python modules in a new conda environment and initialize the environment, necessary for all text_* functions designed for contextualized word embeddings.

text_model_download

Download pre-trained language models from HuggingFace.

Train static word embeddings using the Word2Vec, GloVe, or FastText algorithm.

Word Embedding Association Test (WEAT) and Single-Category WEAT.

Relative Norm Distance (RND) analysis.

<Deprecated> Fill in the blank mask(s) in a query (sentence).

text_model_remove

Remove downloaded models from the local .cache folder.

Extract contextualized word embeddings from transformers (pre-trained language models).

Transform plain text of word vectors into wordvec (data.table) or embed (matrix), saved in a compressed ".RData" file.

data_wordvec_load

Load word vectors data (wordvec or embed) from ".RData" file.

dict_reliability

Reliability analysis and PCA of a dictionary.

Expand a dictionary from the most similar words.

data_wordvec_subset

Extract a subset of word vectors data (with S3 methods).

Demo data (pre-trained using word2vec on Google News; 8000 vocab, 300 dims).

Word vectors data class: wordvec and embed.

cosine_similarity

Cosine similarity/distance between two vectors.

Extract word vector(s).

Find the Top-N most similar words.
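
Several of the entries above (cosine similarity, normalization to unit length, Top-N most similar words) rest on the same arithmetic. The following base R sketch illustrates that arithmetic on a tiny made-up embedding matrix. The function names (cos_sim, normalize_rows, top_n_similar) and the toy values are assumptions made for illustration. They are not the package's functions, whose arguments and return types are documented on their own help pages.

```r
# Toy embedding matrix: one row per word (values are made up for illustration)
embed <- rbind(
  king  = c(0.9, 0.8, 0.1),
  queen = c(0.8, 0.9, 0.2),
  apple = c(0.1, 0.2, 0.9),
  pear  = c(0.2, 0.1, 0.8)
)

# Cosine similarity between two numeric vectors
cos_sim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Scale every row to unit length (Euclidean norm 1)
normalize_rows <- function(m) m / sqrt(rowSums(m^2))

# Top-n most similar words to `word`: after normalization,
# cosine similarity reduces to a dot product
top_n_similar <- function(m, word, n = 2) {
  unit <- normalize_rows(m)
  sims <- drop(unit %*% unit[word, ])
  sims <- sims[names(sims) != word]  # exclude the query word itself
  head(sort(sims, decreasing = TRUE), n)
}

cos_sim(embed["king", ], embed["queen", ])      # close to 1
1 - cos_sim(embed["king", ], embed["apple", ])  # cosine distance, taken here as 1 - similarity
top_n_similar(embed, "king", n = 1)             # "queen" ranks first
```

Normalizing once and then using matrix products is a common design choice because it avoids recomputing vector norms for every pair of words.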