centroid_analysis: Centroid Analysis

Description

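Performs a centroid analysis for a set of words: computes the centroid (average vector) of the words and, optionally, compares it to one or more target words.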

Usage

centroid_analysis(responses, targets = NULL, split = " ", unique.responses = FALSE,
                  reference.list = NULL, verbose = FALSE, rank.responses = FALSE,
                  tvectors = tvectors)

Value

An object of class centroid_analysis. This object is a list consisting of:

$centroid: The centroid of the response vectors
$cosines: The cosine similarity between the response centroid and each target vector
$ranks.target: The rank of the response centroid in the neighborhood of each target vector, with reference to reference.list
$ranks.centroid: The rank of each target in the neighborhood of the response centroid, with reference to reference.list
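The two rank measures can be illustrated with a small sketch. It follows one reading of the descriptions above and is not the LSAfun implementation: the toy matrix, the helper rank_measures() and the tie rule (rank = 1 + number of candidates with a strictly higher cosine) are assumptions. Each word is excluded from its own neighborhood, and the responses are excluded unless rank.responses = TRUE.

```r
# Hypothetical sketch of ranks.target and ranks.centroid; not LSAfun's code.
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Toy semantic space (hypothetical values): rows are word vectors
tvectors <- rbind(
  mouse  = c(1, 0, 0),
  rabbit = c(1, 1, 0),
  cat    = c(0, 1, 0),
  alice  = c(1, 1, 1),
  hare   = c(2, 1, 0),
  king   = c(0, 0, 1)
)
responses <- c("mouse", "rabbit", "cat")
centroid <- colMeans(tvectors[responses, , drop = FALSE])

rank_measures <- function(target, centroid, tvectors, responses,
                          reference.list = NULL, rank.responses = FALSE) {
  # Candidate neighbors: reference.list, or every word in the space
  candidates <- if (is.null(reference.list)) rownames(tvectors) else reference.list
  # By default the responses themselves are not counted as neighbors
  if (!rank.responses) candidates <- setdiff(candidates, responses)
  # Assumption: a word is never counted as its own neighbor
  candidates <- setdiff(candidates, target)

  t_vec <- tvectors[target, ]
  sim <- cosine(centroid, t_vec)
  # Cosines of all candidates to the target and to the centroid
  to_target <- vapply(candidates, function(w) cosine(tvectors[w, ], t_vec), numeric(1))
  to_centroid <- vapply(candidates, function(w) cosine(tvectors[w, ], centroid), numeric(1))
  c(rank.target = 1 + sum(to_target > sim),      # centroid among target's neighbors
    rank.centroid = 1 + sum(to_centroid > sim))  # target among centroid's neighbors
}

rank_measures("alice", centroid, tvectors, responses)
# rank.target = 1, rank.centroid = 2 (hare is closer to the centroid than alice)
```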

Arguments

responses: a character vector specifying multiple single words
targets: (optional:) a character vector specifying one or multiple single words
split: a character vector defining the character used to split the input strings into individual words (white space by default)
unique.responses: If TRUE, duplicated words in responses are discarded when computing the centroid. FALSE by default, so multiple instances of the same word will be included.
reference.list: (optional:) A list of words in reference to which the neighborhood ranks are computed: only entries in reference.list will be considered as possible neighbors. Only relevant when target words are provided in targets. If reference.list = NULL (default), then rownames(tvectors) (all words in the semantic space) will be considered when computing ranks.
verbose: If TRUE (default: FALSE), a message will appear that specifies for which target the neighborhood ranks are currently being computed.
rank.responses: If FALSE (default), the responses themselves will not be considered as possible neighbors when computing the neighborhood ranks; if TRUE, they are included.
tvectors: the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector)
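The effect of split and unique.responses on the set of responses can be sketched as follows. The helper prepare_responses() is hypothetical and only mirrors the argument descriptions above; in particular, treating split as a fixed string rather than a regular expression is an assumption.

```r
# Hypothetical helper, not part of LSAfun: mirrors the described behavior
# of the split and unique.responses arguments.
prepare_responses <- function(responses, split = " ", unique.responses = FALSE) {
  # Break each input string into individual words at the split character
  words <- unlist(strsplit(responses, split = split, fixed = TRUE))
  # Drop empty strings left by repeated separators
  words <- words[words != ""]
  if (unique.responses) {
    # Keep only the first occurrence of each word
    words <- unique(words)
  }
  words
}

prepare_responses(c("white rabbit", "rabbit", "cat"))
# returns c("white", "rabbit", "rabbit", "cat"): "rabbit" enters the centroid twice
prepare_responses(c("white rabbit", "rabbit", "cat"), unique.responses = TRUE)
# returns c("white", "rabbit", "cat")
```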

Author

Fritz Guenther, Aliona Petrenco

Details

The centroid analysis computes the average vector for a set of words. The intended use case is that these words are responses towards a given concept; the centroid then serves as the estimated vector representation for that concept.
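A minimal sketch of this computation, assuming the centroid is the element-wise mean of the response vectors and that it is compared to each target by cosine similarity (as listed under Value). The matrix values and the helper cosine() are hypothetical; with real data, tvectors would be a semantic space such as wonderland.

```r
# Cosine similarity between two numeric vectors (hypothetical helper)
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Toy semantic space (hypothetical values): rows are word vectors
tvectors <- rbind(
  mouse  = c(1, 0, 0),
  rabbit = c(1, 1, 0),
  cat    = c(0, 1, 0),
  alice  = c(1, 1, 1),
  hare   = c(1, 1, 0)
)

responses <- c("mouse", "rabbit", "cat")
# Centroid: the average of the response vectors, here c(2/3, 2/3, 0)
centroid <- colMeans(tvectors[responses, , drop = FALSE])

targets <- c("alice", "hare")
# Cosine between the centroid and each target vector
cosines <- sapply(targets, function(t) cosine(centroid, tvectors[t, ]))
# alice: 2 / sqrt(6) (about 0.816); hare: 1 (same direction as the centroid)
```

Because cosine similarity ignores vector length, using the sum of the response vectors instead of their mean would give the same cosines.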

References

Pugacheva, V., & Günther, F. (2024). Lexical choice and word formation in a taboo game paradigm. Journal of Memory and Language, 135, 104477.

Examples


data(wonderland)
centroid_analysis(responses = c("mouse", "rabbit", "cat", "king", "queen"),
                  targets = c("alice", "hare"), tvectors = wonderland)
