word_proximity: Proximity Matrix Between Words

Description

word_proximity generates proximity measures to ascertain the mean distance between uses of words. The weight method for word_proximity objects weights a word_proximity object.

Usage

word_proximity(text.var, terms, grouping.var = NULL, parallel = TRUE,
  cores = parallel::detectCores()/2)

weight(x, type = "scale", ...)

## S3 method for class 'word_proximity':
weight(x, type = "scale", ...)

Arguments

text.var

The text variable.

terms

A vector of quoted terms.

grouping.var

The grouping variables. The default, NULL, computes proximity over all text combined. Also takes a single grouping variable or a list of one or more grouping variables.

parallel

logical. If TRUE, attempts to run the function on multiple cores. This may not improve speed if you have only one core or if the data set is small, because creating the cluster takes time.

cores

The number of cores to use if parallel = TRUE. Default is half the number of available cores.

x

An object to be weighted (a word_proximity object).

type

A weighting type; one of c("scale_log", "scale", "rev_scale", "rev_scale_log", "log", "sqrt", "scale_sqrt", "rev_sqrt", "rev_scale_sqrt"). Default is "scale".

...

Ignored.

Value

word_proximity returns a list of matrices of proximity measures in units of average sentences between words (defaults to scaled). weight returns a weighted list of matrices.

Details

Note that row names are the first word and column names are the second comparison word. The value for Word A compared to Word B need not equal the value for Word B compared to Word A: unlike a true distance measure, word_proximity's matrix is asymmetric. For each sentence in which Word A occurs, word_proximity finds the nearest sentence in which Word B occurs and records the distance between them in sentences; the matrix entry is the mean of these distances.
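To illustrate why the matrix is asymmetric, here is a minimal base R sketch of a nearest-sentence mean distance. The helpers mean_nearest_distance() and proximity_matrix() are hypothetical and are not part of qdap; they assume one sentence per element of the input vector, a simple lowercase word tokenizer, no grouping variable, and no weighting or scaling. qdap's own implementation may differ in all of these respects.

```r
# Hypothetical helpers (NOT qdap's implementation): a base R sketch of an
# asymmetric, nearest-sentence proximity measure. Assumes one sentence per element.

# Mean distance (in sentences) from each sentence containing `a`
# to the nearest sentence containing `b`.
mean_nearest_distance <- function(sentences, a, b) {
  # Lowercase and split each sentence into words (simple tokenizer, an assumption)
  tokens <- strsplit(tolower(sentences), "[^a-z']+")
  has_word <- function(w) vapply(tokens, function(tk) w %in% tk, logical(1))
  pos_a <- which(has_word(a))  # indices of sentences containing a
  pos_b <- which(has_word(b))  # indices of sentences containing b
  if (length(pos_a) == 0 || length(pos_b) == 0) return(NA_real_)
  # For each occurrence of a, the distance to the closest occurrence of b
  nearest <- vapply(pos_a, function(p) min(abs(pos_b - p)), numeric(1))
  mean(nearest)
}

# Rows are the first word (a), columns the second word (b)
proximity_matrix <- function(sentences, terms) {
  m <- matrix(NA_real_, length(terms), length(terms),
              dimnames = list(terms, terms))
  for (a in terms) {
    for (b in terms) {
      m[a, b] <- mean_nearest_distance(sentences, a, b)
    }
  }
  m
}

sentences <- c("The cat sat.", "A dog barked.", "Birds sang.",
               "Rain fell.", "Wind blew.", "The dog slept.")
m <- proximity_matrix(sentences, c("cat", "dog"))
print(m)
# cat -> dog: the only "cat" (sentence 1) is 1 sentence from "dog" (sentence 2): mean 1
# dog -> cat: "dog" in sentences 2 and 6 is 1 and 5 sentences from "cat": mean 3
```

The single occurrence of "cat" sits next to a "dog", but the second "dog" is far from any "cat", so the cat-to-dog entry (1) differs from the dog-to-cat entry (3).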

Examples


library(qdap)

## Build a list of content words (common words removed) from the debates
wrds <- word_list(pres_debates2012$dialogue,
    stopwords = c("it's", "that's", Top200Words))
wrds2 <- tolower(sort(wrds$rfswl[[1]][, 1]))

## Proximity for all text combined, plotted unweighted and weighted
(x <- with(pres_debates2012, word_proximity(dialogue, wrds2)))
plot(x)
plot(weight(x))
plot(weight(x, "rev_scale_log"))

## Proximity computed separately for each speaker
(x2 <- with(pres_debates2012, word_proximity(dialogue, wrds2, person)))

## The spaces around `terms` are important (spaste adds leading/trailing spaces)
(x3 <- with(DATA, word_proximity(state, spaste(qcv(the, i)))))
(x4 <- with(DATA, word_proximity(state, qcv(the, i))))