qdap (version 1.3.5)

word_associate: Find Associated Words


Find words associated with one or more given words or phrases. Results can be output as a network graph and/or wordcloud.


word_associate(text.var, grouping.var = NULL, match.string,
  text.unit = "sentence", extra.terms = NULL, target.exclude = NULL,
  stopwords = NULL, network.plot = FALSE, wordcloud = FALSE,
  cloud.colors = c("black", "gray55"), title.color = "blue",
  nw.label.cex = 0.8, title.padj = -4.5, nw.label.colors = NULL,
  nw.layout = NULL, nw.edge.color = "gray90",
  nw.label.proportional = TRUE, nw.title.padj = NULL,
  nw.title.location = NULL, title.font = NULL, title.cex = NULL,
  nw.edge.curved = TRUE, cloud.legend = NULL, cloud.legend.cex = 0.8,
  cloud.legend.location = c(-0.03, 1.03), nw.legend = NULL,
  nw.legend.cex = 0.8, nw.legend.location = c(-1.54, 1.41),
  legend.override = FALSE, char2space = "~~", ...)


text.var: The text variable.
grouping.var: The grouping variables. Default NULL generates one word list for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.
match.string: A vector of terms, or a list of such vectors, to associate in the text.
text.unit: The text unit (either "sentence" or "tot"). This argument determines what unit to find the match string words within. For example, if "sentence" is chosen the function pulls all text for sentences the match
extra.terms: Other terms to color beyond the match string.
target.exclude: A vector of words to exclude from the match.string.
stopwords: Words to exclude from the analysis.
network.plot: logical. If TRUE plots a network plot of the words.
wordcloud: logical. If TRUE plots a wordcloud plot of the words.
cloud.colors: A vector of colors whose length equals the length of match.string + 1.
title.color: A character vector of length one corresponding to the color of the title.
nw.label.cex: The magnification to be used for network plot labels relative to the current setting of cex. Default is 0.8.
title.padj: Adjustment for the title. For strings parallel to the axes, padj = 0 means right or top alignment, and padj = 1 means left or bottom alignment.
nw.label.colors: A vector of colors whose length equals the length of match.string + 1.
nw.layout: Layout types supported by igraph. See layout.
nw.edge.color: A character vector of length one corresponding to the color of the plot edges.
nw.label.proportional: logical. If TRUE scales the network plots across grouping.var to allow plot-to-plot comparisons.
nw.title.padj: Adjustment for the network plot title. For strings parallel to the axes, padj = 0 means right or top alignment, and padj = 1 means left or bottom alignment.
nw.title.location: The side of the network plot on which the title is placed (1 = bottom, 2 = left, 3 = top, 4 = right).
title.font: The font family of the cloud title.
title.cex: Character expansion factor for the title. NULL and NA are equivalent to 1.0.
nw.edge.curved: logical. If TRUE edges will be curved rather than straight paths.
cloud.legend: A character vector of names corresponding to the number of vectors in match.string. Both nw.legend and cloud.legend can be set separately; or one may be set and by default the other will assume those leg
cloud.legend.cex: Character expansion factor for the wordcloud legend. NULL and NA are equivalent to 1.0.
cloud.legend.location: The x and y co-ordinates to be used to position the wordcloud legend. The location may also be specified by setting x to a single keyword from the list "bottomright", "bottom", "bottomleft", "left"
nw.legend: A character vector of names corresponding to the number of vectors in match.string. Both nw.legend and cloud.legend can be set separately; or one may be set and by default the other will assume those leg
nw.legend.cex: Character expansion factor for the network plot legend. NULL and NA are equivalent to 1.0.
nw.legend.location: The x and y co-ordinates to be used to position the network plot legend. The location may also be specified by setting x to a single keyword from the list "bottomright", "bottom", "bottomleft", "le
legend.override: By default, if legend labels are supplied to only one of cloud.legend or nw.legend and the other remains NULL, the other will assume the supplied vector. If this behavior
char2space: Currently a road to nowhere. Eventually this will allow the retention of characters as is allowed in trans_cloud already.
...: Other arguments supplied to trans_cloud.
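
The examples on this page pad some match terms with spaces (" I ") and leave others bare ("you", "tru"). The sketch below uses base R only to illustrate why that matters if terms are matched as plain substrings. It is an illustration under that assumption, not qdap's implementation; `units`, `padded`, and `has_term` are hypothetical names introduced here.

```r
# Toy text units (hypothetical data, not taken from qdap)
units <- c("I think it is fun", "Do you like it", "Your idea is inside")

# Lowercase and pad each unit with spaces so a space-padded term can
# match at the very start or end of a unit (an assumption of this sketch)
padded <- paste0(" ", tolower(units), " ")

# Hypothetical helper: does each unit contain the term as a fixed substring?
has_term <- function(term, x) grepl(tolower(term), x, fixed = TRUE)

has_term(" I ", padded)    # only the stand-alone word "i" matches
has_term("you", padded)    # a bare term also matches inside "your"
has_term(" you ", padded)  # padding restricts the match to the whole word
```

Padding one side only (for example " wh") would, under the same assumption, match any word that begins with those letters.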


Returns a list:
  • word frequency matrices: Word frequency matrices for each grouping variable.
  • dialogue: A list of data frames for each word list (each vector supplied to match.string) and a final data frame of all combined text units that contain any match string.
  • match.terms: A list of vectors of word lists (each vector supplied to match.string).
Optionally, returns a word cloud and/or a network plot of the text unit containing the match.string terms.

See Also

trans_cloud, word_network_plot, wordcloud, graph.adjacency


Examples
library(qdap)

# A single vector of match terms; extra.terms are colored in addition to them
ms <- c(" I ", "you")
et <- c(" it", " tell", "tru")
out1 <- word_associate(DATA2$state, DATA2$person, match.string = ms,
    wordcloud = TRUE,  proportional = TRUE,
    network.plot = TRUE,  nw.label.proportional = TRUE, extra.terms = et,
    cloud.legend = c("A", "B", "C"),
    title.color = "blue", cloud.colors = c("red", "purple", "gray70"))

# Note: You don't have to name the vectors in the lists but I do for clarity
ms <- list(
    list1 = c(" I ", " you", "not"),
    list2 = c(" wh")
)

et <- list(
    B = c(" the", "do", "tru"),
    C = c(" it", " already", "we")
)

out2 <- word_associate(DATA2$state, DATA2$person, match.string = ms,
    wordcloud = TRUE,  proportional = TRUE,
    network.plot = TRUE,  nw.label.proportional = TRUE, extra.terms = et,
    cloud.legend = c("A", "B", "C", "D"),
    title.color = "blue", cloud.colors = c("red", "blue", "purple", "gray70"))

out3 <- word_associate(DATA2$state, list(DATA2$day, DATA2$person), match.string = ms)

m <- list(
    A1 = c("you", "in"), # list 1
    A2 = c(" wh")        # list 2
)

n <- list(
    B = c(" the", " on"),
    C = c(" it", " no")
)

out4 <- word_associate(DATA2$state, list(DATA2$day, DATA2$person),
    match.string = m)
out5 <- word_associate(raj.act.1$dialogue, list(raj.act.1$person),
    match.string = m)
out6 <- with(mraja1spl, word_associate(dialogue, list(fam.aff, sex),
     match.string = m))
lapply(out6$dialogue, htruncdf, n = 20, w = 20)

# Fill the spaces inside multi-word phrases so each phrase can be matched as a unit
DATA2$state2 <- space_fill(DATA2$state, c("is fun", "too fun"))

ms <- list(
    list1 = c(" I ", " you", "is fun", "too fun"),
    list2 = c(" wh")
)

et <- list(
    B = c(" the", " on"),
    C = c(" it", " no")
)

out7 <- word_associate(DATA2$state2, DATA2$person, match.string = ms,
    wordcloud = TRUE,  proportional = TRUE,
    network.plot = TRUE,  nw.label.proportional = TRUE, extra.terms = et,
    cloud.legend = c("A", "B", "C", "D"),
    title.color = "blue", cloud.colors = c("red", "blue", "purple", "gray70"))

# Restore the original DATA2 data set (drops the state2 column added above)
DATA2 <- qdap::DATA2
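
The last example runs space_fill on the phrases "is fun" and "too fun" before passing them as match terms. As a rough base R sketch of that idea, the code below replaces the spaces inside each phrase with a filler string so the phrase survives word splitting as one token. It assumes the filler is "~~" (the default shown for char2space in the usage above); `fill_phrases` and `txt` are hypothetical names for illustration and this is not the qdap source.

```r
# Hypothetical helper: replace the spaces inside each phrase with a filler
fill_phrases <- function(x, phrases, filler = "~~") {
  for (p in phrases) {
    filled_phrase <- gsub(" ", filler, p, fixed = TRUE)
    x <- gsub(p, filled_phrase, x, fixed = TRUE)
  }
  x
}

# Toy input (hypothetical data)
txt <- c("Computer is fun. Not too fun.", "I am telling the truth!")
filled <- fill_phrases(txt, c("is fun", "too fun"))
filled

# Splitting on spaces now keeps each filled phrase together as one token
strsplit(filled[1], " ", fixed = TRUE)[[1]]

# Turn the filler back into spaces for display
restored <- gsub("~~", " ", filled, fixed = TRUE)
```

Text units that contain none of the phrases pass through unchanged, and replacing the filler with a space recovers the original text.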

