plotFreq: Plotting Counts of specified Wordgroups over Time (relative to Corpus)

Description

Creates a plot of the counts or proportions of given wordgroups (wordlist) in the subcorpus. The counts or proportions can be calculated on document or word level, with an 'and' or 'or' link, and can additionally be normalised by a subcorpus, which can be specified by id.
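
The difference between the 'and' and the 'or' link on document level can be sketched in base R. This is only an illustration under assumptions, not the package's implementation: the helper doc_matches and the toy corpus docs are hypothetical, and matching uses grepl, so a pattern such as "bush" would also match a token like "bushes".

```r
# Toy tokenized corpus: one character vector of tokens per document (hypothetical data).
docs <- list(
  a = c("obama", "meets", "bush"),
  b = c("obama", "speaks"),
  c = c("weather", "report")
)

# Hypothetical helper, not part of tosca: does a document match a word group?
doc_matches <- function(tokens, words, link = c("and", "or")) {
  link <- match.arg(link)
  # For each word: does at least one token in the document match it?
  hits <- vapply(words, function(w) any(grepl(w, tokens)), logical(1))
  if (link == "and") all(hits) else any(hits)
}

group <- c("obama", "bush")
# Number of documents matching the group under each link.
n_and <- sum(vapply(docs, doc_matches, logical(1), words = group, link = "and"))
n_or  <- sum(vapply(docs, doc_matches, logical(1), words = group, link = "or"))
```

With the 'and' link only document a counts, because it contains both words; with the 'or' link documents a and b count.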

Usage

plotFreq(
  object,
  id = names(object$text),
  type = c("docs", "words"),
  wordlist,
  link = c("and", "or"),
  wnames,
  ignore.case = FALSE,
  rel = FALSE,
  mark = TRUE,
  unit = "month",
  curves = c("exact", "smooth", "both"),
  smooth = 0.05,
  both.lwd,
  both.lty,
  main,
  xlab,
  ylab,
  ylim,
  col,
  legend = "topright",
  natozero = TRUE,
  file,
  ...
)

Value

A plot. Invisibly returns a data frame with columns date and wnames, plus columns wnames_rel if rel = TRUE, containing the counts (and proportions) of the given wordgroups.
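
The shape of such a data frame, and the effect of coercing NAs to zero, can be sketched with made-up numbers. The sketch assumes that a proportion is the wordgroup count divided by the subcorpus total for the same period and that the relative column is named after the label with a _rel suffix; the counts and the label obama are hypothetical.

```r
dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))
obama <- c(4, 0, 3)     # hypothetical counts for one wordgroup
total <- c(10, 0, 12)   # hypothetical totals of the normalising subcorpus

# Proportion per period; 0 / 0 yields NaN for the empty period.
obama_rel <- obama / total

# Mimic natozero = TRUE: is.na() is also TRUE for NaN, so the gap becomes 0.
obama_rel[is.na(obama_rel)] <- 0

res <- data.frame(date = dates, obama = obama, obama_rel = obama_rel)
```

Without the coercion step, the second period would stay NaN and leave a gap in a plotted curve.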

Arguments

object: textmeta object with strictly tokenized text component (character vectors) - like a result of cleanTexts
id: character vector (default: names(object$text)) of IDs that specify the subcorpus
type: character (default: "docs") should counts/proportions of documents ("docs") or of words ("words") be plotted
wordlist: list of character vectors. Every list element is an 'or' link, every character string in a vector is linked by the argument link. If wordlist is only a character vector, it is coerced to a list of the same length as the vector (see as.list), so that the argument link has no effect. Each character vector as a list element represents one curve in the resulting plot
link: character (default: "and") should the (inner) character vectors of each list element be linked by an "and" or an "or"
wnames: character vector of the same length as wordlist - labels for every group of linked words
ignore.case: logical (default: FALSE) option from grepl.
rel: logical (default: FALSE) should counts (FALSE) or proportion (TRUE) be plotted
mark: logical (default: TRUE) should years be marked by vertical lines
unit: character (default: "month") to which unit should dates be floored. Other possible units are "bimonth", "quarter", "season", "halfyear", "year", for more units see round_date
curves: character (default: "exact") should the "exact" curve, the "smooth" curve or "both" be plotted
smooth: numeric (default: 0.05) smoothing parameter which is handed over to lowess as f
both.lwd: graphical parameter for smoothed values if curves = "both"
both.lty: graphical parameter for smoothed values if curves = "both"
main: character graphical parameter
xlab: character graphical parameter
ylab: character graphical parameter
ylim: (default if rel = TRUE: c(0, 1)) graphical parameter
col: graphical parameter, can be a vector. If curves = "both", the function plots for every wordgroup first the exact and then the smoothed curve - this matters for the order of your col values.
legend: character (default: "topright") value(s) to specify the legend coordinates. If "none" no legend is plotted.
natozero: logical (default: TRUE) should NAs be coerced to zeros. Only has effect if rel = TRUE.
file: character file path if a pdf should be created
...: additional graphical parameters
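
How unit and smooth act on a series can be sketched in base R. This is a sketch under assumptions rather than the function's own code: dates are floored to the month with format and as.Date instead of the lubridate helpers the package refers to, the dates are made up, and f is set to 0.5 instead of the default 0.05 because the toy series is very short.

```r
# Hypothetical document dates.
dates <- as.Date(c("2020-01-03", "2020-01-17", "2020-02-09",
                   "2020-04-21", "2020-04-30", "2020-06-11"))

# Floor every date to the first day of its month (unit = "month").
month <- as.Date(format(dates, "%Y-%m-01"))

# Count per month, keeping months without any document as zero.
all_months <- seq(min(month), max(month), by = "month")
counts <- as.vector(table(factor(as.character(month),
                                 levels = as.character(all_months))))

# Smooth the monthly counts; f is the smoother span used by lowess.
sm <- lowess(as.numeric(all_months), counts, f = 0.5)
```

Here counts holds the "exact" curve and sm$y the "smooth" one; a larger f averages over more neighbouring periods.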

Examples

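if (FALSE) {
library(tosca)  # provides plotFreq, cleanTexts and the politics data
data(politics)
poliClean <- cleanTexts(politics)
# a plain character vector is coerced to a list: one curve per word
plotFreq(poliClean, wordlist=c("obama", "bush"))
}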
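
The example above passes a plain character vector, so each word gets its own curve and link has no effect. A minimal sketch of the difference between that and a grouped wordlist follows. The plotFreq call is not run and assumes that tosca is installed and that poliClean exists as above; the label "presidents" is a hypothetical choice.

```r
words <- c("obama", "bush")

# A plain character vector is coerced with as.list: two groups, two curves.
as_vector <- as.list(words)

# Wrapping the vector in a list keeps it as one group, so link applies.
as_group <- list(words)

if (FALSE) {
# Sketch only: one curve for documents that contain "obama" or "bush".
plotFreq(poliClean, wordlist = as_group, link = "or", wnames = "presidents")
}
```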
