ggram: Plot n-gram frequencies

Description

ggram downloads data from the Google Ngram Viewer website and plots it in ggplot2 style.

Usage

ggram(
  phrases,
  ignore_case = FALSE,
  code_corpus = FALSE,
  geom = "line",
  geom_options = list(),
  lab = NA,
  google_theme = FALSE,
  ...
)

Arguments

phrases: vector of phrases. Alternatively, phrases can be an ngram object returned by ngram or ngrami.
ignore_case: logical, indicating whether the frequencies are case insensitive. Default is FALSE.
code_corpus: logical, indicating whether to use abbreviated corpus `codes or longer form descriptions. Default is FALSE.
geom: the ggplot2 geom used to plot the data; defaults to "line"
geom_options: list of additional parameters passed to the ggplot2 geom.
lab: y-axis label. Defaults to "Frequency".
google_theme: use a Google Ngram-style plot theme.
...: additional parameters passed to ngram

Details

Google generated two datasets drawn from digitised books in the Google books collection. One was generated in July 2009, the second in July 2012. Google will update these datasets as book scanning continues.

Examples

Run this code

library(ggplot2)
ggram(c("hacker", "programmer"), year_start = 1950)

# Changing the geom.
ggram(c("cancer", "fumer", "cigarette"),
      year_start = 1900,
      corpus = "fr-2012",
      smoothing = 0,
      geom = "step")

# Passing more options.
ggram(c("cancer", "smoking", "tobacco"),
      year_start = 1900,
      corpus = "en-fiction-2012",
      geom = "point",
      smoothing = 0,
      geom_options = list(alpha = .5)) +
  stat_smooth(method="loess", se = FALSE, formula = y  ~ x)

# Setting the layers manually.
ggram(c("cancer", "smoking", "tobacco"),
      year_start = 1900,
      corpus = "en-fiction-2012",
      smoothing = 0,
      geom = NULL) +
  stat_smooth(method="loess", se=FALSE, span = 0.3, formula = y ~ x)

# Setting the legend placement on a long query and using the Google theme.
# Example taken from a post by Ben Zimmer at Language Log.
p <- c("((The United States is + The United States has) / The United States)",
      "((The United States are + The United States have) / The United States)")
ggram(p, year_start = 1800, google_theme = TRUE) +
      theme(legend.direction="vertical")

# Pass ngram data rather than phrases
ggram(hacker) + facet_wrap(~ Corpus)

Run the code above in your browser using DataLab