ggram: Plot n-gram frequencies

Description

ggram downloads data from the Google Ngram Viewer website and plots it in ggplot2 style.

Usage

ggram(
  phrases,
  ignore_case = FALSE,
  code_corpus = FALSE,
  geom = "line",
  geom_options = list(),
  lab = NA,
  google_theme = FALSE,
  ...
)

Arguments

phrases

vector of phrases. Alternatively, phrases can be an ngram object returned by ngram or ngrami.

ignore_case

logical, indicating whether the frequencies are case insensitive. Default is FALSE.

code_corpus

logical, indicating whether to use abbreviated corpus codes or longer form descriptions. Default is FALSE.

geom

the ggplot2 geom used to plot the data. Defaults to "line".

geom_options

list of additional parameters passed to the ggplot2 geom.

lab

y-axis label. Defaults to "Frequency".

google_theme

use a Google Ngram-style plot theme.

...

additional parameters passed to ngram.
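
As a rough illustration of how these arguments combine, the sketch below passes an existing ngram object instead of a vector of phrases, so no download is triggered. It assumes the hacker sample data set that the Examples section also uses is available once ngramr is loaded. The alpha value and the label text are arbitrary choices for illustration, not package defaults.

```r
library(ngramr)
library(ggplot2)

# `hacker` is assumed to be the sample ngram object shipped with ngramr
# (the same object used in the Examples section).
p <- ggram(hacker,
           geom = "line",                      # geom used to draw the series
           geom_options = list(alpha = 0.7),   # forwarded to the geom (illustrative value)
           lab = "Relative frequency")         # custom y-axis label (illustrative text)

# ggram returns a plot that further ggplot2 layers can be added to.
p + facet_wrap(~ Corpus)
```

Because the result is built on ggplot2, themes, facets and extra layers can be added with + in the usual way.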

Details

Google generated two datasets drawn from digitised books in the Google Books collection. One was generated in July 2009, the second in July 2012. Google will update these datasets as book scanning continues.

Examples

# NOT RUN {
library(ngramr)
library(ggplot2)
# }
# NOT RUN {
ggram(c("hacker", "programmer"), year_start = 1950)

# Changing the geom.
ggram(c("cancer", "fumer", "cigarette"),
      year_start = 1900,
      corpus = "fre_2012",
      smoothing = 0,
      geom = "step")

# Passing more options.
ggram(c("cancer", "smoking", "tobacco"),
      year_start = 1900,
      corpus = "eng_fiction_2012",
      geom = "point",
      smoothing = 0,
      geom_options = list(alpha = .5)) +
  stat_smooth(method = "loess", se = FALSE, formula = y ~ x)

# Setting the layers manually.
ggram(c("cancer", "smoking", "tobacco"),
      year_start = 1900,
      corpus = "eng_fiction_2012",
      smoothing = 0,
      geom = NULL) +
  stat_smooth(method = "loess", se = FALSE, span = 0.3, formula = y ~ x)

# Setting the legend placement on a long query and using the Google theme.
# Example taken from a post by Ben Zimmer at Language Log.
p <- c("((The United States is + The United States has) / The United States)",
       "((The United States are + The United States have) / The United States)")
ggram(p, year_start = 1800, google_theme = TRUE) +
  theme(legend.direction = "vertical")
# }
# NOT RUN {
# Pass ngram data rather than phrases
ggram(hacker) + facet_wrap(~ Corpus)
# }