fightin_words_plot: A function that generates plots similar to those in Monroe et al. 'Fightin Words...'.

Description

Generates plots similar to those in Monroe et al. 'Fightin Words...' from the output of the feature_selection function.

Usage

fightin_words_plot(feature_selection_object, title = "",
  positive_category = "Category 1", negative_category = "Category 2",
  xlab = "term count", display_top_words = 20,
  display_terms_next_to_points = FALSE, size_terms_by_frequency = FALSE,
  right_margin = 20, max_terms_to_display = 1e+05,
  use_subsumed_ngrams = FALSE, limits = NULL,
  clean_publication_plots = FALSE, rank_by_log_odds = FALSE)
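A minimal sketch of a typical call, using only arguments from the signature above. It assumes the package providing fightin_words_plot() is already attached and that `fs` is the list returned by feature_selection(); neither step is shown. The wrapper name `plot_party_words`, the title, and the category labels are hypothetical and chosen purely for illustration.

```r
# Assumes the package providing fightin_words_plot() is attached and that
# `fs` is the list returned by feature_selection(); neither step is shown.
plot_party_words <- function(fs) {
  fightin_words_plot(
    fs,
    title = "Distinctive terms by party",  # illustrative title
    positive_category = "Democrat",        # label for the first category
    negative_category = "Republican",      # label for the second category
    display_top_words = 15                 # list 15 top terms per category
  )
}
```

All other arguments keep the defaults shown in the signature.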

Arguments

feature_selection_object

A list object generated by the feature_selection function.

title

A user-supplied title for the plot. Defaults to "", in which case a blank title is displayed.

positive_category

The name the user wishes to give to the first category specified when using the feature_selection function. Defaults to "Category 1".

negative_category

The name the user wishes to give to the second category specified when using the feature_selection function. Defaults to "Category 2".

xlab

The label for the x-axis. Defaults to "term count", but can be modified as necessary.

display_top_words

Defaults to 20 and controls the number of top terms for each category displayed in the plot.

display_terms_next_to_points

Optional argument, defaults to FALSE. If TRUE, then terms are displayed next to the points corresponding to them on the plot. This can clutter the plot when many terms are shown.

size_terms_by_frequency

Optional argument, defaults to FALSE. If TRUE, then when top terms are printed, they are sized in proportion to their frequency.

right_margin

Parameter controlling how much space should be reserved for the right margin in the plot (for displaying top terms). Defaults to 20 but can be adjusted depending on the length of terms.

max_terms_to_display

Defaults to 100,000. Used to prevent overloading the plotting device with very large vocabularies. Can be set by the user.

use_subsumed_ngrams

Logical indicating whether subsumed n-grams should be used when displaying top terms. This will only work if the user has selected subsume_ngrams = TRUE in the feature_selection() function (and is using a vocabulary containing overlapping n-grams).

limits

An optional numeric vector of length two where the first number is the upper x limit (term count) and the second number is the absolute value of the maximum z-score to display (the y limit). Defaults to NULL, in which case the optimal values are automatically determined. Can be useful for comparison between plots.
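As an illustration of using limits for comparison between plots, the sketch below passes the same length-two vector to two calls so both plots share their axes. The names `shared_limits`, `compare_plots`, `fs_a`, and `fs_b`, and the values 5000 and 15, are hypothetical. Both inputs are assumed to be lists returned by feature_selection().

```r
# Hypothetical shared limits: x axis up to a term count of 5000,
# y axis up to an absolute z-score of 15.
shared_limits <- c(5000, 15)

# Draw two plots with identical axis limits so they can be compared.
# `fs_a` and `fs_b` are assumed to come from feature_selection().
compare_plots <- function(fs_a, fs_b, limits = shared_limits) {
  # limits must be a numeric vector of length two, as documented above
  stopifnot(is.numeric(limits), length(limits) == 2)
  fightin_words_plot(fs_a, title = "Corpus A", limits = limits)
  fightin_words_plot(fs_b, title = "Corpus B", limits = limits)
  invisible(limits)
}
```

Leaving limits at NULL in either call would let that plot pick its own axis ranges, so the two plots would no longer be directly comparable.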

clean_publication_plots

Logical indicating whether to remove labels inside the plot and color all dots uniformly. Defaults to FALSE.

rank_by_log_odds

Only applicable for the "informed_Dirichlet" method. Defaults to FALSE. If TRUE, then terms are ranked by log odds instead of z-score.

Value

A Fightin' Words plot