SpeedReader (version 0.9.1)

pmi: A function to calculate a number of information-theoretic measures on terms in a contingency table, including point-wise mutual information.

Description

Calculates several information-theoretic measures on the terms in a contingency table, including point-wise mutual information (PMI).

Usage

pmi(contingency_table, display_top_x_terms = 20, term_threshold = 5,
  every_category_counts = FALSE)

Arguments

contingency_table

A contingency table generated by the `contingency_table()` function.

display_top_x_terms

The number of top-ranked terms to display for each measure. Defaults to 20.

term_threshold

The minimum count a term needs in order to be kept. Terms below this threshold are eliminated from the contingency table before the information-theoretic quantities are calculated. Defaults to 5. This avoids the problem of terms that appear only once receiving very high PMI.
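To see why rare terms cause trouble, the following base R sketch computes PMI by hand on a toy table. It does not call `pmi()`; the counts, the object names, and the use of the natural logarithm are all assumptions made for illustration, and the package may define or scale the measure differently.

```r
# Toy contingency table: rows are categories, columns are terms.
# All counts are made up for illustration.
counts <- matrix(c(50, 40, 1,
                   50, 60, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("cat_a", "cat_b"),
                                 c("common1", "common2", "rare")))

joint  <- counts / sum(counts)   # p(category, term)
p_cat  <- rowSums(joint)         # p(category)
p_term <- colSums(joint)         # p(term)

# PMI = log( p(category, term) / (p(category) * p(term)) )
# Natural log is an assumption here.
pmi_table <- log(joint / outer(p_cat, p_term))

# The term that occurs only once gets the highest PMI in "cat_a",
# and -Inf in "cat_b", where it never occurs.
pmi_table
```

A single occurrence is enough to rank "rare" above both frequent terms in "cat_a", which is the behavior a count threshold is meant to suppress.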

every_category_counts

Defaults to FALSE. If TRUE, terms are removed unless they appear at least term_threshold times in every row (category) of the contingency table.
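The two filtering modes can be sketched in base R as follows. This is an illustration rather than the package's code: the counts are made up, and the reading of the default mode as a threshold on a term's total count across categories is an assumption, since the documentation only spells out the TRUE case.

```r
# Toy contingency table: rows are categories, columns are terms (made-up counts).
counts <- matrix(c(12, 0, 3, 7,
                    9, 6, 1, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("cat_a", "cat_b"),
                                 c("alpha", "beta", "gamma", "delta")))
term_threshold <- 5

# Assumed reading of every_category_counts = FALSE:
# keep a term if its total count across categories reaches the threshold.
keep_total <- colSums(counts) >= term_threshold

# every_category_counts = TRUE:
# keep a term only if it reaches the threshold in every row (category).
keep_every <- apply(counts >= term_threshold, 2, all)

colnames(counts)[keep_total]   # "alpha" "beta" "delta"
colnames(counts)[keep_every]   # "alpha"
```

The stricter mode keeps only terms that are well attested in all categories, at the cost of discarding terms that are distinctive precisely because they are absent from one category.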

Value

A list containing several information-theoretic measures calculated on the contingency table. If a sparse matrix was provided, a sparse PMI table is returned. Note that the "zero" entries in this sparse matrix are actually -Inf, but they cannot be represented as such by the slam sparse matrix library (which this package uses), so you will need to replace the zero entries with -Inf manually if you want to compare against a dense matrix.
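One way to do that replacement is sketched below in base R. It assumes the sparse PMI table has already been converted to an ordinary dense matrix and that the original counts are still available; the object names and the toy counts are hypothetical, and the sparse result is imitated here rather than produced by `pmi()`.

```r
# Toy counts (made up): one category/term pair never co-occurs.
counts <- matrix(c(4, 0,
                   1, 5),
                 nrow = 2, byrow = TRUE)

# Reference PMI computed densely, where unobserved pairs are -Inf.
joint    <- counts / sum(counts)
true_pmi <- log(joint / outer(rowSums(joint), colSums(joint)))

# Imitate a sparse PMI table converted to dense form:
# the unobserved pair shows up as 0 instead of -Inf.
sparse_like <- true_pmi
sparse_like[is.infinite(sparse_like)] <- 0

# Restore -Inf wherever the original count was zero.
restored <- sparse_like
restored[counts == 0] <- -Inf

identical(restored, true_pmi)   # TRUE
```

Masking on the counts rather than on `sparse_like == 0` avoids overwriting a pair whose PMI is genuinely zero, which happens when a term is exactly as frequent in a category as independence would predict.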