support: Support Counting for Itemsets

Description

Provides the generic function support() and methods to count the support of the itemsets in an itemMatrix or in associations in a given set of transactions.

Usage

support(x, transactions, ...)
# S4 method for itemMatrix
support(
  x,
  transactions,
  type = c("relative", "absolute"),
  method = c("ptree", "tidlists"),
  reduce = FALSE,
  weighted = FALSE,
  verbose = FALSE,
  ...
)
# S4 method for associations
support(
  x,
  transactions,
  type = c("relative", "absolute"),
  method = c("ptree", "tidlists"),
  reduce = FALSE,
  weighted = FALSE,
  verbose = FALSE,
  ...
)

Value

A numeric vector of the same length as x containing the support values for the sets in x.

Arguments

x: the set of itemsets for which support should be counted.
transactions: the transaction data set in which support is counted (typically, but not necessarily, the one used for mining).
...: further arguments.
type: a character string specifying whether "relative" support or "absolute" support (counts) is returned for the itemsets in x. (default: "relative")
method: use "ptree" or "tidlists". See the Details section.
reduce: should unused items be removed before counting?
weighted: should support be weighted by transaction weights stored as column "weight" in transactionInfo?
verbose: report progress?
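
As a hedged illustration of the type argument, the sketch below counts support for the same itemsets once as proportions and once as counts. It assumes the arules package is installed and that the Income data set shipped with it is available; the variable names rel and abs_count are illustrative only.

```r
library(arules)
data("Income")

## mine a few itemsets to have something to count (progress output suppressed)
itemsets <- eclat(Income, control = list(verbose = FALSE))[1:5]

## relative support: proportion of transactions containing each itemset
rel <- support(items(itemsets), Income, type = "relative")

## absolute support: number of transactions containing each itemset
abs_count <- support(items(itemsets), Income, type = "absolute")

## the two should differ only by the number of transactions
all.equal(abs_count, rel * length(Income), check.attributes = FALSE)
```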

Author

Michael Hahsler and Christian Buchta

Details

Normally, the support of frequent itemsets is counted very efficiently during the mining process using a set minimum support threshold. However, support() can be used if support is needed only for specific itemsets (possibly itemsets with very low support), or if the support of a set of itemsets needs to be recalculated on transactions other than the ones they were mined on.
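
The recalculation case can be sketched as follows. This is only an illustration under assumptions: the Income data are split into two arbitrary halves (here called mined_on and other, both hypothetical names), itemsets are mined on the first half, and their support is then recounted on the second half.

```r
library(arules)
data("Income")

## split the transactions into two halves (an arbitrary split for illustration)
n <- length(Income)
half <- floor(n / 2)
mined_on <- Income[seq_len(half)]
other <- Income[(half + 1):n]

## mine itemsets on the first half only
itemsets <- eclat(mined_on, control = list(verbose = FALSE))[1:5]

## support as reported by the mining algorithm (first half)
s_mined <- quality(itemsets)$support

## support of the same itemsets recounted on the second half
s_other <- support(itemsets, other)

data.frame(mined = s_mined, recounted = s_other)
```

The recounted values will generally differ from the mined ones, and they may fall below the minimum support used for mining.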

Several methods for support counting are available:

"ptree" (default method): The counters for the itemsets are organized in a prefix tree. The transactions are sequentially processed and the corresponding counters in the prefix tree are incremented (see Hahsler et al., 2008). This method is used by default since it is typically significantly faster than transaction ID list intersection.
"tidlists": support is counted using transaction ID list intersection, which is used by several fast mining algorithms (e.g., Eclat). However, support is determined for each itemset individually, which is slow for a large number of long itemsets in dense data.
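
To make the idea of transaction ID list intersection concrete, here is a minimal base R sketch on a toy data set. It is not the implementation used by the package; the names toy, tidlists and count_support are hypothetical and only illustrate the principle of intersecting per-item ID lists for each itemset individually.

```r
## toy transactions (illustrative data, not from the package)
toy <- list(
  c("a", "b", "c"),
  c("a", "c"),
  c("b", "d"),
  c("a", "b", "c", "d")
)

## one transaction ID list per item: the IDs of the transactions containing it
all_items <- sort(unique(unlist(toy)))
tidlists <- lapply(
  setNames(all_items, all_items),
  function(i) which(vapply(toy, function(t) i %in% t, logical(1)))
)

## absolute support of an itemset = size of the intersection of its ID lists
count_support <- function(itemset) {
  length(Reduce(intersect, tidlists[itemset]))
}

count_support(c("a", "c"))                 # absolute support
count_support(c("b", "d")) / length(toy)   # relative support
```

Because every itemset triggers its own chain of intersections, the cost grows with both the number and the length of the itemsets, which matches the caveat stated for this method.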

To speed up counting, reduce = TRUE can be specified. Unused items are then removed from the transactions before counting.
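
The following sketch checks that the counting method and the reduce option affect only how support is computed, not the result. It assumes a version of arules in which method and reduce are direct arguments of support(), as shown in the Usage section; the variable names are illustrative.

```r
library(arules)
data("Income")

itemsets <- eclat(Income, control = list(verbose = FALSE))[1:5]
x <- items(itemsets)

## same itemsets, three ways of counting
s_ptree   <- support(x, Income, method = "ptree")
s_tid     <- support(x, Income, method = "tidlists")
s_reduced <- support(x, Income, method = "ptree", reduce = TRUE)

## all three should agree up to numerical tolerance
all.equal(s_ptree, s_tid, check.attributes = FALSE)
all.equal(s_ptree, s_reduced, check.attributes = FALSE)
```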

References

Michael Hahsler, Christian Buchta, and Kurt Hornik. Selective association rule generation. Computational Statistics, 23(2):303-315, April 2008.

Examples

data("Income")

## find some frequent itemsets
itemsets <- eclat(Income)[1:5]

## inspect the support returned by eclat
inspect(itemsets)

## count support in the database
support(items(itemsets), Income)