support: Support Counting for Itemsets

Description

Provides the generic function and the needed S4 method to count support for given itemsets (and other types of associations) in a given transaction database.

Usage

support(x, transactions, ...)
# S4 method for itemMatrix
support(x, transactions, 
    type= c("relative", "absolute"), weighted = FALSE, control = NULL)
# S4 method for associations
support(x, transactions, 
    type= c("relative", "absolute"), weighted = FALSE, control = NULL)

Arguments

x

the set of itemsets for which support should be counted.

...

further arguments are passed on.

transactions

the transaction data set used for mining.

type

a character string specifying if "relative" support or "absolute" support (counts) are returned for the itemsets in x. (default: "relative")

weighted

should support be weighted by the transaction weights stored as column "weight" in transactionInfo?

control

a named list with the element method, indicating the counting method ("tidlists" or "ptree"), and the logical elements reduce and verbose, indicating whether unused items are removed and whether the output should be verbose.

Value

A numeric vector of the same length as x containing the support values for the sets in x.
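
The relation between the two values of type can be illustrated with a minimal base R sketch. It does not call this package; the toy matrix db and the names itemsets, absolute and relative are made up for illustration. Absolute support is the number of transactions that contain every item of a set, and relative support is that count divided by the number of transactions.

```r
# Toy database (hypothetical): rows are transactions, columns are items.
db <- matrix(
  c(TRUE,  TRUE,  TRUE,
    TRUE,  FALSE, TRUE,
    FALSE, TRUE,  TRUE,
    TRUE,  TRUE,  FALSE),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)

itemsets <- list(c("a"), c("a", "c"), c("a", "b", "c"))

# Absolute support: number of transactions containing all items of the set.
absolute <- vapply(
  itemsets,
  function(s) sum(rowSums(db[, s, drop = FALSE]) == length(s)),
  numeric(1)
)

# Relative support: the same counts as a fraction of all transactions.
relative <- absolute / nrow(db)

print(absolute)
print(relative)
```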

Details

Normally, itemset support is counted while mining the database with a set minimum support. However, if support is needed for only one or a few itemsets, one might not want to mine the database for all frequent itemsets.

If method = "ptree" is specified in control, the counters for the itemsets are organized in a prefix tree. The transactions are processed sequentially and the corresponding counters in the prefix tree are incremented (see Hahsler et al., 2008). This method is used by default since it is typically significantly faster than tid list intersection.
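
The prefix tree idea can be sketched in base R as follows. This is only an illustration of the technique under simplifying assumptions, not the package's implementation; the helpers new_node, insert_itemset and count_transaction are hypothetical names. Itemsets are inserted with their items in sorted order, and each sorted transaction is then walked through the tree once, incrementing the counter of every node whose path it contains.

```r
# Each node is an environment, so counters can be updated in place.
new_node <- function() {
  node <- new.env()
  node$children <- list()  # child nodes, named by item
  node$count <- 0L         # transactions containing the path to this node
  node
}

# Insert one itemset (items in sorted order) and return its final node.
insert_itemset <- function(root, itemset) {
  node <- root
  for (item in sort(unique(itemset))) {
    if (is.null(node$children[[item]])) {
      node$children[[item]] <- new_node()
    }
    node <- node$children[[item]]
  }
  node
}

# Walk the tree with one sorted transaction and increment every node
# whose path is a subset of the transaction.
count_transaction <- function(node, items, start = 1L) {
  if (start > length(items)) return(invisible(NULL))
  for (i in start:length(items)) {
    child <- node$children[[items[i]]]
    if (!is.null(child)) {
      child$count <- child$count + 1L
      count_transaction(child, items, i + 1L)
    }
  }
  invisible(NULL)
}

# Toy data (hypothetical).
transactions <- list(c("a", "b", "c"), c("a", "c"), c("b", "c"), c("a", "b"))
itemsets <- list(c("a"), c("a", "c"), c("b", "c"), c("a", "b", "c"))

root <- new_node()
targets <- lapply(itemsets, function(s) insert_itemset(root, s))
for (trans in transactions) count_transaction(root, sort(unique(trans)))

counts <- vapply(targets, function(n) n$count, integer(1))
print(counts)
```

Each transaction is read only once, regardless of how many itemsets share a prefix, which is the intuition behind the speed advantage described above.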

If method = "tidlists" is specified in control, support is counted using transaction ID list intersection, which is used by several fast mining algorithms (e.g., by Eclat). However, support is determined for each itemset individually, which is slow for a large number of long itemsets in dense data.
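
Transaction ID list intersection can likewise be sketched in base R. Again, this is an illustrative sketch rather than the package's code; tidlists and count_by_tidlists are hypothetical names. Each item maps to the IDs of the transactions that contain it, and the support count of an itemset is the size of the intersection of its items' ID lists.

```r
# Toy data (hypothetical).
transactions <- list(c("a", "b", "c"), c("a", "c"), c("b", "c"), c("a", "b"))

# Transaction ID list per item: the IDs of all transactions containing it.
all_items <- sort(unique(unlist(transactions)))
tidlists <- lapply(all_items, function(item) {
  which(vapply(transactions, function(trans) item %in% trans, logical(1)))
})
names(tidlists) <- all_items

# Support count of one itemset: size of the intersection of its tid lists.
count_by_tidlists <- function(itemset) {
  length(Reduce(intersect, tidlists[itemset]))
}

itemsets <- list(c("a"), c("a", "c"), c("b", "c"), c("a", "b", "c"))
counts <- vapply(itemsets, count_by_tidlists, integer(1))
print(counts)
```

Every itemset triggers its own chain of intersections here, which mirrors why this approach becomes slow for many long itemsets.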

If reduce = TRUE is specified in control, unused items are removed from the data before support is counted. This might be slower for large transaction data sets.

References

Michael Hahsler, Christian Buchta, and Kurt Hornik. Selective association rule generation. Computational Statistics, 23(2):303-315, April 2008.

Examples

# NOT RUN {
library("arules")
data("Income")

## find some frequent itemsets
itemsets <- eclat(Income)[1:5]

## inspect the support returned by eclat
inspect(itemsets)

## count support in the database
support(items(itemsets), Income)
# }