Provides the generic function support() and methods to count the support of given itemsets (itemMatrix) and associations in a given transactions data set.
support(x, transactions, ...)

# S4 method for itemMatrix
support(
  x,
  transactions,
  type = c("relative", "absolute"),
  method = c("ptree", "tidlists"),
  reduce = FALSE,
  weighted = FALSE,
  verbose = FALSE,
  ...
)

# S4 method for associations
support(
  x,
  transactions,
  type = c("relative", "absolute"),
  method = c("ptree", "tidlists"),
  reduce = FALSE,
  weighted = FALSE,
  verbose = FALSE,
  ...
)
A numeric vector of the same length as x containing the support values for the sets in x.
x: the set of itemsets for which support should be counted.
transactions: the transaction data set used for mining.
...: further arguments.
type: a character string specifying whether "relative" support or "absolute" support (counts) is returned for the itemsets in x (default: "relative").
method: use "ptree" or "tidlists". See the Details section.
reduce: should unused items be removed before counting?
weighted: should support be weighted by the transaction weights stored as column "weight" in transactionInfo?
verbose: report progress?
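The effect of `type` and `weighted` can be illustrated with a small base-R sketch. This is not the arules implementation. The helper `support_sketch()` is hypothetical, and transactions are plain character vectors rather than a transactions object. The sketch also assumes that weighted relative support is the total weight of the transactions containing the itemset divided by the total weight of all transactions. Treating absolute weighted support as the plain weight sum is a further assumption.

```r
# Hypothetical helper illustrating 'type' and 'weighted'; NOT arules code.
# Assumption: weighted support = weight of supporting transactions / total weight.
support_sketch <- function(itemset, transactions,
                           type = c("relative", "absolute"), weights = NULL) {
  type <- match.arg(type)
  # Without weights, every transaction counts once
  if (is.null(weights)) weights <- rep(1, length(transactions))
  # TRUE for each transaction that contains all items of the itemset
  contains <- vapply(transactions, function(tr) all(itemset %in% tr), logical(1))
  total <- sum(weights[contains])  # (weighted) number of supporting transactions
  if (type == "relative") total / sum(weights) else total
}

trans <- list(c("a", "b", "c"), c("a", "c"), c("b", "d"), c("a", "b", "c", "d"))
support_sketch(c("a", "c"), trans)                          # 0.75
support_sketch(c("a", "c"), trans, type = "absolute")       # 3
support_sketch(c("a", "c"), trans, weights = c(1, 1, 1, 5)) # 0.875
```

Relative support is simply the absolute count divided by the number of transactions (3 / 4 here). With weights, the fourth transaction contributes 5 of the total weight of 8.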
Authors: Michael Hahsler and Christian Buchta
Normally, the support of frequent itemsets is counted very efficiently during the mining process, using a set minimum support.
However, support() can be used in two other situations. The first is when only the support of specific itemsets is needed (possibly itemsets with very low support). The second is when the support of a set of itemsets must be recalculated on transactions other than those they were mined on.
Several methods for support counting are available:

- "ptree" (default method): The counters for the itemsets are organized in a prefix tree. The transactions are processed sequentially, and the corresponding counters in the prefix tree are incremented (see Hahsler et al., 2008). This method is the default because it is typically significantly faster than transaction ID list intersection.
- "tidlists": Support is counted using transaction ID list intersection, which several fast mining algorithms use (e.g., Eclat). However, support is determined for each itemset individually, which is slow for a large number of long itemsets in dense data.
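The two counting strategies can be contrasted with a toy base-R sketch. The functions `ptree_support()` and `tidlist_support()` are hypothetical illustrations, not the arules C code. They assume transactions and itemsets are plain character vectors.

The prefix-tree version stores every itemset as a path of sorted items and pushes each transaction through the tree once. The tid-list version builds, for each item, the list of IDs of transactions containing it. It then intersects those lists separately for every itemset.

```r
# Toy data: transactions and itemsets as character vectors of item labels
trans <- list(c("a", "b", "c"), c("a", "c"), c("b", "d"), c("a", "b", "c", "d"))
sets  <- list(c("a", "c"), "b", c("c", "b", "a"), c("d", "a"))

# Hypothetical prefix-tree counter (sketch of the "ptree" idea)
ptree_support <- function(itemsets, transactions) {
  new_node <- function() {
    node <- new.env()
    node$children <- list()  # child nodes keyed by item label
    node$ids <- integer(0)   # indices of the itemsets that end at this node
    node
  }
  root <- new_node()
  for (i in seq_along(itemsets)) {
    node <- root
    for (item in sort(unique(itemsets[[i]]))) {
      if (is.null(node$children[[item]])) node$children[[item]] <- new_node()
      node <- node$children[[item]]
    }
    node$ids <- c(node$ids, i)
  }
  counts <- integer(length(itemsets))
  # Follow every increasing sequence of the sorted transaction's items that
  # exists as a path in the tree; each contained itemset is reached exactly once
  visit <- function(node, items) {
    for (j in seq_along(items)) {
      child <- node$children[[items[j]]]
      if (!is.null(child)) {
        counts[child$ids] <<- counts[child$ids] + 1L
        visit(child, items[-seq_len(j)])
      }
    }
  }
  for (tr in transactions) visit(root, sort(unique(tr)))
  counts
}

# Hypothetical tid-list counter (sketch of the "tidlists" idea)
tidlist_support <- function(itemsets, transactions) {
  items <- sort(unique(unlist(transactions)))
  # Vertical layout: for each item, the IDs of the transactions containing it
  tidlists <- lapply(items, function(item)
    which(vapply(transactions, function(tr) item %in% tr, logical(1))))
  names(tidlists) <- items
  # Support of an itemset = size of the intersection of its items' tid-lists
  vapply(itemsets, function(s) length(Reduce(intersect, tidlists[s])), integer(1))
}

ptree_support(sets, trans)                    # 3 3 2 1
tidlist_support(sets, trans) / length(trans)  # 0.75 0.75 0.50 0.25
```

Both functions return absolute counts; dividing by the number of transactions gives relative support. The tid-list version performs one chain of intersections per itemset. That per-itemset work is why this approach slows down for many long itemsets.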
To speed up counting, reduce = TRUE can be specified. Unused items are then removed from the transactions before counting.
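Removing unused items shrinks the transactions without changing any support value. This holds because an itemset's support depends only on the items it contains. A minimal base-R sketch follows, assuming plain character-vector transactions. The helpers `reduce_transactions()` and `count_support()` are hypothetical and not part of arules.

```r
trans <- list(c("a", "b", "c", "e"), c("a", "c", "f"), c("b", "d"),
              c("a", "b", "c", "d", "e"))
sets  <- list(c("a", "c"), c("b", "c"))

# Hypothetical helper: keep only items that occur in at least one itemset
reduce_transactions <- function(itemsets, transactions) {
  used <- unique(unlist(itemsets))
  lapply(transactions, function(tr) tr[tr %in% used])
}

# Hypothetical helper: plain containment count per itemset
count_support <- function(itemsets, transactions) {
  vapply(itemsets, function(s)
    sum(vapply(transactions, function(tr) all(s %in% tr), logical(1))), integer(1))
}

reduced <- reduce_transactions(sets, trans)
lengths(reduced)              # 3 2 1 3
count_support(sets, trans)    # 3 2
count_support(sets, reduced)  # 3 2 (unchanged)
```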
Michael Hahsler, Christian Buchta, and Kurt Hornik. Selective association rule generation. Computational Statistics, 23(2):303-315, April 2008.
Other interest measures: confint(), coverage(), interestMeasure(), is.redundant(), is.significant()
library(arules)
data("Income")
## find some frequent itemsets
itemsets <- eclat(Income)[1:5]
## inspect the support returned by eclat
inspect(itemsets)
## count support in the database
support(items(itemsets), Income)