R package arules - Mining Association Rules and Frequent Itemsets
The arules package for R provides the infrastructure for representing, manipulating and analyzing transaction data and patterns using frequent itemsets and association rules. The package also provides a wide range of interest measures and mining algorithms including the code of Christian Borgelt’s popular and efficient C implementations of the association mining algorithms Apriori and Eclat. In addition, the following mining algorithms are available via fim4r:
- Apriori
- Eclat
- Carpenter
- FPgrowth
- IsTa
- RElim
- SaM
Code examples can be found in Chapter 5 of the web book R Companion for Introduction to Data Mining.
arules core packages:
- arules: arules base package with data structures, mining algorithms (APRIORI and ECLAT), interest measures.
- arulesViz: Visualization of association rules.
- arulesCBA: Classification algorithms based on association rules (includes CBA).
- arulesSequences: Mining frequent sequences (cSPADE).
Other related packages:
Additional mining algorithms
- arulesNBMiner: Mining NB-frequent itemsets and NB-precise rules.
- fim4r: Provides fast implementations
for several mining algorithms. An interface function called
fim4r()
is provided inarules
. - opusminer: OPUS Miner
algorithm for finding the op k productive, non-redundant itemsets.
Call
opus()
withformat = 'itemsets'
. - RKEEL: Interface to KEEL’s association rule mining algorithm.
- RSarules: Mining algorithm which randomly samples association rules with one pre-chosen item as the consequent from a transaction dataset.
In-database analytics
- ibmdbR: IBM in-database analytics for R can calculate association rules from a database table.
- rfml: Mine frequent itemsets or association rules using a MarkLogic server.
Interface
- rattle: Provides a graphical user interface for association rule mining.
- pmml: Generates PMML (predictive model markup language) for association rules.
Classification
- arc: Alternative CBA implementation.
- inTrees: Interpret Tree Ensembles provides functions for: extracting, measuring and pruning rules; selecting a compact rule set; summarizing rules into a learner.
- rCBA: Alternative CBA implementation.
- qCBA: Quantitative Classification by Association Rules.
- sblr: Scalable Bayesian rule lists algorithm for classification.
Outlier Detection
- fpmoutliers: Frequent Pattern Mining Outliers.
Recommendation/Prediction
- recommenerlab: Supports creating predictions using association rules.
Installation
Stable CRAN version: Install from within R with
install.packages("arules")
Current development version: Install from r-universe.
install.packages("arules", repos = "https://mhahsler.r-universe.dev")
Usage
Load package and mine some association rules.
library("arules")
data("IncomeESL")
trans <- transactions(IncomeESL)
trans
## transactions in sparse format with
## 8993 transactions (rows) and
## 84 items (columns)
rules <- apriori(trans, supp = 0.1, conf = 0.9, target = "rules")
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.9 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 899
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[84 item(s), 8993 transaction(s)] done [0.01s].
## sorting and recoding items ... [42 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.03s].
## writing ... [457 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
Inspect the rules with the highest lift.
inspect(head(rules, n = 3, by = "lift"))
## lhs rhs support confidence coverage lift count
## [1] {dual incomes=no,
## householder status=own} => {marital status=married} 0.10 0.97 0.10 2.6 914
## [2] {years in bay area=>10,
## dual incomes=yes,
## type of home=house} => {marital status=married} 0.10 0.96 0.10 2.6 902
## [3] {dual incomes=yes,
## householder status=own,
## type of home=house,
## language in home=english} => {marital status=married} 0.11 0.96 0.11 2.6 988
Using arules with tidyverse
arules
works seamlessly with tidyverse.
For example:
dplyr
can be used for cleaning and preparing the transactions.transaction()
and other functions accepttibble
as input.- Functions in arules can be used with
%>%
. - arulesViz provides
visualizations based on
ggplot2
.
For example, we can remove the ethnic information column before creating transactions and then mine and inspect rules.
library("tidyverse")
library("arules")
data("IncomeESL")
trans <- IncomeESL %>%
select(-`ethnic classification`) %>%
transactions()
rules <- trans %>%
apriori(supp = 0.1, conf = 0.9, target = "rules", control = list(verbose = FALSE))
rules %>%
head(n = 3, by = "lift") %>%
inspect()
## lhs rhs support confidence coverage lift count
## [1] {dual incomes=no,
## householder status=own} => {marital status=married} 0.10 0.97 0.10 2.6 914
## [2] {years in bay area=>10,
## dual incomes=yes,
## type of home=house} => {marital status=married} 0.10 0.96 0.10 2.6 902
## [3] {dual incomes=yes,
## householder status=own,
## type of home=house,
## language in home=english} => {marital status=married} 0.11 0.96 0.11 2.6 988
Using arules from Python
See Getting started with arules using Python.
Support
Please report bugs here on GitHub. Questions should be posted on stackoverflow and tagged with arules.
References
- Michael Hahsler, Sudheer Chelluboina, Kurt Hornik, and Christian Buchta. The arules R-package ecosystem: Analyzing interesting patterns from large transaction datasets. Journal of Machine Learning Research, 12:1977-1981, 2011.
- Michael Hahsler, Bettina Grün and Kurt Hornik. arules - A Computational Environment for Mining Association Rules and Frequent Item Sets. Journal of Statistical Software, 14(15), 2005.
- Hahsler, Michael. A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules, 2015, URL: https://mhahsler.github.io/arules/docs/measures.
- Michael Hahsler. An R Companion for Introduction to Data Mining: Chapter 5, 2021, URL: https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/