arules --- Mining Association Rules and Frequent Itemsets with R
The arules package for R provides the infrastructure for representing, manipulating and analyzing transaction data and patterns using frequent itemsets and association rules. It also provides a wide range of interest measures and mining algorithms, including interfaces to, and the code of, Borgelt's efficient C implementations of the association mining algorithms Apriori and Eclat.
arules core packages:
- arules: arules base package with data structures, mining algorithms (APRIORI and ECLAT), interest measures.
- arulesViz: Visualization of association rules.
- arulesCBA: Classification algorithms based on association rules (includes CBA).
- arulesSequences: Mining frequent sequences (cSPADE).
Other related packages:
Additional mining algorithms
- arulesNBMiner: Mining NB-frequent itemsets and NB-precise rules.
- opusminer: OPUS Miner algorithm for filtered top-k association discovery.
- RKEEL: Interface to KEEL's association rule mining algorithm.
- RSarules: Mining algorithm which randomly samples association rules with one pre-chosen item as the consequent from a transaction dataset.
In-database analytics
- ibmdbR: IBM in-database analytics for R can calculate association rules from a database table.
- rfml: Mine frequent itemsets or association rules using a MarkLogic server.
Interface
- rattle: Provides a graphical user interface for association rule mining.
- pmml: Generates PMML (predictive model markup language) for association rules.
Classification
- arc: Alternative CBA implementation.
- inTrees: Interpret Tree Ensembles provides functions for: extracting, measuring and pruning rules; selecting a compact rule set; summarizing rules into a learner.
- rCBA: Alternative CBA implementation.
- qCBA: Quantitative Classification by Association Rules.
- sbrl: Scalable Bayesian rule lists algorithm for classification.
Outlier Detection
- fpmoutliers: Frequent Pattern Mining Outliers.
Recommendation/Prediction
- recommenderlab: Supports creating predictions using association rules.
Installation
Stable CRAN version: install from within R with
install.packages("arules")
Current development version: install from GitHub (needs devtools and Rtools for Windows, https://cran.r-project.org/bin/windows/Rtools/).
devtools::install_github("mhahsler/arules")
Usage
Load package and mine some association rules.
library("arules")
data("Adult")
rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.9, target = "rules"))
Parameter specification:
 confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
        0.9    0.1    1 none FALSE            TRUE     0.5      1     10  rules FALSE
Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE
Absolute minimum support count: 24421 
apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09)        (c) 1996-2004   Christian Borgelt
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[115 item(s), 48842 transaction(s)] done [0.03s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.03s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [52 rule(s)] done [0.00s].
creating S4 object  ... done [0.01s].
Show basic statistics.
summary(rules)
set of 52 rules
rule length distribution (lhs + rhs):sizes
 1  2  3  4 
 2 13 24 13 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   3.000   2.923   3.250   4.000 
summary of quality measures:
    support         confidence          lift            count      
 Min.   :0.5084   Min.   :0.9031   Min.   :0.9844   Min.   :24832  
 1st Qu.:0.5415   1st Qu.:0.9155   1st Qu.:0.9937   1st Qu.:26447  
 Median :0.5974   Median :0.9229   Median :0.9997   Median :29178  
 Mean   :0.6436   Mean   :0.9308   Mean   :1.0036   Mean   :31433  
 3rd Qu.:0.7426   3rd Qu.:0.9494   3rd Qu.:1.0057   3rd Qu.:36269  
 Max.   :0.9533   Max.   :0.9583   Max.   :1.0586   Max.   :46560  
mining info:
  data ntransactions support confidence
 Adult         48842     0.5        0.9
Inspect rules with the highest lift.
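Lift, the measure used for sorting here, is the confidence of a rule divided by the relative support of its right-hand side, so values close to 1 suggest that the two sides occur almost independently. The following is a minimal base-R sketch of how support, confidence and lift relate to each other. It does not use arules, and the `baskets` matrix, its item names and the `support()` helper are hypothetical and exist only for illustration.

```r
# Hypothetical toy data: each row is a transaction, each column an item.
baskets <- matrix(
  c(TRUE,  TRUE,  FALSE,
    TRUE,  TRUE,  TRUE,
    TRUE,  FALSE, FALSE,
    FALSE, TRUE,  TRUE,
    TRUE,  TRUE,  FALSE),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("bread", "butter", "milk"))
)

# Relative support: share of transactions that contain all given items.
support <- function(items) {
  mean(rowSums(baskets[, items, drop = FALSE]) == length(items))
}

# Illustrative rule {bread} => {butter}
lhs_items <- "bread"
rhs_items <- "butter"

supp <- support(c(lhs_items, rhs_items))  # share containing both sides
conf <- supp / support(lhs_items)         # share of lhs transactions that also contain rhs
lift <- conf / support(rhs_items)         # confidence relative to the baseline rhs frequency

print(c(support = supp, confidence = conf, lift = lift))
```

In this toy example the lift comes out slightly below 1, meaning that butter is marginally less frequent in transactions with bread than in the data overall.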
inspect(head(rules, n = 3, by = "lift"))
    lhs                               rhs                            support confidence coverage lift count
[1] {sex=Male,                                                                                             
     native-country=United-States} => {race=White}                      0.54       0.91     0.60  1.1 26450
[2] {sex=Male,                                                                                             
     capital-loss=None,                                                                                    
     native-country=United-States} => {race=White}                      0.51       0.90     0.57  1.1 24976
[3] {race=White}                   => {native-country=United-States}    0.79       0.92     0.86  1.0 38493
Using arules and tidyverse
arules works seamlessly with tidyverse. For example, dplyr can be used for cleaning and preparing the transactions, and then functions in arules can be used with %>%.
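The preparation step could look like the following sketch. It assumes a small, made-up data frame (`shoppers`, with invented columns and values) and loads only dplyr rather than the whole tidyverse. Rows with missing values are dropped, the numeric column is discretized, and all columns are turned into factors before the data frame is coerced to transactions.

```r
library("dplyr")
library("arules")

# Hypothetical toy data; column names and values are invented for illustration.
shoppers <- data.frame(
  age    = c(23, 47, 35, 62, 29, NA),
  member = c("yes", "no", "yes", "yes", "no", "yes"),
  basket = c("small", "large", "large", "small", "large", "small")
)

trans <- shoppers %>%
  filter(!is.na(age)) %>%                            # drop incomplete rows
  mutate(age = cut(age, breaks = c(0, 30, 50, Inf),  # discretize the numeric column
                   labels = c("young", "middle", "senior"))) %>%
  mutate(across(everything(), as.factor)) %>%        # one item per factor level
  as("transactions")                                 # coerce to an arules transactions object

# The thresholds are arbitrary and only chosen so that the tiny dataset yields rules.
toy_rules <- trans %>%
  apriori(parameter = list(supp = 0.2, conf = 0.8, target = "rules"))

toy_rules %>% head(n = 3, by = "lift") %>% inspect()
```

With only five transactions the resulting rules carry no meaning; the point is the shape of the pipeline from a data frame to mined rules.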
library("tidyverse")
library("arules")
data("Adult")
rules <- Adult %>% apriori(parameter = list(supp = 0.5, conf = 0.9, target = "rules"))
rules %>% head(n = 3, by = "lift") %>% inspect
    lhs                               rhs                            support confidence coverage lift count
[1] {sex=Male,                                                                                             
     native-country=United-States} => {race=White}                      0.54       0.91     0.60  1.1 26450
[2] {sex=Male,                                                                                             
     capital-loss=None,                                                                                    
     native-country=United-States} => {race=White}                      0.51       0.90     0.57  1.1 24976
[3] {race=White}                   => {native-country=United-States}    0.79       0.92     0.86  1.0 38493
Using arules from Python
See Getting started with R arules using Python.
Support
Please report bugs here on GitHub. Questions should be posted on Stack Overflow and tagged with arules.
References
- Michael Hahsler, Sudheer Chelluboina, Kurt Hornik, and Christian Buchta. The arules R-package ecosystem: Analyzing interesting patterns from large transaction datasets. Journal of Machine Learning Research, 12:1977-1981, 2011.
- Michael Hahsler, Bettina Grün and Kurt Hornik. arules - A Computational Environment for Mining Association Rules and Frequent Item Sets. Journal of Statistical Software, 14(15), 2005.
- Michael Hahsler. A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules, 2015. URL: https://michael.hahsler.net/research/association_rules/measures.html.