arules --- Mining Association Rules and Frequent Itemsets with R

The arules package for R provides the infrastructure for representing, manipulating and analyzing transaction data and patterns using frequent itemsets and association rules. Also provides a wide range of interest measures and mining algorithms including a interfaces and the code of Borgelt's efficient C implementations of the association mining algorithms Apriori and Eclat.

arules core packages:

arules: arules base package with data structures, mining algorithms (APRIORI and ECLAT), interest measures.
arulesViz: Visualization of association rules.
arulesCBA: Classification algorithms based on association rules (includes CBA).
arulesSequences: Mining frequent sequences (cSPADE).

Other related packages:

Additional mining algorithms

arulesNBMiner: Mining NB-frequent itemsets and NB-precise rules.
opusminer: OPUS Miner algorithm for filtered top-k association discovery.
RKEEL: Interface to KEEL's association rule mining algorithm.
RSarules: Mining algorithm which randomly samples association rules with one pre-chosen item as the consequent from a transaction dataset.

In-database analytics

ibmdbR: IBM in-database analytics for R can calculate association rules from a database table.
rfml: Mine frequent itemsets or association rules using a MarkLogic server.

Interface

rattle: Provides a graphical user interface for association rule mining.
pmml: Generates PMML (predictive model markup language) for association rules.

Classification

arc: Alternative CBA implementation.
inTrees: Interpret Tree Ensembles provides functions for: extracting, measuring and pruning rules; selecting a compact rule set; summarizing rules into a learner.
rCBA: Alternative CBA implementation.
qCBA: Quantitative Classification by Association Rules.
sblr: Scalable Bayesian rule lists algorithm for classification.

Outlier Detection

fpmoutliers: Frequent Pattern Mining Outliers.

Recommendation/Prediction

recommenerlab: Supports creating predictions using association rules.

Installation

Stable CRAN version: install from within R with

install.packages("arules")

Current development version: install from GitHub (needs devtools and [Rtools for Windows] (https://cran.r-project.org/bin/windows/Rtools/)).

devtools::install_github("mhahsler/arules")

Usage

Load package and mine some association rules.

library("arules")
data("Adult")

rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.9, target = "rules"))

Parameter specification:
 confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
        0.9    0.1    1 none FALSE            TRUE     0.5      1     10  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 24421 

apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09)        (c) 1996-2004   Christian Borgelt
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[115 item(s), 48842 transaction(s)] done [0.03s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.03s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [52 rule(s)] done [0.00s].
creating S4 object  ... done [0.01s].

Show basic statistics.

summary(rules)

set of 52 rules

rule length distribution (lhs + rhs):sizes
 1  2  3  4 
 2 13 24 13 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   3.000   2.923   3.250   4.000 

summary of quality measures:
    support         confidence          lift            count      
 Min.   :0.5084   Min.   :0.9031   Min.   :0.9844   Min.   :24832  
 1st Qu.:0.5415   1st Qu.:0.9155   1st Qu.:0.9937   1st Qu.:26447  
 Median :0.5974   Median :0.9229   Median :0.9997   Median :29178  
 Mean   :0.6436   Mean   :0.9308   Mean   :1.0036   Mean   :31433  
 3rd Qu.:0.7426   3rd Qu.:0.9494   3rd Qu.:1.0057   3rd Qu.:36269  
 Max.   :0.9533   Max.   :0.9583   Max.   :1.0586   Max.   :46560  

mining info:
  data ntransactions support confidence
 Adult         48842     0.5        0.9

Inspect rules with the highest lift.

inspect(head(rules, n = 3, by = "lift"))

    lhs                               rhs                            support confidence coverage lift count
[1] {sex=Male,                                                                                             
     native-country=United-States} => {race=White}                      0.54       0.91     0.60  1.1 26450
[2] {sex=Male,                                                                                             
     capital-loss=None,                                                                                    
     native-country=United-States} => {race=White}                      0.51       0.90     0.57  1.1 24976
[3] {race=White}                   => {native-country=United-States}    0.79       0.92     0.86  1.0 38493

Using arule and tidyverse

arules works seemlessly with tidyverse. For example, dplyr can be used for cleaning and preparing the transactions and then functions in arules can be used with %>%.

library("tidyverse")
library("arules")
data("Adult")

rules <- Adult %>% apriori(parameter = list(supp = 0.5, conf = 0.9, target = "rules"))
rules %>% head(n = 3, by = "lift") %>% inspect

    lhs                               rhs                            support confidence coverage lift count
[1] {sex=Male,                                                                                             
     native-country=United-States} => {race=White}                      0.54       0.91     0.60  1.1 26450
[2] {sex=Male,                                                                                             
     capital-loss=None,                                                                                    
     native-country=United-States} => {race=White}                      0.51       0.90     0.57  1.1 24976
[3] {race=White}                   => {native-country=United-States}    0.79       0.92     0.86  1.0 38493

Usage arules from Python

See Getting started with R arules using Python.

Support

Please report bugs here on GitHub. Questions should be posted on stackoverflow and tagged with arules.

References

Michael Hahsler, Sudheer Chelluboina, Kurt Hornik, and Christian Buchta. The arules R-package ecosystem: Analyzing interesting patterns from large transaction datasets. Journal of Machine Learning Research, 12:1977-1981, 2011.
Michael Hahsler, Bettina Grün and Kurt Hornik. arules - A Computational Environment for Mining Association Rules and Frequent Item Sets. Journal of Statistical Software, 14(15), 2005.
Hahsler, Michael. A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules, 2015, URL: https://michael.hahsler.net/research/association_rules/measures.html.

arules --- Mining Association Rules and Frequent Itemsets with R

arules core packages:

Other related packages:

Additional mining algorithms

In-database analytics

Interface

Classification

Outlier Detection

Recommendation/Prediction

Installation

Usage

Using arule and tidyverse

Usage arules from Python

Support

References

Copy Link

Version

Install

Monthly Downloads

Version

License

Issues

Pull Requests

Stars

Forks

Repository

Maintainer

Last Published

Functions in arules (1.6-8)