This function searches the given fsets() object d for all fuzzy association rules that satisfy the defined constraints. It returns a list of fuzzy association rules together with statistics characterizing them (such as support and confidence).
searchrules(
d,
lhs = 2:ncol(d),
rhs = 1,
tnorm = c("goedel", "goguen", "lukasiewicz"),
n = 100,
best = c("confidence"),
minSupport = 0.02,
minConfidence = 0.75,
maxConfidence = 1,
maxLength = 4,
numThreads = 1,
trie = (maxConfidence < 1)
)
A list with the following elements: rules and statistics.
rules is a list of mined fuzzy association rules. Each element of that list is a character vector with the consequent attribute in the first position.
statistics is a data frame of statistical characteristics of the mined rules. Each row corresponds to a rule in the rules list. Consider a rule "a & b => c", let \(\otimes\) be the t-norm specified by the tnorm parameter, and let \(i\) range over all rows of the data table d. The columns of the statistics data frame are then as follows:
support: the rule's support degree: \(1/nrow(d) * \sum_{\forall i} a(i) \otimes b(i) \otimes c(i)\)
lhsSupport: the support of the rule's antecedent (LHS): \(1/nrow(d) * \sum_{\forall i} a(i) \otimes b(i)\)
rhsSupport: the support of the rule's consequent (RHS): \(1/nrow(d) * \sum_{\forall i} c(i)\)
confidence: the rule's confidence degree: \(support / lhsSupport\)
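The four formulas can be checked by hand on a tiny example. The following base R sketch does not call searchrules; it uses a made-up membership matrix and the textbook definitions of the three t-norms (minimum, product, and max(0, x + y - 1)), which are assumed here to match what the tnorm argument selects. The names tnorms and rule_stats are hypothetical helpers introduced only for this illustration.

```r
# Hypothetical membership degrees of three fuzzy attributes over four rows.
d <- cbind(a = c(0.9, 0.4, 0.7, 0.1),
           b = c(0.8, 0.6, 0.2, 0.3),
           c = c(1.0, 0.5, 0.3, 0.0))

# Textbook forms of the three t-norms (assumed to match the tnorm argument).
tnorms <- list(
  goedel      = function(x, y) pmin(x, y),
  goguen      = function(x, y) x * y,
  lukasiewicz = function(x, y) pmax(0, x + y - 1)
)

# Compute the four statistics of the rule "a & b => c" for one t-norm.
rule_stats <- function(d, tnorm) {
  lhs <- tnorm(d[, "a"], d[, "b"])                 # a(i) (x) b(i) per row
  lhs_support <- sum(lhs) / nrow(d)
  rhs_support <- sum(d[, "c"]) / nrow(d)
  support <- sum(tnorm(lhs, d[, "c"])) / nrow(d)   # a(i) (x) b(i) (x) c(i)
  c(support = support, lhsSupport = lhs_support,
    rhsSupport = rhs_support, confidence = support / lhs_support)
}

# One row per t-norm, one column per statistic.
stats <- t(sapply(tnorms, function(f) rule_stats(d, f)))
print(round(stats, 3))
```

Note that rhsSupport does not depend on the t-norm, because the consequent consists of a single attribute, while support, lhsSupport and confidence change with the chosen conjunction.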
d: An object of class fsets(). It is basically a matrix whose columns represent fuzzy sets and whose values are membership degrees. To create such an object, use the fcut() or lcut() function.
lhs: Indices of fuzzy attributes that may appear on the left-hand side (LHS) of association rules, i.e. in the antecedent.
rhs: Indices of fuzzy attributes that may appear on the right-hand side (RHS) of association rules, i.e. in the consequent.
tnorm: The t-norm used to compute the conjunction of fuzzy attributes. The names "lukasiewicz", "goedel" and "goguen" may be abbreviated to their starting letters.
n: The non-negative number of rules to be found. If zero, the function returns all rules satisfying the given conditions. If positive, only the n best rules are returned. The criterion for what is "best" is specified with the best argument.
best: Specifies the measure according to which the rules are ordered from best to worst. This argument is used mainly in combination with the n argument. Currently, only a single value ("confidence") can be used.
minSupport: The minimum support degree of a rule. Rules with support below this number are filtered out. It must be a numeric value from the interval \([0, 1]\). See the description of the statistics data frame for details on how the support degree is computed.
minConfidence: The minimum confidence degree of a rule. Rules with confidence below this number are filtered out. It must be a numeric value from the interval \([0, 1]\). See the description of the statistics data frame for details on how the confidence degree is computed.
maxConfidence: Maximum confidence threshold. After finding a rule whose confidence degree is above the maxConfidence threshold, no further rule is produced by adding attributes to its antecedent. For example, if "Sm.age & Me.age => Sm.height" has confidence above the maxConfidence threshold, no other rule containing "Sm.age & Me.age" will be produced, regardless of its interest measures. To disable this feature, set maxConfidence to 1.
maxLength: Maximum allowed length of a rule, i.e. the maximum number of predicates allowed on the left-hand and right-hand sides of the rule combined. If negative, the length of rules is unlimited.
numThreads: Number of threads used to run the algorithm in parallel. If greater than 1, the OpenMP library (not to be confused with Open MPI) is used for parallelization. Note that there are known problems with using OpenMP together with other means of parallelization that may be used within R. Therefore, if you plan to use the searchrules function with an external parallelization mechanism such as the doMC library, make sure that numThreads equals 1. This feature is available only on systems that have the OpenMP library installed.
trie: Whether or not to use the internal trie mechanism. If FALSE, the output may contain a rule that is a descendant of a rule whose confidence is above the maxConfidence threshold. Tries consume a lot of memory, so if you run out of memory, set this argument to FALSE. On the other hand, the result (if n is set to 0) can be very large if trie is set to FALSE.
Michal Burda
The function searches the fsets() object d for fuzzy association rules that satisfy the conditions specified by the parameters.
fcut(), lcut(), farules(), fsets(), pbld()
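library(lfl)
# Convert the built-in CO2 data set to fuzzy sets, then search for rules
# in which every attribute may appear on either side.
d <- lcut(CO2)
searchrules(d, lhs = 1:ncol(d), rhs = 1:ncol(d))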
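A slightly more restrictive call may be easier to inspect than the full rule set. The sketch below reuses the same CO2 data and only arguments from the usage above; the particular threshold values are arbitrary choices for illustration, and the block assumes that the function is provided by the lfl package and that the result can be accessed as a list with rules and statistics elements, as described above.

```r
# Assumes the lfl package is installed; the block is skipped otherwise.
if (requireNamespace("lfl", quietly = TRUE)) {
  d <- lfl::lcut(CO2)

  # Keep only the 5 rules with the highest confidence among those that
  # reach the stricter thresholds chosen here (illustrative values).
  res <- lfl::searchrules(d,
                          lhs = 1:ncol(d),
                          rhs = 1:ncol(d),
                          tnorm = "goguen",
                          n = 5,
                          minSupport = 0.05,
                          minConfidence = 0.8,
                          maxLength = 3)

  print(res$rules)       # each rule: consequent first, then the antecedent
  print(res$statistics)  # one row of statistics per rule
}
```

Because n is positive, at most five rules are returned, ordered by the measure given in best.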