
arules (version 1.7-2)

confint: Confidence Intervals for Association Interest Measures

Description

Computes confidence intervals for interest measures used in association rule mining.

Usage

# S3 method for rules
confint(object, parm = "oddsRatio", level = 0.95, 
  measure = NULL, side = c("two.sided", "lower", "upper"), method = NULL, 
  replications = 1000, smoothCounts = 0, transactions = NULL, ...)

Arguments

object

an object of class rules.

parm, measure

name of the interest measure (i.e., the parameter). measure can be used instead of parm.

level

the confidence level required.

side

Should a two-sided confidence interval or a one-sided limit be returned? "lower" returns an interval with only a lower limit and "upper" returns an interval with only an upper limit.

method

method used to construct the confidence interval. The available methods depend on the measure; by default (method = NULL), the most common method for the measure is used.

smoothCounts

pseudo count for additive smoothing (Laplace smoothing). A pseudo count of 0.5 is often used (see the Details section).

replications

number of replications for method "simulation". Ignored for other methods.

transactions

if the rules object does not contain sufficient quality information, a set of transactions can be specified from which the confidence interval is calculated.

...

Additional parameters are ignored with a warning.
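To illustrate how level and side interact, here is a minimal base-R sketch of the Wilson (1927) score interval for a proportion such as rule support. It is not the arules implementation: the function name wilson_ci is hypothetical, and filling the unused end of a one-sided interval with the natural bound 0 or 1 is an assumption about how one-sided limits are reported.

```r
# Hypothetical helper: Wilson score interval for a proportion x / n,
# showing how `level` and `side` select the critical value.
# Not arules' internal code; the one-sided bound filling is an assumption.
wilson_ci <- function(x, n, level = 0.95,
                      side = c("two.sided", "lower", "upper")) {
  side <- match.arg(side)
  p <- x / n
  # Two-sided: split 1 - level over both tails; one-sided: put it in one tail.
  z <- if (side == "two.sided") qnorm(1 - (1 - level) / 2) else qnorm(level)
  denom  <- 1 + z^2 / n
  center <- (p + z^2 / (2 * n)) / denom
  half   <- z / denom * sqrt(p * (1 - p) / n + z^2 / (4 * n^2))
  ll <- center - half
  ul <- center + half
  # For a one-sided limit, report the natural bound of a proportion on the other end.
  if (side == "lower") ul <- 1
  if (side == "upper") ll <- 0
  c(LL = ll, UL = ul)
}

wilson_ci(30, 100)                  # two-sided 95% interval
wilson_ci(30, 100, side = "lower")  # lower limit only, UL set to 1
```

Because side = "lower" puts all of 1 - level into one tail, it uses a smaller critical value, so its lower limit is higher than the lower limit of the two-sided interval at the same level.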

Value

Returns a matrix with one row for each rule and two columns, "LL" and "UL", holding the lower and upper limits of the interval. The matrix has the following additional attributes:

measure

the interest measure.

level

the confidence level

side

the side of the interval ("two.sided", "lower", or "upper").

smoothCounts

the pseudo count used for count smoothing.

method

name of the method used to construct the interval.

desc

description of the method used to calculate the confidence interval. The references it mentions are listed below.

Details

This method creates a contingency table for each rule and then constructs a confidence interval for the specified measure.
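A sketch of the per-rule contingency table for a rule X => Y, assuming the four counts are known: the number of transactions, the count of X, the count of Y, and the count of X and Y together. The helper rule_table is hypothetical; the measures read off the table use their standard definitions.

```r
# Hypothetical helper: 2x2 contingency table for a rule X => Y.
# n = transactions, n_x = count(X), n_y = count(Y), n_xy = count(X and Y).
rule_table <- function(n, n_x, n_y, n_xy) {
  matrix(c(n_xy,       n_x - n_xy,
           n_y - n_xy, n - n_x - n_y + n_xy),
         nrow = 2, byrow = TRUE,
         dimnames = list(lhs = c("X", "!X"), rhs = c("Y", "!Y")))
}

tab <- rule_table(n = 100, n_x = 40, n_y = 50, n_xy = 30)
tab

# Standard measures computed from the table
support    <- tab["X", "Y"] / sum(tab)                    # 0.3
confidence <- tab["X", "Y"] / sum(tab["X", ])             # 0.75
lift       <- confidence / (sum(tab[, "Y"]) / sum(tab))   # 1.5
oddsRatio  <- (tab["X", "Y"] * tab["!X", "!Y"]) /
              (tab["X", "!Y"] * tab["!X", "Y"])           # 6
```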

Fast confidence interval approximations are currently available for the measures "support", "count", "confidence", "lift", "oddsRatio", and "phi". For all other measures, bootstrap sampling from a multinomial distribution is used.
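The idea of bootstrap sampling from a multinomial distribution can be sketched as follows: treat the observed cell proportions of the contingency table as multinomial probabilities, redraw tables of the same total size, recompute the measure on each, and take percentile limits. The Jaccard coefficient serves only as an example of a measure without a fast approximation; boot_ci_multinom is a hypothetical name, and the percentile method is an assumption, so arules may derive its limits differently.

```r
# Hypothetical helper: percentile bootstrap interval for a measure of a 2x2 table,
# resampling the whole table from a multinomial with the observed cell proportions.
boot_ci_multinom <- function(tab, measure, level = 0.95, replications = 1000) {
  n <- sum(tab)
  probs <- as.vector(tab) / n          # column-major cell proportions
  stats <- replicate(replications, {
    counts <- rmultinom(1, size = n, prob = probs)
    measure(matrix(counts, nrow = 2))  # rebuild in the same column-major layout
  })
  alpha <- 1 - level
  q <- quantile(stats, probs = c(alpha / 2, 1 - alpha / 2),
                names = FALSE, na.rm = TRUE)
  c(LL = q[1], UL = q[2])
}

# Jaccard coefficient of a rule: count(X and Y) / count(X or Y)
jaccard <- function(tab) tab[1, 1] / (tab[1, 1] + tab[1, 2] + tab[2, 1])

tab <- matrix(c(30, 10,
                20, 40), nrow = 2, byrow = TRUE)
set.seed(1)
jac_ci <- boot_ci_multinom(tab, jaccard)
jac_ci
```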

The Haldane-Anscombe correction (Haldane, 1940; Anscombe, 1956), which avoids problems caused by zero counts, is requested with smoothCounts = 0.5: 0.5 is then added to each count in the contingency table.
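The effect of the correction is visible in a sketch of Woolf's (1955) log interval for the odds ratio, one standard approximation for this measure. The function woolf_or_ci is hypothetical and borrows the name smoothCounts only for readability; it is not necessarily the default method arules uses for "oddsRatio".

```r
# Hypothetical helper: Woolf (log) interval for the odds ratio of a 2x2 table,
# with a pseudo count added to every cell (0.5 = Haldane-Anscombe correction).
woolf_or_ci <- function(tab, level = 0.95, smoothCounts = 0) {
  tab <- tab + smoothCounts
  log_or <- log((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
  se <- sqrt(sum(1 / tab))             # standard error of the log odds ratio
  z <- qnorm(1 - (1 - level) / 2)
  c(LL = exp(log_or - z * se), UL = exp(log_or + z * se))
}

tab0 <- matrix(c(30,  0,
                 20, 40), nrow = 2, byrow = TRUE)  # one zero cell

woolf_or_ci(tab0)                      # LL is NaN, UL is Inf: the zero cell breaks the log odds ratio
woolf_or_ci(tab0, smoothCounts = 0.5)  # finite limits after the correction
```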

References

Wilson, E. B. (1927). "Probable inference, the law of succession, and statistical inference". Journal of the American Statistical Association, 22(158), 209-212. doi:10.1080/01621459.1927.10502953

Clopper, C. and Pearson, E. S. (1934). "The use of confidence or fiducial limits illustrated in the case of the binomial". Biometrika, 26(4), 404-413. doi:10.1093/biomet/26.4.404

Doob, J. L. (1935). "The limiting distributions of certain statistics". Annals of Mathematical Statistics, 6, 160-169. doi:10.1214/aoms/1177732594

Fisher, R. A. (1962). "Confidence limits for a cross-product ratio". Australian Journal of Statistics, 4, 41.

Woolf, B. (1955). "On estimating the relation between blood group and disease". Annals of Human Genetics, 19, 251-253.

Haldane, J. B. S. (1940). "The mean and variance of the moments of chi-squared when used as a test of homogeneity, when expectations are small". Biometrika, 29, 133-134.

Anscombe, F. J. (1956). "On estimating binomial response relations". Biometrika, 43, 461-464.

See Also

interestMeasure, is.redundant

Examples

# NOT RUN {
library("arules")
data("Income")

# mine some rules with the consequent "language in home=english"
rules <- apriori(Income, parameter = list(support = 0.5), 
  appearance = list(rhs = "language in home=english"))
 
# calculate the confidence interval for the rules' odds ratios.
# note that we use Haldane-Anscombe correction (with smoothCounts = .5)
# to avoid issues with 0 counts in the contingency table.
ci <- confint(rules, "oddsRatio",  smoothCounts = .5)
ci

# We add the odds ratio (with Haldane-Anscombe correction) 
# and the confidence intervals to the quality slot of the rules.
quality(rules) <- cbind(
  quality(rules), 
  oddsRatio = interestMeasure(rules, "oddsRatio", smoothCounts = .5), 
  oddsRatio = ci)

rules <- sort(rules, by = "oddsRatio")
inspect(rules)

# Use confidence intervals for lift to find rules with a lift significantly larger than 1.
# We set the confidence level to 95%, create a one-sided interval and check
# if the interval does not cover 1 (i.e., the lower limit is larger than 1).
ci <- confint(rules, "lift", level = 0.95, side = "lower")
ci

inspect(rules[ci[, "LL"] > 1])
# }
