Learn R Programming

arules (version 1.7-7)

confint: Confidence Intervals for Interest Measures for Association Rules

Description

Defines a method to compute confidence intervals for interest measures for association rules.

Usage

# S3 method for rules
confint(
  object,
  parm = "oddsRatio",
  level = 0.95,
  measure = NULL,
  side = c("two.sided", "lower", "upper"),
  method = NULL,
  replications = 1000,
  smoothCounts = 0,
  transactions = NULL,
  ...
)

Value

Returns a matrix with with one row for each rule and the two columns named "LL" and "UL" with the interval boundaries. The matrix has the following additional attributes:

measure

the interest measure.

level

the confidence level

side

the confidence level

smoothCounts

used count smoothing.

method

name of the method to create the interval

desc

description of the used method to calculate the confidence interval. The mentioned references can be found below.

Arguments

object

an object of class rules.

parm, measure

name of the interest measures (see interestMeasure()). measure can be used instead of parm.

level

the confidence level required.

side

Should a two-sided confidence interval or a one-sided limit be returned? Lower returns an interval with only a lower limit and upper returns an interval with only an upper limit.

method

method to construct the confidence interval. The available methods depends on the measure and the most common method is used by default.

replications

number of replications for method "simulation". Ignored for other methods.

smoothCounts

pseudo count for addaptive smoothing (Laplace smoothing). Often a pseudo counts of .5 is used for smoothing (see Detail Section).

transactions

if the rules object does not contain sufficient quality information, then a set of transactions to calculate the confidence interval for can be specified.

...

Additional parameters are ignored with a warning.

Author

Michael Hahsler

Details

This method creates a contingency table for each rule and then constructs a confidence interval for the specified measures.

Fast confidence interval approximations are currently available for the measures "support", "count", "confidence", "lift", "oddsRatio", and "phi". For all other measures, bootstrap sampling from a multinomial distribution is used.

Haldan-Anscombe correction (Haldan, 1940; Anscombe, 1956) to avoids issues with zero counts can be specified by smoothCounts = 0.5. Here .5 is added to each count in the contingency table.

References

Wilson, E. B. (1927). "Probable inference, the law of succession, and statistical inference". Journal of the American Statistical Association, 22 (158): 209-212. tools:::Rd_expr_doi("10.1080/01621459.1927.10502953")

Clopper, C.; Pearson, E. S. (1934). "The use of confidence or fiducial limits illustrated in the case of the binomial". Biometrika, 26 (4): 404-413. tools:::Rd_expr_doi("10.1093/biomet/26.4.404")

Doob, J. L. (1935). "The Limiting Distributions of Certain Statistics". Annals of Mathematical Statistics, 6: 160-169. tools:::Rd_expr_doi("10.1214/aoms/1177732594")

Fisher, R.A. (1962). "Confidence limits for a cross-product ratio". Australian Journal of Statistics, 4, 41.

Woolf, B. (1955). "On estimating the relation between blood group and diseases". Annals of Human Genetics, 19, 251-253.

Haldane, J.B.S. (1940). "The mean and variance of the moments of chi-squared when used as a test of homogeneity, when expectations are small". Biometrika, 29, 133-134.

Anscombe, F.J. (1956). "On estimating binomial response relations". Biometrika, 43, 461-464.

See Also

Other interest measures: coverage(), interestMeasure(), is.redundant(), is.significant(), support()

Examples

Run this code
data("Income")

# mine some rules with the consequent "language in home=english"
rules <- apriori(Income, parameter = list(support = 0.5),
  appearance = list(rhs = "language in home=english"))

# calculate the confidence interval for the rules' odds ratios.
# note that we use Haldane-Anscombe correction (with smoothCounts = .5)
# to avoid issues with 0 counts in the contingency table.
ci <- confint(rules, "oddsRatio",  smoothCounts = .5)
ci

# We add the odds ratio (with Haldane-Anscombe correction)
# and the confidence intervals to the quality slot of the rules.
quality(rules) <- cbind(
  quality(rules),
  oddsRatio = interestMeasure(rules, "oddsRatio", smoothCounts = .5),
  oddsRatio = ci)

rules <- sort(rules, by = "oddsRatio")
inspect(rules)

# use confidence intervals for lift to find rules with a lift significantly larger then 1.
# We set the confidence level to 95%, create a one-sided interval and check
# if the interval does not cover 1 (i.e., the lower limit is larger than 1).
ci <- confint(rules, "lift", level = 0.95, side = "lower")
ci

inspect(rules[ci[, "LL"] > 1])

Run the code above in your browser using DataLab