Learn R Programming

nuggets (version 1.2.0)

dig_correlations: Search for conditional correlations

Description

Compute correlation between all combinations of xvars and yvars columns of x in subdata corresponding to conditions generated from condition columns.

Usage

dig_correlations(
  x,
  condition = where(is.logical),
  xvars = where(is.numeric),
  yvars = where(is.numeric),
  method = "pearson",
  alternative = "two.sided",
  exact = NULL,
  min_length = 0L,
  max_length = Inf,
  min_support = 0,
  threads = 1,
  ...
)

Value

A tibble with found rules.

Arguments

x

a matrix or data frame with data to search in.

condition

a tidyselect expression (see tidyselect syntax) specifying the columns to use as condition predicates

xvars

a tidyselect expression (see tidyselect syntax) specifying the columns to use for computation of correlations

yvars

a tidyselect expression (see tidyselect syntax) specifying the columns to use for computation of correlations

method

a character string indicating which correlation coefficient is to be used for the test. One of "pearson", "kendall", or "spearman"

alternative

indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". "greater" corresponds to positive association, "less" to negative association.

exact

a logical indicating whether an exact p-value should be computed. Used for Kendall's tau and Spearman's rho. See stats::cor.test() for more information.

min_length

the minimum size (the minimum number of predicates) of the condition to be generated (must be greater or equal to 0). If 0, the empty condition is generated in the first place.

max_length

The maximum size (the maximum number of predicates) of the condition to be generated. If equal to Inf, the maximum length of conditions is limited only by the number of available predicates.

min_support

the minimum support of a condition to trigger the callback function for it. The support of the condition is the relative frequency of the condition in the dataset x. For logical data, it equals to the relative frequency of rows such that all condition predicates are TRUE on it. For numerical (double) input, the support is computed as the mean (over all rows) of multiplications of predicate values.

threads

the number of threads to use for parallel computation.

...

Further arguments, currently unused.

Author

Michal Burda

See Also