Baseline contrast patterns identify conditions under which a specific feature is significantly different from a given value by performing a one-sample statistical test.
var != 0 | C
Variable var is, on average, significantly different from 0 under the condition C.
measure_error != 0 | measure_tool_A
When measuring with measure tool A, the average measurement error is significantly different from 0.
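The single pattern above can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the data frame d and its values are simulated and purely hypothetical, and the test is run by hand on the rows where the condition holds.

```r
# Hypothetical data: measurement errors recorded with two tools.
set.seed(1)
d <- data.frame(
  measure_tool_A = rep(c(TRUE, FALSE), each = 30),
  measure_error  = c(rnorm(30, mean = 0.8), rnorm(30, mean = 0))
)

# Sub-data: the rows in which the condition measure_tool_A is TRUE.
sub <- d$measure_error[d$measure_tool_A]

# One-sample t test of H0: mean(measure_error) == 0 within the condition.
fit <- t.test(sub, mu = 0, alternative = "two.sided", conf.level = 0.95)

fit$estimate  # mean of measure_error under the condition
fit$p.value   # the pattern is of interest if this is small
```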
The baseline contrast is computed using a one-sample statistical test, which is specified by the method argument. The function computes the contrast for each variable specified by the vars argument. Baseline contrasts are computed in sub-data corresponding to conditions generated from the condition columns. The function dig_baseline_contrasts() supports crisp conditions only, i.e., the condition columns in x must be logical.
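To make the idea of "sub-data corresponding to conditions" concrete, here is a brute-force sketch in base R. It assumes two hypothetical logical predicates a and b and one numeric variable y, enumerates every condition by hand, and runs a one-sample t test in each subset. The result column names are chosen for illustration only, and the package's actual search is presumably far more efficient than this enumeration.

```r
set.seed(2)
d <- data.frame(
  a = sample(c(TRUE, FALSE), 100, replace = TRUE),
  b = sample(c(TRUE, FALSE), 100, replace = TRUE),
  y = rnorm(100)
)
preds <- c("a", "b")

# All conditions of length 0 to 2: {}, {a}, {b}, {a & b}.
conds <- c(
  list(character(0)),
  unlist(
    lapply(seq_along(preds), function(k) combn(preds, k, simplify = FALSE)),
    recursive = FALSE
  )
)

res <- do.call(rbind, lapply(conds, function(cn) {
  # Rows where every predicate of the condition is TRUE
  # (the empty condition keeps all rows).
  keep <- Reduce(`&`, d[cn], rep(TRUE, nrow(d)))
  fit <- t.test(d$y[keep], mu = 0)
  data.frame(
    condition = paste0("{", paste(cn, collapse = " & "), "}"),
    support   = mean(keep),           # relative frequency of the condition
    estimate  = unname(fit$estimate), # mean of y in the sub-data
    p_value   = fit$p.value
  )
}))
res
```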
dig_baseline_contrasts(
  x,
  condition = where(is.logical),
  vars = where(is.numeric),
  disjoint = var_names(colnames(x)),
  min_length = 0L,
  max_length = Inf,
  min_support = 0,
  max_support = 1,
  method = "t",
  alternative = "two.sided",
  h0 = 0,
  conf_level = 0.95,
  max_p_value = 0.05,
  wilcox_exact = FALSE,
  wilcox_correct = TRUE,
  wilcox_tol_root = 1e-04,
  wilcox_digits_rank = Inf,
  max_results = Inf,
  verbose = FALSE,
  threads = 1
)
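A call might look like the sketch below. It assumes that the nuggets package (which provides dig_baseline_contrasts()) is installed, and it uses only arguments listed in the usage above. The data frame d and its columns are hypothetical, and the call is guarded so that the snippet also runs where the package is unavailable.

```r
# Hypothetical data: one logical condition column, one numeric variable.
set.seed(1)
d <- data.frame(
  tool_a        = rep(c(TRUE, FALSE), each = 30),
  measure_error = c(rnorm(30, mean = 0.8), rnorm(30))
)

if (requireNamespace("nuggets", quietly = TRUE)) {
  res <- nuggets::dig_baseline_contrasts(
    d,
    condition   = where(is.logical),  # crisp predicates
    vars        = where(is.numeric),  # variables to test against h0
    min_support = 0.1,                # skip very rare conditions
    method      = "t",
    h0          = 0,
    max_p_value = 0.05
  )
  print(res)
}
```

Raising min_support or setting a finite max_results are the natural ways to keep the search short on wider data.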
A tibble with found patterns in rows. The following columns are always present:
the condition of the pattern as a character string in the form {p1 & p2 & ... & pn}, where p1, p2, ..., pn are x's column names.
the support of the condition, i.e., the relative frequency of the condition in the dataset x.
the name of the contrast variable.
the estimated mean or median of variable var.
the statistic of the selected test.
the p-value of the underlying test.
the number of rows in the sub-data corresponding to the condition.
the lower bound of the confidence interval of the estimate.
the upper bound of the confidence interval of the estimate.
a character string indicating the alternative hypothesis; the value is one of "two.sided", "greater", or "less".
a character string indicating the method used for the test.
a character string with additional information about the test (mainly error messages on failure).
For the "t" method, the following additional columns are also present (see also t.test()):
the degrees of freedom of the t test.
the standard error of the mean.
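The quantities described above correspond to components of the object returned by stats::t.test(). The sketch below collects them into a one-row data frame; the column names used here are hypothetical labels chosen for illustration and are not taken from the package's output. The stderr component requires R 3.6.0 or later.

```r
set.seed(3)
y <- rnorm(40, mean = 0.5)   # hypothetical sub-data for one condition
fit <- t.test(y, mu = 0, conf.level = 0.95)

row <- data.frame(
  estimate  = unname(fit$estimate),   # sample mean
  statistic = unname(fit$statistic),  # t statistic
  p_value   = fit$p.value,
  n         = length(y),              # rows in the sub-data
  ci_lower  = fit$conf.int[1],
  ci_upper  = fit$conf.int[2],
  df        = unname(fit$parameter),  # degrees of freedom (n - 1)
  stderr    = fit$stderr              # standard error of the mean
)
row
```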
x: a matrix or data frame with data to search the patterns in.
condition: a tidyselect expression (see tidyselect syntax) specifying the columns to use as condition predicates.
vars: a tidyselect expression (see tidyselect syntax) specifying the columns to use for computation of contrasts.
disjoint: an atomic vector of size equal to the number of columns of x that specifies the groups of predicates: if some elements of the disjoint vector are equal, then the corresponding columns of x will NOT be present together in a single condition. If x is prepared with partition(), using the var_names() function on x's column names is a convenient way to create the disjoint vector.
min_length: the minimum size (the minimum number of predicates) of the condition to be generated (must be greater than or equal to 0). If 0, the empty condition is generated first.
max_length: the maximum size (the maximum number of predicates) of the condition to be generated. If equal to Inf, the maximum length of conditions is limited only by the number of available predicates.
min_support: the minimum support of a condition to trigger the callback function for it. The support of the condition is the relative frequency of the condition in the dataset x. For logical data, it equals the relative frequency of rows in which all condition predicates are TRUE. For numerical (double) input, the support is computed as the mean (over all rows) of the products of predicate values.
max_support: the maximum support of a condition to trigger the callback function for it. See the min_support argument for the definition of the support of a condition.
method: a character string indicating which contrast to compute. One of "t", for a parametric test, or "wilcox", for a non-parametric test on equality in position.
alternative: indicates the alternative hypothesis and must be one of "two.sided", "greater", or "less". "greater" corresponds to positive association, "less" to negative association.
h0: a numeric value specifying the null hypothesis for the test. For the "t" method, it is the value of the mean. For the "wilcox" method, it is the value of the median. The default value is 0.
conf_level: a numeric value specifying the level of the confidence interval. The default value is 0.95.
max_p_value: the maximum p-value of a test for the pattern to be considered significant. If the p-value of the test is greater than max_p_value, the pattern is not included in the result.
wilcox_exact: (used for the "wilcox" method only) a logical value indicating whether the exact p-value should be computed. If NULL, the exact p-value is computed for sample sizes less than 50. See wilcox.test() and its exact argument for more information. Contrary to the behavior of wilcox.test(), the default value is FALSE.
wilcox_correct: (used for the "wilcox" method only) a logical value indicating whether the continuity correction should be applied in the normal approximation for the p-value, if wilcox_exact is FALSE. See wilcox.test() and its correct argument for more information.
wilcox_tol_root: (used for the "wilcox" method only) a numeric value specifying the tolerance of the root-finding algorithm (uniroot()) that wilcox.test() uses when computing the confidence interval. See wilcox.test() and its tol.root argument for more information.
wilcox_digits_rank: (used for the "wilcox" method only) a numeric value specifying the number of significant digits to which values are rounded before their ranks are computed; Inf means no rounding. See wilcox.test() and its digits.rank argument for more information.
max_results: the maximum number of generated conditions to execute the callback function on. If the number of found conditions exceeds max_results, the function stops generating new conditions and returns the results. To avoid long computations during the search, it is recommended to set max_results to a reasonable positive value. Setting max_results to Inf will generate all possible conditions.
verbose: a logical scalar indicating whether to print progress messages.
threads: the number of threads to use for parallel computation.
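The wilcox_* arguments mirror arguments of stats::wilcox.test(). The sketch below shows the corresponding base R call on hypothetical data. How the package forwards these values internally is an assumption here, not something this page states, and the tol.root and digits.rank arguments require R 4.0.0 or later.

```r
set.seed(4)
y <- rnorm(25, mean = 0.4)   # hypothetical sub-data for one condition

fit <- wilcox.test(
  y,
  mu          = 0,            # corresponds to h0
  alternative = "two.sided",  # corresponds to alternative
  exact       = FALSE,        # corresponds to wilcox_exact
  correct     = TRUE,         # corresponds to wilcox_correct
  conf.int    = TRUE,         # request an estimate and its interval
  conf.level  = 0.95,         # corresponds to conf_level
  tol.root    = 1e-4,         # corresponds to wilcox_tol_root
  digits.rank = Inf           # corresponds to wilcox_digits_rank
)

fit$estimate   # (pseudo)median of y
fit$conf.int   # confidence interval of the estimate
fit$p.value
```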
Author: Michal Burda
See also: dig_paired_baseline_contrasts(), dig_complement_contrasts(), dig(), dig_grid(), stats::t.test(), stats::wilcox.test()