- x
a matrix or data frame with data to search in.
- f
the callback function to be executed for each generated condition.
The arguments of the callback function differ based on the value of the type
argument (see below):
If type = "crisp" (that is, boolean), the callback function f must accept
a single argument pd of type data.frame with one column (if yvars == NULL)
or two columns (if yvars != NULL), accessible as pd[[1]] and pd[[2]].
The data frame pd is a subset of the original data frame x containing all
rows that satisfy the generated condition. Optionally, the callback function
may accept an argument nd, which is a subset of the original data frame x
containing all rows that do not satisfy the generated condition.
If type = "fuzzy", the callback function f must accept an argument d of type
data.frame with one column (if yvars == NULL) or two columns (if
yvars != NULL), accessible as d[[1]] and d[[2]], and a numeric argument
weights whose length equals the number of rows in d. The weights argument
contains the truth degree of the generated condition for each row of d.
The truth degree is a number in the interval \([0, 1]\) that represents the
degree to which the original data row satisfies the condition.
In all cases, the function must return a list of scalar values, which
will be converted into a single row of the resulting tibble.
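The two callback shapes described above can be sketched in base R as follows. This is a minimal illustration only: the toy data frame, the column names (cond, a, b), and the helper names f_crisp and f_fuzzy are hypothetical, and the callbacks are applied by hand rather than through the package. The sketch also assumes that, in the fuzzy case, d holds all rows of the selected columns.

```r
# Toy data: one logical predicate and two numeric columns (names are illustrative).
x <- data.frame(
  cond = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  a    = c(1, 2, 3, 4, 5),
  b    = c(2, 4, 5, 8, 11)
)

# Crisp callback: pd holds only the rows that satisfy the condition.
# With yvars given, pd[[1]] is the xvar column and pd[[2]] the yvar column.
f_crisp <- function(pd) {
  list(n = nrow(pd), cor = cor(pd[[1]], pd[[2]]))
}

# Fuzzy callback: weights holds the truth degree of the condition per row of d.
f_fuzzy <- function(d, weights) {
  list(
    wmean_x = weighted.mean(d[[1]], weights),
    wmean_y = weighted.mean(d[[2]], weights)
  )
}

# Emulate by hand what each callback would receive for the condition `cond`.
crisp_res <- f_crisp(x[x$cond, c("a", "b")])
fuzzy_res <- f_fuzzy(x[, c("a", "b")], weights = as.numeric(x$cond))
```

Both callbacks return a list of named scalars, which is the shape that can be turned into one row of a result table.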
- condition
a tidyselect expression (see tidyselect syntax) specifying the columns to
use as condition predicates. The selected columns must be logical or numeric.
If numeric, fuzzy conditions are considered.
- xvars
a tidyselect expression (see tidyselect syntax) specifying the columns of x
whose names will be used as the domain for the first position (xvar) of the
combinations.
- yvars
NULL or a tidyselect expression (see tidyselect syntax) specifying the
columns of x whose names will be used as the domain for the second position
(yvar) of the combinations.
- disjoint
an atomic vector of size equal to the number of columns of x that specifies
the groups of predicates: if some elements of the disjoint vector are equal,
then the corresponding columns of x will NOT be present together in a single
condition. If x is prepared with partition(), using the var_names() function
on x's column names is a convenient way to create the disjoint vector.
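The grouping rule for disjoint can be illustrated with a small base R sketch. It assumes predicate columns named in a "variable=value" style, which is an assumption about the naming scheme, and it uses sub() as a stand-in for deriving group labels rather than calling var_names() itself. The predicate names are hypothetical.

```r
# Hypothetical predicate column names in a "variable=value" style (assumption).
predicate_names <- c("Species=setosa", "Species=virginica",
                     "Size=small", "Size=large")

# Stand-in for a grouping vector: drop everything from "=" onwards.
disjoint <- sub("=.*$", "", predicate_names)

# Enumerate all pairs of predicates and keep only those whose group labels
# differ, i.e. pairs that may appear together in one condition.
pairs <- combn(seq_along(predicate_names), 2, simplify = FALSE)
allowed <- Filter(function(p) disjoint[p[1]] != disjoint[p[2]], pairs)
allowed_names <- vapply(
  allowed,
  function(p) paste(predicate_names[p], collapse = " & "),
  character(1)
)
```

Of the six possible pairs, the two that combine predicates from the same group (both Species or both Size) are excluded.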
- allow
a character string specifying which columns are allowed to be selected by
the xvars and yvars arguments. Possible values are:
- na_rm
a logical value indicating whether to remove rows with missing values from
sub-data before the callback function f is called.
- type
a character string specifying the type of conditions to be processed.
The "crisp" type accepts only logical columns as condition predicates.
The "fuzzy" type accepts both logical and numeric columns as condition
predicates, where numeric data are in the interval \([0, 1]\). The callback
function f differs based on the value of the type argument (see the
description of f above).
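The column requirements of the two types can be expressed as a small check. The helper is_valid_predicate below is hypothetical and written in base R; it is a sketch of the rule as stated above, not the package's own validation.

```r
# Hypothetical helper: does a column qualify as a condition predicate
# for the given type, according to the rule described above?
is_valid_predicate <- function(col, type = c("crisp", "fuzzy")) {
  type <- match.arg(type)
  if (is.logical(col)) {
    return(TRUE)  # logical columns are accepted by both types
  }
  # numeric columns are accepted only for "fuzzy" and must lie in [0, 1]
  type == "fuzzy" && is.numeric(col) && all(col >= 0 & col <= 1, na.rm = TRUE)
}

is_valid_predicate(c(TRUE, FALSE), "crisp")
is_valid_predicate(c(0.2, 1.0), "crisp")
is_valid_predicate(c(0.2, 1.0), "fuzzy")
```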
- min_length
the minimum size (the minimum number of predicates) of the
condition to be generated (must be greater than or equal to 0). If 0, the
empty condition is generated first.
- max_length
the maximum size (the maximum number of predicates) of the
condition to be generated. If equal to Inf, the maximum length of conditions
is limited only by the number of available predicates.
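To make the effect of min_length and max_length concrete, the following base R sketch lists every candidate condition over three hypothetical predicates (p1, p2, p3) with sizes between the two bounds. It only shows which sizes are admitted, and it does not reproduce the package's actual search order or pruning.

```r
predicates <- c("p1", "p2", "p3")  # hypothetical predicate names
min_length <- 0
max_length <- 2                    # Inf would be capped by the predicate count

# Admitted condition sizes, capped by the number of available predicates.
sizes <- seq(min_length, min(max_length, length(predicates)))

# For each size, list all predicate subsets; size 0 is the empty condition.
conditions <- unlist(
  lapply(sizes, function(k) {
    if (k == 0) list(character(0)) else combn(predicates, k, simplify = FALSE)
  }),
  recursive = FALSE
)

lengths(conditions)
```

With these settings there is one empty condition, three conditions of size 1, and three of size 2.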
- min_support
the minimum support of a condition to trigger the callback
function for it. The support of a condition is its relative frequency in
the dataset x. For logical data, it equals the relative frequency of rows
in which all condition predicates are TRUE. For numerical (double) input,
the support is computed as the mean (over all rows) of the products of the
predicate values.
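The two support definitions above can be checked by hand in base R. The data frames and predicate names (p, q) are made up for illustration, and treating the min_support threshold as inclusive is an assumption.

```r
# Crisp case: share of rows where all predicates of the condition are TRUE.
crisp <- data.frame(
  p = c(TRUE, TRUE, FALSE, TRUE),
  q = c(TRUE, FALSE, FALSE, TRUE)
)
crisp_support <- mean(crisp$p & crisp$q)

# Fuzzy case: mean over rows of the product of the predicate truth degrees.
fuzzy <- data.frame(
  p = c(1.0, 0.5, 0.0, 0.8),
  q = c(0.5, 1.0, 1.0, 0.5)
)
fuzzy_support <- mean(fuzzy$p * fuzzy$q)

# Compare against a threshold (inclusive comparison is an assumption).
min_support <- 0.4
passes <- c(crisp = crisp_support >= min_support,
            fuzzy = fuzzy_support >= min_support)
```

Here the crisp condition holds in 2 of 4 rows, while the fuzzy products are 0.5, 0.5, 0 and 0.4, so only the crisp condition reaches a threshold of 0.4.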
- max_support
the maximum support of a condition to trigger the callback
function for it. See the min_support argument for the definition of the
support of a condition.
- max_results
the maximum number of generated conditions to execute the
callback function on. If the number of found conditions exceeds max_results,
the function stops generating new conditions and returns the results. To
avoid long computations during the search, it is recommended to set
max_results to a reasonable positive value. Setting max_results to Inf
will generate all possible conditions.
- verbose
a logical scalar indicating whether to print progress messages.
- threads
the number of threads to use for parallel computation.
- error_context
a list of details to be used in error messages.
This argument is useful when dig_grid() is called from another
function to provide error messages, which refer to arguments of the
calling function. The list must contain the following elements:
arg_x - the name of the argument x as a character string
arg_condition - the name of the argument condition as a character string
arg_xvars - the name of the argument xvars as a character string
arg_yvars - the name of the argument yvars as a character string
call - an environment in which to evaluate the error messages.
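A wrapper function might assemble such a list as sketched below. The wrapper my_wrapper and its argument names (data, cond, xs, ys) are hypothetical. The sketch only builds and returns the list; in real use it would be passed on as the error_context argument.

```r
# Hypothetical wrapper whose own arguments are named data, cond, xs and ys.
my_wrapper <- function(data, cond, xs, ys) {
  ctx <- list(
    arg_x         = "data",        # wrapper's name for the x argument
    arg_condition = "cond",        # wrapper's name for the condition argument
    arg_xvars     = "xs",          # wrapper's name for the xvars argument
    arg_yvars     = "ys",          # wrapper's name for the yvars argument
    call          = environment()  # environment of the wrapper call
  )
  # In real use, ctx would be supplied as error_context; here it is returned
  # so that its structure can be inspected.
  ctx
}

ctx <- my_wrapper(NULL, NULL, NULL, NULL)
names(ctx)
```

Errors raised further down can then mention data or cond, the names the wrapper's caller actually used.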