Creates a matched group via backward selection.
match_groups(
condition,
covariates,
halting_test,
thresh = 0.2,
method = ldamatch::matching_methods,
props = prop.table(table(condition)),
replicates = get("RND_DEFAULT_REPLICATES", .ldamatch_globals),
min_preserved = length(levels(condition)),
print_info = get("PRINT_INFO", .ldamatch_globals),
max_removed_per_cond = NULL,
tiebreaker = NULL,
lookahead = 2,
all_results = FALSE,
prefer_test = TRUE,
max_removed_per_step = 1,
max_removed_percent_per_step = 0.5,
ratio_for_slowdown = 0.5
)
A logical vector with one element per subject, containing TRUE for the subjects that are kept in the matched groups; or, if all_results = TRUE, a list of such vectors.
A factor vector containing condition labels.
A columnwise matrix containing covariates to match the conditions on.
A function to apply to `covariates` (in matrix form)
which is TRUE iff the conditions are matched.
Signature: halting_test(condition, covariates, thresh).
The following halting tests are part of this package: t_halt, U_halt, l_halt, ad_halt, ks_halt, wilks_halt, f_halt.
You can create the intersection of two or more halting tests using create_halting_test.
The return value of halting_test has to be greater than or equal to thresh for the matched groups.
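As an illustration of the signature halting_test(condition, covariates, thresh), here is a minimal sketch of a user-written test in base R. The function name my_t_halt is hypothetical and is not part of the package; the sketch assumes exactly two conditions and assumes that returning the smallest per-covariate p-value is an acceptable return value, so that the groups count as matched when it is at least thresh. The packaged tests such as t_halt may compute and scale their return value differently.

```r
# Hypothetical halting test (not part of ldamatch): Welch t-test per covariate.
# Assumption: the smallest p-value is returned, and the caller compares it to thresh.
my_t_halt <- function(condition, covariates, thresh) {
  covariates <- as.matrix(covariates)
  groups <- levels(droplevels(condition))
  stopifnot(length(groups) == 2)
  # One p-value per covariate column.
  p_values <- apply(covariates, 2, function(x) {
    stats::t.test(x[condition == groups[1]], x[condition == groups[2]])$p.value
  })
  # thresh is part of the signature; this sketch leaves the comparison to the caller.
  min(p_values)
}

# Simulated data: two groups whose means clearly differ.
set.seed(42)
condition <- factor(rep(c("A", "B"), each = 30))
covariates <- cbind(age = c(rnorm(30, mean = 10), rnorm(30, mean = 12)))
p <- my_t_halt(condition, covariates, thresh = 0.2)
is_matched <- p >= 0.2  # TRUE would mean the groups already count as matched
```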
The choice of search method, one of "random",
You can get more information about each method on the help page for "search_<method_name>" (e.g. "search_exhaustive").
Either the desired proportions of the sample for each condition (as fractions of the total) as a named vector, or the names of the conditions for which we prefer to preserve the subjects, in decreasing order of preference. If not specified, the (full) sample proportions are used. This is preferred among configurations with the same taken into account by the other methods to some extent. For example, c(A = 0.4, B = 0.4, C = 0.2) means that we would like the number of subjects in groups A, B, and C to be around 40%, 40%, and 20% of the total number of subjects, respectively. By contrast, c("A", "B", "C") means that, if possible, we would like to keep all subjects in group A, and prefer keeping subjects in B even if that means losing more subjects from C.
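The two accepted forms of props, plus the default, can be written out as follows. This is a small base R sketch on made-up data; the final sanity checks (proportions summing to 1, names matching the condition levels) are assumptions about what a sensible input looks like, not documented requirements.

```r
# Made-up sample: 40 subjects in A, 40 in B, 20 in C.
condition <- factor(rep(c("A", "B", "C"), times = c(40, 40, 20)))

# Default: the proportions observed in the full sample.
props_default <- prop.table(table(condition))

# Form 1: desired proportions as a named numeric vector.
props_target <- c(A = 0.4, B = 0.4, C = 0.2)

# Form 2: condition names in decreasing order of preservation preference.
props_order <- c("A", "B", "C")

# Assumed sanity checks before passing either form on.
stopifnot(
  abs(sum(props_target) - 1) < 1e-8,
  setequal(names(props_target), levels(condition)),
  all(props_order %in% levels(condition))
)
```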
The maximum number of random replications to be performed. This is only used for the "random" method.
The minimum number of preserved subjects. It can be used to ensure that the search does not run indefinitely: the search fails if no solution is found that preserves at least this number of subjects.
If TRUE, prints summary information on the input and the results, as well as progress information for the exhaustive search and random algorithms. Default: TRUE; can be changed using set_param("PRINT_INFO", FALSE).
A named integer vector, containing the maximum number of subjects that can be removed from each group. Specify 0 for groups if you want to preserve all of their subjects. If you do not specify a value for a group, it defaults to 2 less than the group size. Values outside the valid range of 0..(N-1) (where N is the number of subjects in the group) are corrected without a warning.
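The defaulting and range correction described above can be sketched in base R. The helper resolve_max_removed is hypothetical and written only to illustrate the stated rule (default of N - 2, values clamped to 0..(N-1)); it is not the package's internal code.

```r
# Hypothetical helper (not part of ldamatch) illustrating the documented rule.
resolve_max_removed <- function(condition, max_removed_per_cond = NULL) {
  group_sizes <- table(condition)
  sizes <- as.integer(group_sizes)
  # Default for every group: 2 less than the group size.
  resolved <- setNames(sizes - 2L, names(group_sizes))
  # Override the default for groups the caller named explicitly.
  given <- intersect(names(max_removed_per_cond), names(group_sizes))
  if (length(given) > 0) {
    resolved[given] <- as.integer(max_removed_per_cond[given])
  }
  # Silently clamp to the valid range 0..(N-1).
  clamped <- pmin(pmax(resolved, 0L), sizes - 1L)
  setNames(clamped, names(group_sizes))
}

condition <- factor(rep(c("A", "B", "C"), times = c(10, 5, 3)))
res <- resolve_max_removed(condition, c(A = 0, B = 99))
# A = 0 (all subjects preserved), B = 4 (99 clamped to N - 1), C = 1 (default N - 2)
```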
NULL, or a function similar to halting_test, used to decide between cases for which halting_test yields equal values.
The lookahead to use: a positive integer. It is used by the heuristic3 and heuristic4 algorithms, with a default of 2. The running time is O(N ^ lookahead), where N is the number of subjects.
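To get a feel for how quickly O(N ^ lookahead) grows, the following arithmetic sketch tabulates N ^ lookahead for a made-up sample of 100 subjects. The numbers are order-of-magnitude indicators only; constant factors and the actual number of candidate evaluations are not modelled.

```r
# Rough growth of the O(N ^ lookahead) bound for an assumed N of 100 subjects.
n_subjects <- 100
lookahead <- 1:4
cost <- data.frame(
  lookahead = lookahead,
  relative_cost = n_subjects ^ lookahead
)
print(cost)
```

Under this bound, each increment of lookahead multiplies the cost by roughly N.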
If TRUE, returns all results found by method in a list. (A list is returned even if there is only one result.) If FALSE (the default), it returns the first result (a logical vector).
If TRUE, prefers a higher test statistic over a closer fit to the expected group size proportions; default is TRUE. Used by all algorithms except exhaustive, which always
The number of equivalent subjects that can be removed in each step. (The actual allowed number may be less, depending on the p-value / threshold ratio.) This parameter is used by the heuristic3 and heuristic4 algorithms, with a default value of 1.
The percentage of remaining subjects that can be removed in each step. Used when max_removed_per_step > 1, with a default value of 0.5.
The p-value / threshold ratio at which it starts removing subjects one by one. Used when max_removed_per_step > 1, with a default value of 0.5.
The exhaustive, heuristic3, and heuristic4 search methods use the foreach
package to parallelize computation.
To take advantage of this, you must register a cluster.
For example, to use all but one of the CPU cores, run:
doParallel::registerDoParallel(cores = max(1, parallel::detectCores() - 1))
To use sequential processing without getting a warning, run:
foreach::registerDoSEQ()
calc_p_value for calculating the test statistic for a group setup.
calc_metrics for calculating multiple metrics about the goodness of the result.
compare_ldamatch_outputs for comparing multiple different results from this function.
search_heuristic2, search_heuristic3, search_heuristic4, search_random, search_exhaustive
for