ldamatch (version 1.0.3)

.choose_best_subjects: Chooses best set of subjects in a set.

Description

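Chooses the best set of subjects in a set: the versions of is.in obtained by applying each candidate change are compared, and the best candidates are returned.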

Usage

.choose_best_subjects(
  candidates,
  is.in,
  condition,
  covariates,
  halting_test,
  thresh,
  tiebreaker,
  props,
  prefer_test,
  max_removed_per_cond,
  max_removed_in_next_step,
  ratio_for_slowdown,
  remove_best_only
)

Value

list(inds): A list containing the best index vectors indicating the positions to flip in is.in.

Arguments

candidates

An iterator returning (or a list containing) indices for the is.in logical vector whose in / out status is to be changed.

is.in

A logical vector indicating which items are currently preserved. The versions obtained by flipping the in / out status at the indices of each candidate are compared with each other.
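
The relationship between candidates and is.in can be illustrated in base R. This is a minimal sketch, not the package's code: the example values and the name flipped_versions are made up for illustration, and candidates is given as a plain list rather than an iterator.

```r
# Illustration only: how a candidate index vector changes is.in.
is.in <- c(TRUE, TRUE, TRUE, FALSE, TRUE)

# Two hypothetical candidates: flip item 1, or flip items 2 and 5.
candidates <- list(1L, c(2L, 5L))

# For each candidate, flip the in / out status at the given positions.
flipped_versions <- lapply(candidates, function(inds) {
  version <- is.in
  version[inds] <- !version[inds]
  version
})

flipped_versions[[1]]  # FALSE  TRUE  TRUE FALSE  TRUE
flipped_versions[[2]]  #  TRUE FALSE  TRUE FALSE FALSE
```

Each element of flipped_versions is one of the versions that would then be compared.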

condition

A factor vector containing condition labels.

covariates

A columnwise matrix containing covariates to match the conditions on.

halting_test

A function to apply to `covariates` (in matrix form) which is TRUE iff the conditions are matched. Signature: halting_test(condition, covariates, thresh). The following halting tests are part of this package: t_halt, U_halt, l_halt, ad_halt, ks_halt, wilks_halt, f_halt. You can create the intersection of two or more halting tests using create_halting_test.
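
A user-defined function with the documented signature might look like the sketch below. It is not part of the package: the name my_t_halt is hypothetical, and the return convention (the smallest per-covariate p-value, which the caller compares against thresh) is an assumption made for this example rather than the behavior of t_halt or the other built-in tests.

```r
# Hypothetical halting test with the documented signature; NOT part of ldamatch.
# Assumption: returns the smallest p-value over all covariates; the caller
# decides whether it reaches thresh.
my_t_halt <- function(condition, covariates, thresh) {
  stopifnot(is.factor(condition), nlevels(condition) == 2L)
  p_values <- apply(covariates, 2, function(column) {
    # Welch two-sample t-test of one covariate between the two conditions
    stats::t.test(column ~ condition)$p.value
  })
  # thresh is accepted only to match the signature; it is not used here.
  min(p_values)
}

# Made-up data: two conditions with 20 subjects each and two covariates.
set.seed(1)
condition <- factor(rep(c("A", "B"), each = 20))
covariates <- cbind(age = rnorm(40, mean = 30, sd = 5),
                    iq  = rnorm(40, mean = 100, sd = 15))

thresh <- 0.2
p <- my_t_halt(condition, covariates, thresh)
matched <- p >= thresh  # TRUE if the groups count as matched under this sketch
```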

thresh

The return value of halting_test has to be greater than or equal to thresh for the matched groups.

tiebreaker

NULL, or a function similar to halting_test, used to decide between cases for which halting_test yields equal values.

props

Either the desired proportions (percentage) of the sample for each condition as a named vector, or the names of the conditions for which we prefer to preserve the subjects, in decreasing order of preference. If not specified, the (full) sample proportions are used. This is preferred among configurations with the same taken into account by the other methods to some extent. For example, c(A = 0.4, B = 0.4, C = 0.2) means that we would like the number of subjects in groups A, B, and C to be around 40%, 40%, and 20% of the total number of subjects, respectively. In contrast, c("A", "B", "C") means that, if possible, we would like to keep all subjects in group A, and that we prefer keeping subjects in B even if this results in losing more subjects from C.
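
The two accepted forms of props can be written out as follows. The total of 50 subjects and the variable names are assumptions for illustration; the sketch only shows what the proportions mean and does not reproduce how the package uses them.

```r
# Form 1: desired proportions as a named numeric vector.
props_fractions <- c(A = 0.4, B = 0.4, C = 0.2)

# For a hypothetical total of 50 remaining subjects, these proportions
# correspond to roughly the following group sizes.
n_total <- 50
target_sizes <- round(props_fractions * n_total)
target_sizes  # A = 20, B = 20, C = 10

# Form 2: condition names in decreasing order of preference for preservation.
props_preference <- c("A", "B", "C")
```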

prefer_test

If TRUE, prefers higher test statistic more than the expected group size proportion; default is TRUE. Used by all algorithms except exhaustive, which always

max_removed_per_cond

A named integer vector, containing the maximum number of subjects that can be removed from each group. Specify 0 for groups if you want to preserve all of their subjects. If you do not specify a value for a group, it defaults to 2 less than the group size. Values outside the valid range of 0..(N-1) (where N is the number of subjects in the group) are corrected without a warning.
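
The default and range-correction rules described above can be sketched in base R. This illustrates the documented rules only and is not the package's implementation; the group sizes and the name limits are made up for the example.

```r
# Made-up groups: A has 5 subjects, B has 8, C has 3.
condition <- factor(c(rep("A", 5), rep("B", 8), rep("C", 3)))
group_sizes <- table(condition)
sizes <- setNames(as.integer(group_sizes), names(group_sizes))

# User input: preserve all of A, an out-of-range value for B, nothing for C.
max_removed_per_cond <- c(A = 0L, B = 20L)

# Default for unspecified groups: 2 less than the group size.
limits <- sizes - 2L
limits[names(max_removed_per_cond)] <- max_removed_per_cond

# Silently correct values into the valid range 0..(N-1).
limits <- pmin(pmax(limits, 0L), sizes - 1L)
limits  # A = 0, B = 7, C = 1
```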

ratio_for_slowdown

The p-value / threshold ratio at which it starts removing subjects one by one. Used when max_removed_per_step > 1, with a default value of 0.5.