Iteratively calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for many success variables, disaggregation variables, and scenarios, using data.table and collapse.
di_iterate_dt(
  dt,
  success_vars,
  group_vars,
  cohort_vars = NULL,
  scenario_repeat_by_vars = NULL,
  exclude_scenario_df = NULL,
  weight_var = NULL,
  include_non_disagg_results = TRUE,
  ppg_reference_groups = "overall",
  min_moe = 0.03,
  use_prop_in_moe = FALSE,
  prop_sub_0 = 0.5,
  prop_sub_1 = 0.5,
  di_prop_index_cutoff = 0.8,
  di_80_index_cutoff = 0.8,
  di_80_index_reference_groups = "hpg",
  check_valid_reference = TRUE,
  parallel = FALSE,
  parallel_n_cores = parallel::detectCores()/2
)
Value: A summarized data set of class data.table, with variables as described in di_iterate.
dt: A data frame of class data.table. If the object is not a data.table, convert it first with as.data.table (e.g., as.data.table(df)).
success_vars: A character vector of success variable names to iterate across.
group_vars: A character vector of group (disaggregation) variable names to iterate across.
cohort_vars: (Optional) A character vector of the same length as success_vars indicating the cohort variable to use for each variable in success_vars. A vector of length 1 may be specified, in which case the same cohort variable is used for every success variable. If not specified (the default, NULL), a single cohort is assumed for all success variables.
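The pairing of success and cohort variables described above can be pictured with the short base R sketch below. The helper pair_success_cohort and the variable names Math, English, Transfer, and Cohort are hypothetical illustrations, not DisImpact's internal code; the sketch only assumes that a length-1 cohort_vars is recycled across all success variables.

```r
# Hypothetical helper (not part of DisImpact): pair each success variable
# with the cohort variable used for it, recycling a length-1 cohort_vars.
pair_success_cohort <- function(success_vars, cohort_vars = NULL) {
  if (is.null(cohort_vars)) {
    # No cohort variable: a single cohort is assumed for all success variables
    cohort_vars <- NA_character_
  }
  if (!length(cohort_vars) %in% c(1L, length(success_vars))) {
    stop("cohort_vars must have length 1 or the same length as success_vars")
  }
  data.frame(
    success_var = success_vars,
    cohort_var = rep_len(cohort_vars, length(success_vars)),
    stringsAsFactors = FALSE
  )
}

# One cohort variable shared by three success variables
pair_success_cohort(c("Math", "English", "Transfer"), "Cohort")
```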
scenario_repeat_by_vars: (Optional) A character vector of variables; the DI calculations are repeated for every combination of these variables. For example, the following variables could be specified:
  Ed Goal: Degree/Transfer, Short-term Career, Non-credit
  First time college student: Yes, No
  Full-time status: Yes, No
Each combination of these variables (e.g., full-time, first-time college students with an ed goal of degree/transfer) constitutes one iteration, or sample, for which disproportionate impact is calculated for the outcomes listed in success_vars and the disaggregation variables listed in group_vars. The overall success rate for full-time, first-time college students with an ed goal of degree/transfer would include only these students. Each variable specified is also collapsed to an '- All' group, so that the combinations also cover all students of a particular category. For the example above, the total number of combinations is (3 + 1) x (2 + 1) x (2 + 1) = 36, where each +1 represents the '- All' category.
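The count of 36 combinations can be checked with a minimal base R sketch using expand.grid. The column names ed_goal, first_time, and full_time are hypothetical stand-ins for the example variables; the sketch only enumerates scenarios and does not reproduce how DisImpact builds them internally.

```r
# Hypothetical scenario variables mirroring the example above
levels_list <- list(
  ed_goal    = c("Degree/Transfer", "Short-term Career", "Non-credit"),
  first_time = c("Yes", "No"),   # first time college student
  full_time  = c("Yes", "No")    # full-time status
)

# Add the collapsed '- All' level to every variable
levels_all <- lapply(levels_list, function(x) c(x, "- All"))

# One row per scenario: (3 + 1) x (2 + 1) x (2 + 1) = 36 rows
scenarios <- expand.grid(levels_all, stringsAsFactors = FALSE)
nrow(scenarios)
```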
exclude_scenario_df: (Optional) A data frame with variables matching scenario_repeat_by_vars that specifies the combinations to exclude from DI calculations. Following the example above, one could exclude part-time, non-credit students from consideration.
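As a rough illustration of excluding scenarios, the base R sketch below drops part-time, non-credit combinations from the 36-row scenario grid with a simple anti-join. The column names and the row_key helper are hypothetical, and the sketch assumes the exclusion data frame lists full combinations (one row per excluded scenario); it is not DisImpact's own matching logic.

```r
# Hypothetical scenario variables, each with its collapsed '- All' level
levels_all <- list(
  ed_goal    = c("Degree/Transfer", "Short-term Career", "Non-credit", "- All"),
  first_time = c("Yes", "No", "- All"),
  full_time  = c("Yes", "No", "- All")
)
scenarios <- expand.grid(levels_all, stringsAsFactors = FALSE)

# Exclude part-time (full_time == "No"), non-credit scenarios for every
# value of first_time
exclude_scenario_df <- expand.grid(
  ed_goal = "Non-credit",
  first_time = c("Yes", "No", "- All"),
  full_time = "No",
  stringsAsFactors = FALSE
)

# Anti-join: build a key per row and drop scenarios whose key is excluded
row_key <- function(df) do.call(paste, c(df[names(levels_all)], sep = "|"))
kept <- scenarios[!row_key(scenarios) %in% row_key(exclude_scenario_df), ]
nrow(kept)  # 33 of the 36 combinations remain
```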
weight_var: (Optional) A character string naming the weight variable when the input data set is summarized (i.e., when the success variables specified in success_vars contain counts of successes). The weight corresponds to the denominator when calculating the success rate. Defaults to NULL, for an input data set where each row describes an individual.
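The relationship between individual-level and summarized input can be sketched in base R: summing successes and a per-row weight of 1 within each group yields counts, and successes divided by weight recovers the individual-level success rate. The data and the column names ethnicity, success, and weight are hypothetical.

```r
# Individual-level data: one row per student (hypothetical values)
indiv <- data.frame(
  ethnicity = c("A", "A", "A", "B", "B"),
  success   = c(1, 0, 1, 1, 0)
)
indiv$weight <- 1  # each row represents one student

# Summarized data: success now holds counts and weight holds the group size
summ <- aggregate(cbind(success, weight) ~ ethnicity, data = indiv, FUN = sum)

# Success rate = successes / weight, the same as the individual-level mean
summ$rate <- summ$success / summ$weight
summ
```

With a summarized data set like summ, weight_var would name the weight column.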
include_non_disagg_results: A logical value specifying whether the non-disaggregated results should be returned; defaults to TRUE. When TRUE, a new variable `- None` with the single value '- All' is added to the data set, and this variable is added to group_vars as a disaggregation/group variable. This lets the user review the non-disaggregated (overall) results alongside the disaggregated ones.
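Why a constant `- None` column produces non-disaggregated results can be seen in a small base R sketch: grouping by a variable that takes a single value yields exactly one group, whose rate is the overall rate. The data and column names are hypothetical; only the `- None` and '- All' labels come from the description above.

```r
df <- data.frame(
  ethnicity = c("A", "B", "A", "B"),
  success   = c(1, 0, 1, 1)
)

# Add the non-disaggregated grouping variable with a single value
df[["- None"]] <- "- All"
group_vars <- c("ethnicity", "- None")

# Grouping by `- None` yields one group whose rate is the overall rate
rates <- lapply(group_vars, function(g) tapply(df$success, df[[g]], mean))
names(rates) <- group_vars
rates[["- None"]]  # - All: 0.75
```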
ppg_reference_groups: Either 'overall', 'hpg', 'all but current', or a character vector of the same length as group_vars that indicates the reference group value for each group variable in group_vars when determining disproportionate impact using the percentage point gap method; defaults to 'overall'.
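The reference options for the PPG can be compared on toy data in the base R sketch below. It assumes the PPG is the group's success rate minus the reference rate, that 'hpg' means the highest-performing group, and that 'all but current' pools every group except the one being evaluated; the data are hypothetical, and di_ppg remains the authoritative implementation.

```r
# Hypothetical summarized data for one group variable
d <- data.frame(
  group = c("A", "B", "C"),
  n     = c(100, 200, 100),   # cohort size per group
  succ  = c(60, 80, 80)       # successes per group
)
d$rate <- d$succ / d$n        # 0.6, 0.4, 0.8

# 'overall': success rate of all students combined (220 / 400 = 0.55)
ref_overall <- sum(d$succ) / sum(d$n)
# 'hpg': rate of the highest-performing group
ref_hpg <- max(d$rate)
# 'all but current': rate of every other group, computed per group
ref_abc <- (sum(d$succ) - d$succ) / (sum(d$n) - d$n)
# A specific group value as the reference, e.g. group "A"
ref_a <- d$rate[d$group == "A"]

d$ppg_overall <- d$rate - ref_overall
d$ppg_hpg     <- d$rate - ref_hpg
d$ppg_abc     <- d$rate - ref_abc
d$ppg_a       <- d$rate - ref_a
d
```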
min_moe: The minimum margin of error to be used in the PPG calculation; defaults to 0.03; see di_ppg.
use_prop_in_moe: (TRUE or FALSE) Whether the estimated proportions should be used in the margin of error calculation for the PPG; defaults to FALSE; see di_ppg.
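prop_sub_0: Proportion substituted for a success rate of 0 when the margin of error is computed from estimated proportions, since a rate of 0 would otherwise give a margin of error of 0. Default is 0.50; see di_ppg.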
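prop_sub_1: Proportion substituted for a success rate of 1 when the margin of error is computed from estimated proportions, since a rate of 1 would otherwise give a margin of error of 0. Default is 0.50; see di_ppg.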
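How min_moe, use_prop_in_moe, prop_sub_0, and prop_sub_1 interact can be sketched with a conventional 95% margin of error, 1.96 * sqrt(p * (1 - p) / n). This formula, the conservative p = 0.5 when proportions are not used, and the DI rule (rate plus margin of error still at or below the reference) are assumptions for illustration; moe_ppg is a hypothetical helper, so consult di_ppg for the exact computation.

```r
# Hypothetical helper: margin of error for the PPG under the stated assumptions
moe_ppg <- function(n, p = NULL, min_moe = 0.03, use_prop_in_moe = FALSE,
                    prop_sub_0 = 0.5, prop_sub_1 = 0.5) {
  if (!use_prop_in_moe || is.null(p)) {
    # Conservative choice: p = 0.5 maximizes p * (1 - p)
    p <- rep(0.5, length(n))
  } else {
    # Substitute rates of exactly 0 or 1, which would give a zero MOE
    p <- ifelse(p == 0, prop_sub_0, ifelse(p == 1, prop_sub_1, p))
  }
  # Never go below the floor set by min_moe
  pmax(1.96 * sqrt(p * (1 - p) / n), min_moe)
}

moe_ppg(100)                                   # 0.098
moe_ppg(10000)                                 # floored at 0.03
moe_ppg(100, p = 0.9, use_prop_in_moe = TRUE)  # about 0.0588

# Assumed DI rule: flag when rate + MOE is still at or below the reference
rate <- 0.50
reference <- 0.62
rate + moe_ppg(100) <= reference               # TRUE
```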
di_prop_index_cutoff: Threshold used for determining disproportionate impact using the proportionality index; defaults to 0.80; see di_prop_index.
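The proportionality index can be sketched in base R as a group's share of all successes divided by its share of the cohort, with DI flagged when the index falls below di_prop_index_cutoff. This definition and the strict "below the cutoff" comparison are assumptions to check against di_prop_index; the data are hypothetical. Algebraically the index equals the group's rate divided by the overall rate.

```r
# Hypothetical summarized data for one group variable
d <- data.frame(
  group = c("A", "B", "C"),
  n     = c(100, 200, 100),   # cohort size per group
  succ  = c(60, 80, 80)       # successes per group
)

share_cohort  <- d$n / sum(d$n)        # 0.25, 0.50, 0.25
share_success <- d$succ / sum(d$succ)  # 60/220, 80/220, 80/220

d$prop_index <- share_success / share_cohort
# Equivalent form: group rate divided by overall rate
d$check <- (d$succ / d$n) / (sum(d$succ) / sum(d$n))

cutoff <- 0.8
d$di <- d$prop_index < cutoff          # only group B is flagged
d
```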
di_80_index_cutoff: Threshold used for determining disproportionate impact using the 80% index; defaults to 0.80; see di_80_index.
di_80_index_reference_groups: Either 'overall', 'hpg', 'all but current', or a character vector of the same length as group_vars that indicates the reference group value for each group variable in group_vars when determining disproportionate impact using the 80% index; defaults to 'hpg'.
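The 80% index can be sketched in base R as a group's success rate divided by the reference rate, with DI flagged when the ratio is below di_80_index_cutoff. The sketch assumes 'hpg' is the highest-performing group's rate and 'all but current' pools all other groups; the data are hypothetical, and di_80_index is the authoritative implementation.

```r
# Hypothetical summarized data for one group variable
d <- data.frame(
  group = c("A", "B", "C"),
  n     = c(100, 200, 100),
  succ  = c(60, 80, 80)
)
d$rate <- d$succ / d$n                          # 0.6, 0.4, 0.8

# Reference 'hpg': the highest-performing group's rate
d$index_hpg <- d$rate / max(d$rate)             # 0.75, 0.5, 1
# Reference 'all but current': the rate of all other groups combined
others <- (sum(d$succ) - d$succ) / (sum(d$n) - d$n)
d$index_abc <- d$rate / others

cutoff <- 0.8
d$di_hpg <- d$index_hpg < cutoff                # TRUE, TRUE, FALSE
d
```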
check_valid_reference: (TRUE or FALSE) Check whether ppg_reference_groups and di_80_index_reference_groups contain valid values; defaults to TRUE.
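One plausible reading of "valid values" is sketched below: a single keyword ('overall', 'hpg', 'all but current') is accepted, and otherwise the vector must have one entry per group variable, each entry being a keyword or a value observed in the matching group variable. The function check_reference is hypothetical and may be stricter or looser than DisImpact's actual check.

```r
# Hypothetical validator for a reference-group argument
check_reference <- function(dt, group_vars, ref) {
  keywords <- c("overall", "hpg", "all but current")
  # A single keyword applies to every group variable
  if (length(ref) == 1 && ref %in% keywords) return(TRUE)
  # Otherwise one entry per group variable is required
  if (length(ref) != length(group_vars)) return(FALSE)
  # Each entry must be a keyword or a value present in its group variable
  all(mapply(function(g, r) r %in% keywords || r %in% dt[[g]],
             group_vars, ref))
}

df <- data.frame(ethnicity = c("A", "B"), gender = c("F", "M"))
check_reference(df, c("ethnicity", "gender"), c("A", "overall"))  # TRUE
check_reference(df, c("ethnicity", "gender"), c("Z", "F"))        # FALSE
```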
parallel: If TRUE, perform calculations in parallel; defaults to FALSE. Parallel execution is based on the parallel package included with base R, using parLapply on Windows and mclapply on POSIX-based systems (Linux/macOS).
parallel_n_cores: The number of CPU cores to use if parallel = TRUE. Defaults to half the number of CPU cores detected on the system.
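The platform split described above (parLapply on Windows, mclapply elsewhere) follows a common base R pattern, sketched below. run_parallel is a hypothetical wrapper, not DisImpact's code. Because detectCores() / 2 can be fractional or NA, the sketch rounds down and falls back to one core.

```r
library(parallel)

# Hypothetical wrapper: dispatch to parLapply on Windows, mclapply elsewhere
run_parallel <- function(X, FUN,
                         n_cores = max(1L, floor(detectCores() / 2), na.rm = TRUE)) {
  if (.Platform$OS.type == "windows") {
    # Windows has no fork(), so use a socket cluster and shut it down afterwards
    cl <- makeCluster(n_cores)
    on.exit(stopCluster(cl))
    parLapply(cl, X, FUN)
  } else {
    # POSIX systems (Linux/macOS) can fork worker processes
    mclapply(X, FUN, mc.cores = n_cores)
  }
}

res <- run_parallel(1:4, function(i) i^2, n_cores = 2)
unlist(res)  # 1 4 9 16
```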
Iteratively calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for all combinations of success_vars, group_vars, and cohort_vars, for each combination of subgroups specified by scenario_repeat_by_vars, using data.table and collapse.
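The iteration pattern can be sketched in base R on toy data: enumerate every (success variable, group variable, scenario) combination, subset the data to the scenario ('- All' keeps everyone), and compute a per-group metric. Only the PPG against the overall rate is shown, without a margin of error; the data, the variable names, and the loop structure are hypothetical and stand in for the data.table and collapse implementation.

```r
# Toy individual-level data (hypothetical variables)
dat <- data.frame(
  Math      = c(1, 0, 1, 1, 0, 1, 0, 1),
  English   = c(1, 1, 0, 1, 1, 0, 1, 1),
  Ethnicity = c("A", "A", "B", "B", "A", "B", "A", "B"),
  Full_Time = c("Yes", "No", "Yes", "No", "Yes", "Yes", "No", "No")
)

# Every combination of success variable, group variable, and scenario
grid <- expand.grid(success_var = c("Math", "English"),
                    group_var = "Ethnicity",
                    scenario = c("Yes", "No", "- All"),
                    stringsAsFactors = FALSE)

res <- do.call(rbind, lapply(seq_len(nrow(grid)), function(i) {
  g <- grid[i, ]
  # Restrict to the scenario; '- All' keeps every student
  sub <- if (g$scenario == "- All") dat else dat[dat$Full_Time == g$scenario, ]
  rates <- tapply(sub[[g$success_var]], sub[[g$group_var]], mean)
  overall <- mean(sub[[g$success_var]])
  data.frame(success_var = g$success_var, scenario = g$scenario,
             group = names(rates), ppg = as.vector(rates) - overall)
}))
res
```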