Data quality indicator checks "Unexpected location" and "Unexpected proportion" with histograms and, if a grouping variable is included, plots of empirical cumulative distributions for the subgroups.
Indicator
acc_distributions_prop(
  resp_vars = NULL,
  study_data,
  meta_data,
  label_col,
  check_param = "proportion",
  plot_ranges = TRUE,
  flip_mode = "noflip"
)
A list with:
SummaryTable: data.frame containing data quality checks for "Unexpected location" (FLG_acc_ud_loc) and "Unexpected proportion" (FLG_acc_ud_prop) for each response variable in resp_vars.
SummaryData: a data.frame containing data quality checks for "Unexpected location" and/or "Unexpected proportion" for a report.
SummaryPlotList: list of ggplots for each response variable in resp_vars.
resp_vars: variable list. The names of the measurement variables.
study_data: data.frame. The data frame that contains the measurements.
meta_data: data.frame. The data frame that contains metadata attributes of the study data.
label_col: variable attribute. The name of the column in the metadata with labels of variables.
check_param: enum any | location | proportion. Which type of check should be conducted (if possible): a check on the location of the mean or median value of the study data, a check on proportions of categories, or whichever of the two the available metadata supports.
plot_ranges: logical. Should the plot show ranges and results from the data quality checks? (default: TRUE)
flip_mode: enum default | flip | noflip | auto. Should the plot be in default orientation, flipped, not flipped, or auto-flipped? Not all options are always supported. In general, this can be controlled by setting options(dataquieR.flip_mode = ...). If called from dq_report, you can also pass flip_mode to all function calls or set it specifically for this function using specific_args.
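A minimal sketch of the session-wide setting mentioned for flip_mode. It uses only base R (options() and getOption()) and the option name given above; it assumes the option accepts the same values as the flip_mode argument.

```r
# Set the plot orientation for all subsequent calls in this R session.
# Assumption: the option takes the same values as the flip_mode argument.
old <- options(dataquieR.flip_mode = "flip")
getOption("dataquieR.flip_mode")
#> [1] "flip"

# Restore the previous setting (removes the option if it was unset before).
options(old)
```

Saving the value returned by options() and passing it back restores whatever was set before, so the change does not leak into unrelated code.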
If no response variable is defined, select all variables of type float or integer in the study data.
Remove missing codes from the study data (if defined in the metadata).
Remove measurements deviating from (hard) limits defined in the metadata (if defined).
Exclude variables containing only NAs or only one unique value (excluding NAs).
Perform the check for "Unexpected location" if defined in the metadata (requires LOCATION_METRIC, i.e., mean or median, and LOCATION_RANGE, the range of expected values for the mean or median, respectively).
Perform check for "Unexpected proportion" if defined in the metadata (needs PROPORTION_RANGE (range of expected values for the proportions of the categories)).
Plot histogram(s).
If group_vars is specified by the user, the distributions within groups are presented as group-wise empirical cumulative distribution functions (ECDFs).
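The steps above can be sketched in base R to show what each check computes. This is not the dataquieR implementation: the helpers clean_variable(), is_checkable(), check_location() and check_proportion() are hypothetical, the metadata (missing codes, hard limits, LOCATION_RANGE, PROPORTION_RANGE) is assumed to be already parsed into numbers, hard limits are treated as a closed interval, and proportion ranges are assumed to be percentages per category code. Plotting is omitted.

```r
# Hypothetical base-R sketch of the algorithm described above.
# NOT dataquieR's implementation; helper names and the already-parsed
# metadata formats used here are assumptions.

# Remove missing codes and values outside the (hard) limits,
# treated here as a closed interval [lower, upper].
clean_variable <- function(x, missing_codes = numeric(0),
                           hard_limits = c(-Inf, Inf)) {
  x[x %in% missing_codes] <- NA
  out_of_range <- !is.na(x) & (x < hard_limits[1] | x > hard_limits[2])
  x[out_of_range] <- NA
  x
}

# A variable is only checked if it has at least two distinct non-NA values.
is_checkable <- function(x) {
  length(unique(x[!is.na(x)])) > 1
}

# "Unexpected location": flag if the mean or median lies outside range.
check_location <- function(x, metric = c("mean", "median"), range) {
  metric <- match.arg(metric)
  value <- if (metric == "mean") {
    mean(x, na.rm = TRUE)
  } else {
    stats::median(x, na.rm = TRUE)
  }
  list(value = value, flag = value < range[1] || value > range[2])
}

# "Unexpected proportion": ranges is assumed to be a named list with one
# c(lower, upper) percentage range per category code.
check_proportion <- function(x, ranges) {
  x <- as.character(x[!is.na(x)])
  cats <- names(ranges)
  percent <- vapply(cats, function(k) 100 * mean(x == k), numeric(1))
  flag <- vapply(cats, function(k) {
    percent[[k]] < ranges[[k]][1] || percent[[k]] > ranges[[k]][2]
  }, logical(1))
  data.frame(category = cats, percent = unname(percent), flag = unname(flag))
}

# Usage: 99999 is treated as a missing code, 250 violates the limits [0, 100].
x <- clean_variable(c(12, 15, 99999, 14, 250, NA, 13),
                    missing_codes = 99999, hard_limits = c(0, 100))
if (is_checkable(x)) {
  print(check_location(x, "mean", range = c(10, 13)))  # mean 13.5 -> flag TRUE
}
print(check_proportion(c(1, 1, 2, 2, 2, NA),
                       list(`1` = c(45, 60), `2` = c(40, 70))))
```

Keeping each check in its own function mirrors the separate FLG_acc_ud_loc and FLG_acc_ud_prop flags: either check can run alone when only its metadata is available.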
See also: acc_distributions