com_item_missingness: Summarize missingness columnwise (in variable)

Description

Item-Missingness (also referred to as item nonresponse; De Leeuw et al. 2003) describes the missingness of single values, e.g. blanks or empty data cells in a data set. Item-Missingness occurs, for example, when a respondent does not provide information for a certain question, a question is overlooked by accident, a programming failure occurs, or a provided answer is missed while entering the data.

Indicator

Usage

com_item_missingness(
  study_data,
  meta_data,
  resp_vars = NULL,
  label_col,
  show_causes = TRUE,
  cause_label_df,
  include_sysmiss = TRUE,
  threshold_value,
  suppressWarnings = FALSE,
  assume_consistent_codes = TRUE,
  expand_codes = assume_consistent_codes,
  drop_levels = TRUE,
  expected_observations = c("HIERARCHY", "ALL", "SEGMENT"),
  pretty_print = lifecycle::deprecated()
)

Value

a list with:

SummaryTable: data frame about item missingness per response variable
SummaryData: data frame about item missingness per response variable formatted for user
SummaryPlot: ggplot2 heatmap plot, if show_causes was TRUE
ReportSummaryTable: data frame underlying SummaryPlot

Arguments

study_data: data.frame the data frame that contains the measurements
meta_data: data.frame the data frame that contains metadata attributes of study data
resp_vars: variable list the name of the measurement variables
label_col: variable attribute the name of the column in the metadata with labels of variables
show_causes: logical if TRUE, then the distribution of missing codes is shown
cause_label_df: data.frame missing code table. If missing codes have labels the respective data frame can be specified here or in the metadata as assignments, see cause_label_df
include_sysmiss: logical Optional, if TRUE system missingness (NAs) is evaluated in the summary plot
threshold_value: numeric from=0 to=100. A numerical value ranging from 0 to 100
suppressWarnings: logical if TRUE, warnings about consistency issues with missing and jump lists are suppressed
assume_consistent_codes: logical if TRUE and no labels are given and the same missing/jump code is used for more than one variable, the labels assigned for this code are treated as being the same for all variables.
expand_codes: logical if TRUE, code labels are copied from other variables, if the code is the same and the label is set somewhere
drop_levels: logical if TRUE, do not display unused missing codes in the figure legend.
expected_observations: enum HIERARCHY | ALL | SEGMENT. If ALL, all observations are expected to comprise all study segments. If SEGMENT, the PART_VAR is expected to point to a variable with values of 0 and 1, indicating whether the variable was expected to be observed for each data row. If HIERARCHY, this is also checked recursively: if a variable points to such a participation variable, and that other variable also has a PART_VAR entry pointing to a variable, the observation of the initial variable is only expected if both segment variables are 1.
pretty_print: logical deprecated. If you want to have a human readable output, use SummaryData instead of SummaryTable
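
The three settings of expected_observations can be illustrated with a small base R sketch. It is not the package's implementation: the helper expected_rows(), the toy data, and the part_var mapping are hypothetical names introduced only for illustration, and the sketch assumes that a row counts as expected only where the participation variable equals 1.

```r
# Hypothetical toy data: part_exam and part_lab are 0/1 participation variables.
study_data <- data.frame(
  part_exam = c(1, 1, 0, 1),
  part_lab  = c(1, 0, 1, 1),
  crp       = c(2.1, NA, NA, NA)
)

# Hypothetical PART_VAR mapping: crp -> part_lab -> part_exam
part_var <- c(crp = "part_lab", part_lab = "part_exam")

# Returns a logical vector: is `var` expected to be observed in each row?
expected_rows <- function(var, study_data, part_var,
                          mode = c("HIERARCHY", "ALL", "SEGMENT")) {
  mode <- match.arg(mode)
  expected <- rep(TRUE, nrow(study_data))
  if (mode == "ALL") {
    return(expected)  # every row is expected, participation is ignored
  }
  current <- var
  visited <- character(0)  # guards against circular PART_VAR references
  while (current %in% names(part_var) && !(current %in% visited)) {
    visited <- c(visited, current)
    parent <- part_var[[current]]
    # Assumption: only a value of 1 marks the row as expected (NA counts as not expected).
    expected <- expected & study_data[[parent]] %in% 1
    if (mode == "SEGMENT") break  # SEGMENT looks at the direct PART_VAR only
    current <- parent             # HIERARCHY walks further up the chain
  }
  expected
}

exp_all  <- expected_rows("crp", study_data, part_var, "ALL")
exp_seg  <- expected_rows("crp", study_data, part_var, "SEGMENT")
exp_hier <- expected_rows("crp", study_data, part_var, "HIERARCHY")

# Number of system missings in crp among the rows where it was expected
n_missing_all  <- sum(is.na(study_data$crp) & exp_all)
n_missing_seg  <- sum(is.na(study_data$crp) & exp_seg)
n_missing_hier <- sum(is.na(study_data$crp) & exp_hier)
```

In this toy example, the stricter the expectation rule, the fewer NA values are counted as item missingness, because rows in which the variable was never expected drop out of the count.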

ALGORITHM OF THIS IMPLEMENTATION:

Lists of missing codes and, if applicable, jump codes are selected from the metadata
The no. of system missings (NA) in each variable is calculated
The no. of used missing codes is calculated for each variable
The no. of used jump codes is calculated for each variable
Two result data frames (1: on the level of observations, 2: a summary for each variable) are generated
OPTIONAL: if show_causes is selected, one summary plot for all resp_vars is provided
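
The counting steps above can be sketched in base R as follows. This is a minimal illustration, not the code of com_item_missingness: the function summarize_missingness(), the toy data, and the code values are hypothetical, and the code lists are passed in directly rather than selected from the metadata.

```r
# Hypothetical toy data: 99980 and 99981 stand for missing codes, 88880 for a jump code.
study_data <- data.frame(
  v1 = c(1, NA, 99980, 4, 88880, 99981),
  v2 = c(NA, NA, 2, 3, 99980, 5)
)

# Hypothetical code lists; the real function selects them from the metadata.
missing_codes <- list(v1 = c(99980, 99981), v2 = c(99980))
jump_codes    <- list(v1 = c(88880),        v2 = numeric(0))

summarize_missingness <- function(study_data, missing_codes, jump_codes) {
  rows <- lapply(names(study_data), function(v) {
    x <- study_data[[v]]
    data.frame(
      variable       = v,
      n_obs          = length(x),
      n_sysmiss      = sum(is.na(x)),                  # system missings (NA)
      n_missing_code = sum(x %in% missing_codes[[v]]), # used missing codes
      n_jump_code    = sum(x %in% jump_codes[[v]]),    # used jump codes
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)  # one summary row per variable
}

summary_table <- summarize_missingness(study_data, missing_codes, jump_codes)
print(summary_table)
```

The sketch covers only the per-variable summary; the observation-level data frame and the optional heatmap are left out.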
