do_preprocess.pk: Do pre-processing

Description

Pre-process data for a `pk` object

Usage

# S3 method for pk
do_preprocess(obj, ...)

Value

The same `pk` object, with added elements `data` (containing the cleaned, gap-filled data) and `data_info` (containing summary information about the data, e.g. number of observations by route, media, and detect/nondetect; empirical tmax, i.e. the time of peak concentration for oral data; and number of observations before and after empirical tmax).

Arguments

obj: A `pk` object
...: Additional arguments. Not in use currently.

Author

John Wambaugh, Caroline Ring, Christopher Cook, Gilberto Padilla Mercado

Details

Data pre-processing for an object `obj` includes the following steps, in order:

- Coerce data to class `data.frame` (if it is not already)
- Rename variables to harmonized "`invivopkfit` aesthetic" variable names, using `obj$mapping`
- Check that the data includes only routes in `obj$settings_preprocess$routes_keep` and media in `obj$settings_preprocess$media_keep`
- Check that the data includes only one unit for concentration, one unit for time, and one unit for dose.
- Coerce `Value`, `Value_SD`, `LOQ`, `Dose`, and `Time` to numeric, if they are not already.
- Coerce `Species`, `Route`, and `Media` to lowercase.
- Replace any negative `Value`, `Value_SD`, `Dose`, or `Time` with `NA`
- If any non-NA `Value` is currently less than its non-NA `LOQ`, then replace it with `NA`
- Impute any NA `LOQ`: as `calc_loq_factor` * minimum non-NA `Value` in each `loq_group`
- For any cases where `N_Subjects` is NA, impute `N_Subjects` = 1
- For anything with `N_Subjects` == 1, set `Value_SD` to 0
- Impute missing `Value_SD` as follows: For observations with `N_Subjects` > 1, take the minimum non-missing `Value_SD` for each `sd_group`. If all SDs are missing in an `sd_group`, then `Value_SD` for each observation in that group will be imputed as 0.
- Mark data for exclusion according to the following criteria:
  - Exclude any remaining observations where both `Value` and `LOQ` are NA
  - For any cases where `N_Subjects` is NA, impute `N_Subjects` = 1
  - Exclude any remaining observations with `N_Subjects` > 1 and `Value_SD` still NA. (This should never occur, if SD imputation is performed, but just in case.)
  - Exclude any observations with `N_Subjects` > 1 where reported `Value` is NA, because log-likelihood for non-detect multi-subject observations has not been implemented.
  - Exclude any observations with NA `Time` values
  - Exclude any observations with `Dose` = 0
- Apply any time transformations specified by the user
- Scale concentration by `ratio_conc_dose`
- Apply any concentration transformations specified by the user.
- If `Series_ID` is not included, then assign it as NA
- Create variable `pLOQ` and set it equal to `LOQ`
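
The imputation and exclusion steps can be illustrated with a minimal base-R sketch on toy data. This is not the package's internal code: the toy values, the value of `calc_loq_factor`, the use of plain `loq_group` and `sd_group` columns, and the output column name `exclude` are all assumptions made for illustration. Where the steps above leave a detail open (for example, which rows contribute to the group minimum SD), the sketch simply takes the minimum over all rows in the group.

```r
# Toy data with harmonized column names (values invented for illustration).
dat <- data.frame(
  Value      = c(5, 0.2, NA, 8, 3, NA),
  LOQ        = c(NA, 0.5, NA, 1, NA, NA),
  Value_SD   = c(NA, NA, NA, 2, NA, NA),
  N_Subjects = c(1, NA, 1, 3, 3, 3),
  Dose       = c(1, 1, 1, 1, 1, 0),
  Time       = c(0.5, 1, 2, 0.5, NA, 1),
  loq_group  = c("a", "a", "a", "b", "b", "b"),  # hypothetical grouping
  sd_group   = c("a", "a", "a", "b", "b", "b")   # hypothetical grouping
)
calc_loq_factor <- 0.9  # assumed value, for illustration only

# Replace any non-NA Value below its non-NA LOQ with NA.
below <- !is.na(dat$Value) & !is.na(dat$LOQ) & dat$Value < dat$LOQ
dat$Value[below] <- NA

# Impute NA LOQ as calc_loq_factor * minimum non-NA Value in each loq_group.
grp_min <- ave(dat$Value, dat$loq_group, FUN = function(v) {
  if (all(is.na(v))) NA_real_ else min(v, na.rm = TRUE)
})
na_loq <- is.na(dat$LOQ)
dat$LOQ[na_loq] <- calc_loq_factor * grp_min[na_loq]

# Impute NA N_Subjects as 1, then set Value_SD to 0 for single subjects.
dat$N_Subjects[is.na(dat$N_Subjects)] <- 1
dat$Value_SD[dat$N_Subjects == 1] <- 0

# Impute missing Value_SD for multi-subject rows as the minimum
# non-missing Value_SD in the sd_group, or 0 if all are missing.
sd_min <- ave(dat$Value_SD, dat$sd_group, FUN = function(s) {
  if (all(is.na(s))) 0 else min(s, na.rm = TRUE)
})
na_sd <- is.na(dat$Value_SD) & dat$N_Subjects > 1
dat$Value_SD[na_sd] <- sd_min[na_sd]

# Flag rows for exclusion ("exclude" is a hypothetical column name).
dat$exclude <- (is.na(dat$Value) & is.na(dat$LOQ)) |
  (dat$N_Subjects > 1 & is.na(dat$Value_SD)) |
  (dat$N_Subjects > 1 & is.na(dat$Value)) |
  is.na(dat$Time) |
  dat$Dose == 0
```

In this toy example, rows 5 and 6 are flagged: row 5 has an NA `Time`, and row 6 is a multi-subject non-detect with `Dose` = 0. The single-subject non-detects in rows 2 and 3 are retained because each has a non-NA `LOQ`.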