rsq.pk: Calculate R-squared for observed vs. predicted values

Description

Calculate the square of the Pearson correlation coefficient (r) between observed and model-predicted values

Usage

# S3 method for pk
rsq(
  obj,
  newdata = NULL,
  model = NULL,
  method = NULL,
  exclude = TRUE,
  use_scale_conc = FALSE,
  rsq_group = NULL,
  sub_pLOQ = TRUE,
  suppress.messages = NULL,
  ...
)

Value

A dataframe with one row for each `data_group`, `model` and `method`. The final column contains the R-squared of the model fitted by the corresponding method, using the data in `newdata`.

Arguments

obj: A `pk` object
newdata: Optional: A `data.frame` with new data for which to make predictions and compute R-squared. If NULL (the default), then R-squared will be computed for the data in `obj$data`. `newdata` is required to contain at least the following variables: `Time`, `Time.Units`, `Dose`, `Route`, `Media`, `Conc`, `Conc_SD`, `N_Subjects`, `Detect`.
model: Optional: Specify one or more of the fitted models for which to make predictions and calculate R-squared. If NULL (the default), R-squared will be returned for all of the models in `obj$stat_model`.
method: Optional: Specify one or more of the [optimx::optimx()] methods for which to make predictions and calculate R-squared. If NULL (the default), RMSEs will be returned for all of the models in `obj$optimx_settings$method`.
exclude: Logical: `TRUE` to compute the R-squared excluding any observations in the data marked for exclusion (if there is a variable `exclude` in the data, an observation is marked for exclusion when `exclude `FALSE` to include all observations, regardless of exclusion status. Default `TRUE`.
use_scale_conc: Possible values: `TRUE` (default), `FALSE`, or a named list with elements `dose_norm` and `log10_trans` which themselves should be either `TRUE` or `FALSE`. If `use_scale_conc = TRUE` (the default for this function), then the concentration scaling/transformations in `obj` will be applied to both predicted and observed concentrations when the R-squared is computed (see [calc_rsq()] for details). If `use_scale_conc = FALSE`, then no concentration scaling or transformation will be applied when the R-squared is computed. If `use_scale_conc = list(dose_norm = ..., log10_trans = ...)`, then the specified dose normalization and/or log10-transformation will be applied.
rsq_group: Default: Chemical, Species. Determines what the data grouping that is used to calculate R-squared value. Should be set to lowest number of variables that still would return unique experimental conditions. Input in the form of `ggplot2::vars(Chemical, Species, Route, Media, Dose)`.
sub_pLOQ: TRUE (default): Substitute all predictions below the LOQ with the LOQ before computing R-squared. FALSE: do not.
suppress.messages: Logical: whether to suppress message printing. If NULL (default), uses the setting in `obj$settings_preprocess$suppress.messages`
...: Additional arguments. Not currently in use.

Author

Caroline Ring

Details

Calculate the square of the Pearson correlation coefficient (r) between observed and model-predicted values, when observed data may be left-censored (non-detect) or may be reported in summary form (as sample mean, sample standard deviation, and sample number of subjects). Additionally, handle the situation when observed data and predictions need to be log-transformed before RMSE is calculated.

$r^2$ is calculated according to the following formula, to properly handle multi-subject observations reported in summary format:

$$ r^2 = \left( \frac{ \sum_{i=1}^G \mu_i n_i \bar{y}_i - (\bar{\mu} + \bar{y}) \sum_{i=1}^G n_i \mu_i + (\bar{mu} \bar{y}) \sum_{i=1}^G n_i } { \sqrt{ \sum_{i=1}^G (n_i - 1) s_i^2 + \sum_{i=1}^G n_i \bar{y}_i^2 - 2 \bar{y} \sum_{i=1}^G n_i \bar{y}_i + N + \bar{y}^2 } \sqrt{ \sum_{i=1}^G n_i \mu_i^2 - 2 \bar{y} \sum_{i=1}^G n_i \mu_i + N + \bar{y}^2 } } \right)^2 $$

In this formula, there are $G$ groups (reported observations). (For CvTdb data, a "group" is a specific combination of chemical, species, route, medium, dose, and timepoint.) $n_i$ is the number of subjects for group $i$. $\bar{y}_i$ is the sample mean for group $i$. $s_i$ is the sample standard deviation for group $i$.$\mu_i$ is the model-predicted value for group $i$. $\bar{y}$ is the grand mean of observations:

$$ \bar{y} = \frac{ \sum_{i=1}^G n_i \bar{y}_i }{\sum_{i=1}^G n_i} $$

$\bar{\mu}$ is the grand mean of predictions:

$$ \bar{\mu} = \frac{ \sum_{i=1}^G n_i \mu_i }{\sum_{i=1}^G n_i} $$

$N$ is the grand total of subjects:

$$N = \sum_{i=1}^G n_i$$

For the non-summary case ($N$ single-subject observations, with all $n_i = 1$, $s_i = 0$, and $\bar{y}_i = y_i$), this formula reduces to the familiar formula

$$ r^2 = \left( \frac{\sum_{i=1}^N (y_i - \bar{y}) (\mu_i - \bar{\mu})} {\sqrt{ \sum_{i=1}^N (y_i - \bar{y})^2 } \sqrt{ \sum_{i=1}^N (\mu_i - \bar{\mu})^2 } } \right)^2 $$

# Left-censored data

If the observed value is censored, and the predicted value is less than the reported LOQ, then the observed value is (temporarily) set equal to the predicted value, for an effective error of zero.

If the observed value is censored, and the predicted value is greater than the reported LOQ, the the observed value is (temporarily) set equal to the reported LOQ, for an effective error of (LOQ - predicted).

# Log10 transformation

If `log10_trans log10(observations) vs. log10(predictions).

In the case where observed values are reported in summary format, each sample mean and sample SD (reported on the natural scale, i.e. the mean and SD of natural-scale individual observations) are used to produce an estimate of the log10-scale sample mean and sample SD (i.e., the mean and SD of log10-transformed individual observations), using [convert_summary_to_log10()].

The formulas are as follows. Again, $\bar{y}_i$ is the sample mean for group $i$. $s_i$ is the sample standard deviation for group $i$.

$$\textrm{log10-scale sample mean}_i = \log_{10} \left(\frac{\bar{y}_i^2}{\sqrt{\bar{y}_i^2 + s_i^2}} \right)$$