The log-likelihood function (probability of data given model parameters).
log_likelihood(
par,
const_params = NULL,
data = NULL,
data_sigma_group = NULL,
modelfun = NULL,
dose_norm = FALSE,
log10_trans = FALSE,
negative = TRUE,
force_finite = FALSE,
max_multiplier = NULL,
includes_preds = FALSE,
suppress.messages = TRUE
)
A log-likelihood value for the data given the parameter values in params
A named list of parameters and their values that are being optimized.
Optional: A named list of parameters and their values that are being held constant.
A `data.frame` of data with harmonized variable names. Required: `Time_trans`, `Dose`, `Conc`, `Detect`, `N_Subjects`, `Conc_SD`. `Conc` and `Conc_SD` will be transformed according to `dose_norm` and `log10_trans`.
A `factor` vector which could be a new variable in `data`, giving the error group for each row in `data`.
Character or function: The name of the function that produces concentration predictions for the model being evaluated.
Logical: Whether to dose-normalize predicted and observed concentrations before calculating likelihood.
Logical: Whether to apply a [log10()] transformation to predicted and observed concentrations before calculating likelihood.
Logical: Whether to return the *negative* log-likelihood (i.e., the log-likelihood multiplied by negative 1). Default TRUE, to multiply the log-likelihood by negative 1 before returning it. This option is useful when treating the log-likelihood as an objective function to be *minimized* by an optimization algorithm.
Logical: Whether to force return of a finite value (e.g. as required by method `L-BFGS-B` in [optimx::optimx()]). Default FALSE. If TRUE, then if the log-likelihood works out to be non-finite, then it will be replaced with `.Machine$double.xmax`.
Numeric, but NULL by default. Determines which multiple of the maximum concentration to use to limit possible predictions. Note: only use this as part of the fitting function.
Logical: whether `data` includes predictions.
Logical.
Caroline Ring, Gilberto Padilla Mercado
The log-likelihood is formulated by assuming that residuals (transformed model-predicted concentrations minus transformed observed concentrations) are independent, and that groups of residuals obey zero-mean normal distributions where each group may have a separate error variance. Error groups are defined as unique combinations of variables in the harmonized data, by a command such as `pk(data = ...) + stat_error_model(error_group = vars(...)`.
# Log-likelihood equations
For chemical-species combination \(i\) and study \(j\), define the following quantities.
\(y_{ijk}\) is the \(k^{th}\) observation of concentration (which may be transformed, e.g. dose-normalized and/or log10-transformed), corresponding to dose \(d_{ijk}\) and time \(t_{ijk}\). Each observation has a corresponding LOQ, \(\textrm{LOQ}_{ijk}\).
For multiple-subject observations, \(y_{ijk}\) is the \(k^{th}\) *sample mean* observed concentration for chemical-species combination \(i\) and study \(j\), corresponding to dose \(d_{ijk}\) and time \(t_{ijk}\). It represents the mean of \(n_{ijk}\) individual measurements. It has a corresponding sample standard deviation, \(s_{ijk}\). In the harmonized data, \(s_{ijk}\) is contained in variable `Conc_SD`.
\(\bar{\theta}_i\) represents the vector of model parameters supplied in argument `params` for chemical-species combination \(i\).
\(\mu_{ijk}\) is the model-predicted concentration for dose \(d_{ijk}\) and time \(t_{ijk}\). If \(f(d, t; \bar{ \theta})\) is the model function evaluated at dose \(d\) and time \(t\), with parameter vector \(\bar{\theta}\), then
$$\mu_{ijk} = f \left(d_{ijk}, t_{ijk}; \bar{\theta}_i \right)$$
\(\sigma_{ij}^2\) is the study- and chemical-specific residual variance. (It is a hyperparameter.)
## Single-subject observations above limit of quantification (detects)
This is the normal probability density function evaluated at the observed concentration, as implemented in [stats::dnorm()].
$$LL_{ijk} = \log \left( \frac{1}{\sqrt{\sigma_{ij} 2 \pi}} \exp \left[ \frac{-1}{2} \left( \frac{y_{ijk} - \mu_{ijk}}{\sigma_{ij}} \right)^2 \right] \right) $$
## Single-subject observations below limit of quantification (non-detects)
This is the normal cumulative density function evaluated at the LOQ, as implemented in [stats::pnorm()]. It is the total probability of observing a concentration anywhere between 0 and the LOQ.
$$LL_{ijk} = \log \left( \frac{1}{2} \left[ 1 + \textrm{erf} \left( \frac{\textrm{LOQ}_{ijk} - \mu_{ijk}}{\sigma_{ij} \sqrt{2}} \right) \right] \right) $$
## Multiple-subject observations above limit of quantification
This is the joint log-likelihood across the multiple subjects included in one observation, re-expressed in terms of the sample mean, sample SD, and number of subjects for that observation. It is implemented in [dnorm_summary()].
$$LL_{ijk} = n_{ijk} * \log \left[ \frac{1}{\sigma_{ij} \sqrt{2 \pi}} + \frac{-1}{2 \sigma_{ij}^2} \left( \left( n_{ijk} - 1 \right) s_{ijk}^2 + n_{ijk} \left( y_{ijk} - \mu_{ijk} \right)^2 \right) \right]$$
## Multiple-subject observations below limit of quantification
This case is not implemented. If sample mean concentration is reported below LOQ, then it is unclear what individual observed concentrations are represented, and how they were combined to produce the summary data in terms of sample mean and sample SD. Were all individual observations below LOQ? Or were below-LOQ observations replaced with 0, LOQ/2, etc. before sample mean and sample SD were computed? If the sample mean is reported below LOQ, what LOQ is reported? Did individual observations all have the same LOQ, or is an average or median LOQ being used? It is impossible to formulate the log-likelihood without knowing the answers to these questions. Therefore, multiple-subject observations below LOQ are excluded from analysis (they are marked as excluded in [preprocess_data()]).
# Joint log-likelihood for a chemical and species
The joint log-likelihood for a chemical and species is simply the sum of log-likelihoods across observations.
$$LL_{i} = \sum_{j=1}^{J_i} \sum_{k=1}^{K_{ij}} LL_{ijk}$$
This is the overall probability of the observed data, given the model and parameters.