ordinal_continuous_loglik: Loglikelihood function for ordinal-continuous copula model

Description

ordinal_continuous_loglik() computes the observed-data loglikelihood for a bivariate copula model with a continuous and an ordinal endpoint. The model is based on a latent variable representation of the ordinal endpoint.

Usage

ordinal_continuous_loglik(
  para,
  X,
  Y,
  copula_family,
  marginal_Y,
  K,
  return_sum = TRUE
)

Value

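(numeric) Loglikelihood value evaluated at para. If return_sum = FALSE, a vector with the individual loglikelihood contributions is returned instead.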

Arguments

para

Parameter vector. The parameters are ordered as follows:

para[1:p1]: Cutpoints for the latent distribution of X, corresponding to $c_1, \dots, c_{K - 1}$ (see Details), so that p1 = K - 1.
para[(p1 + 1):(p1 + p2)]: Parameters for the marginal distribution of Y (the surrogate in the setup described in Details), where p2 is the number of these parameters. See ?Surrogate::cdf_fun for the specific implementations.
para[p1 + p2 + 1]: Copula parameter.
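
The ordering above can be illustrated with a minimal sketch. It assumes p1 = K - 1 cutpoints and p2 marginal parameters; the helper split_para() is hypothetical and not part of the package.

```r
# Hypothetical helper: split a parameter vector into its three blocks.
# Assumes p1 = K - 1 cutpoints and p2 parameters for the marginal of Y.
split_para <- function(para, K, p2) {
  p1 <- K - 1
  stopifnot(length(para) == p1 + p2 + 1)
  list(
    cutpoints = para[seq_len(p1)],
    marginal  = para[p1 + seq_len(p2)],
    copula    = para[p1 + p2 + 1]
  )
}

# Example: K = 3 categories and a two-parameter marginal for Y.
parts <- split_para(c(-0.5, 0.8, 10, 2, 1.5), K = 3, p2 = 2)
parts$cutpoints  # -0.5  0.8
parts$marginal   # 10  2
parts$copula     # 1.5
```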

X

First variable (Ordinal with $K$ categories)

Y

Second variable (Continuous)

copula_family

Copula family, one of the following:

"clayton"
"frank"
"gumbel"
"gaussian"

marginal_Y

List with the following five elements (in order):

Density function with first argument x and second argument para, the parameter vector for this distribution.
Distribution function with first argument x and second argument para.
Inverse distribution function with first argument p and second argument para.
The number of elements in para.
Starting values for para.
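
As an illustration, a list of this form for a normal distribution might look as follows. This is a sketch only: the parameterization para = c(mean, sd) and the starting values are assumptions made for the example, not necessarily what the package uses internally.

```r
# Illustrative marginal_Y list for a normal distribution.
# Assumption: para = c(mean, sd); chosen here for illustration only.
marginal_normal <- list(
  function(x, para) dnorm(x, mean = para[1], sd = para[2]),  # density
  function(x, para) pnorm(x, mean = para[1], sd = para[2]),  # distribution function
  function(p, para) qnorm(p, mean = para[1], sd = para[2]),  # inverse distribution function
  2,                                                         # number of parameters
  c(0, 1)                                                    # starting values
)

# Sanity check: the inverse distribution function undoes the distribution function.
p <- marginal_normal[[2]](1.3, c(1, 2))
marginal_normal[[3]](p, c(1, 2))
```

The five elements are accessed by position, so their order matters.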

K

Number of categories in X.

return_sum

Return the sum of the individual loglikelihoods? If FALSE, a vector with the individual loglikelihood contributions is returned.

Details

Vine Copula Model for Ordinal Endpoints

Following the Neyman-Rubin potential outcomes framework, we assume that each patient has four potential outcomes, two for each arm, represented by $\boldsymbol{Y} = (T_0, S_0, S_1, T_1)'$. Here, $\boldsymbol{Y_z} = (S_z, T_z)'$ are the potential surrogate and true endpoints under treatment $Z = z$. We will further assume that $T$ is ordinal and $S$ is continuous; consequently, the function argument X corresponds to $T$ and Y to $S$. (The roles of $S$ and $T$ can be interchanged without loss of generality.)

We introduce latent variables to model $\boldsymbol{Y}$. Latent variables will be denoted by a tilde. For instance, if $T_z$ is ordinal with $K_T$ categories, then $T_z$ is a function of the latent $\tilde{T}_z \sim N(0, 1)$ as follows: $$ T_z = g_{T_z}(\tilde{T}_z; \boldsymbol{c}^{T_z}) = \begin{cases} 1 & \text{ if } -\infty = c_0^{T_z} < \tilde{T}_z \le c_1^{T_z} \\ \vdots \\ k & \text{ if } c_{k - 1}^{T_z} < \tilde{T}_z \le c_k^{T_z} \\ \vdots \\ K_T & \text{ if } c_{K_T - 1}^{T_z} < \tilde{T}_z \le c_{K_T}^{T_z} = \infty, \end{cases} $$ where $\boldsymbol{c}^{T_z} = (c_1^{T_z}, \dots, c_{K_T - 1}^{T_z})$ is the vector of cutpoints. The latent counterpart of $\boldsymbol{Y}$ is again denoted by a tilde; for example, $\tilde{\boldsymbol{Y}} = (\tilde{T}_0, S_0, S_1, \tilde{T}_1)'$ if $T_z$ is ordinal and $S_z$ is continuous.
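
The mapping from the latent variable to the ordinal categories can be sketched in a few lines of R. The cutpoints and the sample size below are illustrative values, not package defaults.

```r
# Map latent standard-normal draws to ordinal categories 1, ..., K.
# cut() uses right-closed intervals (c_{k-1}, c_k] by default,
# matching the definition above.
set.seed(1)
cutpoints <- c(-0.5, 0.8)            # illustrative c_1 < c_2, so K = 3
latent <- rnorm(1000)
ordinal <- cut(latent, breaks = c(-Inf, cutpoints, Inf), labels = FALSE)

# Category probabilities implied by the model: differences of the
# standard normal distribution function at consecutive cutpoints.
probs <- diff(pnorm(c(-Inf, cutpoints, Inf)))
rbind(
  empirical = tabulate(ordinal, nbins = 3) / length(ordinal),
  model     = probs
)
```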

The vector of latent potential outcomes $\tilde{\boldsymbol{Y}}$ is modeled with a D-vine copula as follows: $$ f_{\tilde{\boldsymbol{Y}}} = f_{\tilde{T}_0} \, f_{S_0} \, f_{S_1} \, f_{\tilde{T}_1} \cdot c_{\tilde{T}_0, S_0 } \, c_{S_0, S_1} \, c_{S_1, \tilde{T}_1} \cdot c_{\tilde{T}_0, S_1; S_0} \, c_{S_0, \tilde{T}_1; S_1} \cdot c_{\tilde{T}_0, \tilde{T}_1; S_0, S_1}, $$ where (i) $f_{\tilde{T}_0}$, $f_{S_0}$, $f_{S_1}$, and $f_{\tilde{T}_1}$ are univariate density functions, (ii) $c_{\tilde{T}_0, S_0}$, $c_{S_0, S_1}$, and $c_{S_1, \tilde{T}_1}$ are unconditional bivariate copula densities, and (iii) $c_{\tilde{T}_0, S_1; S_0}$, $c_{S_0, \tilde{T}_1; S_1}$, and $c_{\tilde{T}_0, \tilde{T}_1; S_0, S_1}$ are conditional bivariate copula densities (e.g., $c_{\tilde{T}_0, S_1; S_0}$ is the copula density of $(\tilde{T}_0, S_1)' \mid S_0$). We also make the simplifying assumption for all copulas; that is, the conditional copulas do not depend on the values of the conditioning variables.

Observed-Data Likelihood

In practice, we only observe $(S_0, T_0)'$ or $(S_1, T_1)'$. Hence, to estimate the (identifiable) parameters of the D-vine copula model, we need to derive the observed-data likelihood. The observed-data likelihood contribution for $(S_z, T_z)'$ is as follows: $$ f_{\boldsymbol{Y_z}}(s, t; \boldsymbol{\beta}) = \int_{c^{T_z}_{t - 1}}^{+ \infty} f_{\boldsymbol{\tilde{Y}_z}}(s, x; \boldsymbol{\beta}) \, dx - \int_{c^{T_z}_{t}}^{+ \infty} f_{\boldsymbol{\tilde{Y}_z}}(s, x; \boldsymbol{\beta}) \, dx, $$ where $\boldsymbol{\beta}$ collects the cutpoints, the marginal parameters, and the copula parameter. The logarithm of this expression is used in ordinal_continuous_loglik() to compute the loglikelihood for the observed values for $Z = 0$ or $Z = 1$. In this function, X and Y correspond to $T_z$ and $S_z$ if $T_z$ is ordinal and $S_z$ is continuous. Otherwise, X and Y correspond to $S_z$ and $T_z$.
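
The difference of integrals above equals $f_{S_z}(s) \, [P(\tilde{T}_z \le c^{T_z}_{t} \mid S_z = s) - P(\tilde{T}_z \le c^{T_z}_{t - 1} \mid S_z = s)]$. For a Gaussian copula with correlation $\rho$, the conditional probability has the closed form $\Phi\big((c - \rho z) / \sqrt{1 - \rho^2}\big)$ with $z = \Phi^{-1}(F_{S_z}(s))$. The following is a minimal sketch of this special case, not the package's implementation. It assumes a normal marginal for Y and $\rho$ on its natural scale in $(-1, 1)$; the function name loglik_gaussian_sketch() and the simulated data are hypothetical.

```r
# Sketch of the observed-data loglikelihood for a Gaussian copula only.
# Assumptions: Y is normal with given mean and sd, and rho is the copula
# correlation in (-1, 1). This is not the package's code.
loglik_gaussian_sketch <- function(cutpoints, mean_Y, sd_Y, rho, X, Y) {
  bounds <- c(-Inf, cutpoints, Inf)     # c_0, c_1, ..., c_K
  z <- qnorm(pnorm(Y, mean_Y, sd_Y))    # normal score of F_Y(y)
  # P(latent <= cut | Y = y) under the Gaussian copula
  cond_cdf <- function(cut) pnorm((cut - rho * z) / sqrt(1 - rho^2))
  # Probability that the latent variable falls in (c_{x-1}, c_x] given Y = y
  prob <- cond_cdf(bounds[X + 1]) - cond_cdf(bounds[X])
  sum(dnorm(Y, mean_Y, sd_Y, log = TRUE) + log(prob))
}

# Simulated data: latent N(0, 1) variable correlated with a normal Y.
set.seed(123)
n <- 200
latent <- rnorm(n)
Y <- 10 + 2 * (0.6 * latent + sqrt(1 - 0.6^2) * rnorm(n))
X <- cut(latent, breaks = c(-Inf, -0.5, 0.8, Inf), labels = FALSE)

loglik_gaussian_sketch(c(-0.5, 0.8), mean_Y = 10, sd_Y = 2, rho = 0.6, X, Y)
```

With rho = 0 the two endpoints are independent, so the sketch reduces to the sum of the marginal log-density of Y and the log of the marginal category probabilities of X, which gives a simple check of the formula.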