fit_copula_OrdCont() fits the ordinal-continuous vine copula model. See
Details for more information about this model.
fit_copula_OrdCont(
data,
copula_family,
marginal_S0,
marginal_S1,
K_T,
start_copula,
method = "BFGS",
...
)
Returns an S3 object that can be used to perform the sensitivity
analysis with sensitivity_analysis_copula().
data
Data frame with three columns in the following order: surrogate
endpoint, true endpoint, and treatment indicator (0/1 coding). Ordinal endpoints
should be integers starting from 1.
copula_family
One of the following parametric copula families: "clayton", "frank",
"gaussian", or "gumbel". The first element in copula_family corresponds to the
control group, the second to the experimental group.
marginal_S0, marginal_S1
List with the following five elements (in order):
Density function with first argument x and second argument para, the parameter
vector for this distribution.
Distribution function with first argument x and second argument para, the
parameter vector for this distribution.
Inverse distribution function with first argument p and second argument para,
the parameter vector for this distribution.
The number of elements in para.
A vector of starting values for para.
K_T
Number of categories in the true endpoint.
start_copula
Starting value for the copula parameter.
method
Optimization algorithm for maximizing the objective function. For all options,
see ?maxLik::maxLik. Defaults to "BFGS".
...
Arguments passed on to fit_copula_submodel_OrdCont
names_XY
Names for X and Y, respectively.
twostep
(boolean) If TRUE, the starting values are fixed for the marginal
distributions and only the copula parameter is estimated.
start_Y
Starting values for the marginal distribution parameters for Y.
X
First variable (Ordinal with \(K\) categories)
Y
Second variable (Continuous)
K
Number of categories in X.
marginal_Y
List with the following five elements (in order):
Density function with first argument x and second argument para, the parameter
vector for this distribution.
Distribution function with first argument x and second argument para.
Inverse distribution function with first argument p and second argument para.
The number of elements in para.
Starting values for para.
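As an illustration of the data and marginal_S0 / marginal_S1 arguments described above, the following sketch assembles a toy data frame and a normal marginal specification using base R only. The toy data, the helper make_normal_marginal(), and the parametrization para = c(mean, sd) are assumptions made for this example; they are not taken from the package. The commented-out call at the end also assumes that the function is loaded from the Surrogate package and uses an arbitrary starting value.

```r
set.seed(1)
n <- 200
K_T <- 3

# Toy data (simulated, with the true endpoint drawn independently of the
# surrogate): columns in the documented order surrogate, true endpoint,
# treatment indicator. The ordinal endpoint is coded as integers 1, ..., K_T.
treatment <- rep(c(0L, 1L), each = n / 2)
surrogate <- rnorm(n, mean = 1 + 0.5 * treatment, sd = 2)
true_endpoint <- sample.int(K_T, size = n, replace = TRUE)
toy_data <- data.frame(surrogate, true_endpoint, treatment)

# Hypothetical helper: builds the five-element list (in order) for a normal
# marginal distribution, with para = c(mean, sd) as an assumed parametrization.
make_normal_marginal <- function(s) {
  list(
    function(x, para) dnorm(x, mean = para[1], sd = para[2]),  # density
    function(x, para) pnorm(x, mean = para[1], sd = para[2]),  # distribution
    function(p, para) qnorm(p, mean = para[1], sd = para[2]),  # quantile
    2,                                                         # length of para
    c(mean(s), sd(s))                                          # starting values
  )
}

marginal_S0 <- make_normal_marginal(surrogate[treatment == 0])
marginal_S1 <- make_normal_marginal(surrogate[treatment == 1])

# Not run: a possible call, assuming the Surrogate package is installed.
# The value of start_copula is an arbitrary guess.
# fit <- Surrogate::fit_copula_OrdCont(
#   data = toy_data,
#   copula_family = c("gaussian", "gaussian"),
#   marginal_S0 = marginal_S0,
#   marginal_S1 = marginal_S1,
#   K_T = K_T,
#   start_copula = 0.5
# )
```

Using the sample mean and standard deviation as starting values is only one convenient choice; any values in the parameter space of the chosen distribution could be supplied instead.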
Florian Stijven
Following the Neyman-Rubin potential outcomes framework, we assume that each
patient has four potential outcomes, two for each arm, represented by
\(\boldsymbol{Y} = (T_0, S_0, S_1, T_1)'\). Here, \(\boldsymbol{Y_z} =
(S_z, T_z)'\) are the potential surrogate and true endpoints under treatment
\(Z = z\). We will further assume that \(T\) is ordinal and \(S\) is
continuous; consequently, the function argument X corresponds to \(T\) and
Y to \(S\). (The roles of \(S\) and \(T\) can be interchanged without
loss of generality.)
We introduce latent variables to model \(\boldsymbol{Y}\). Latent variables will be denoted by a tilde. For instance, if \(T_z\) is ordinal with \(K_T\) categories, then \(T_z\) is a function of the latent \(\tilde{T}_z \sim N(0, 1)\) as follows: $$ T_z = g_{T_z}(\tilde{T}_z; \boldsymbol{c}^{T_z}) = \begin{cases} 1 & \text{ if } -\infty = c_0^{T_z} < \tilde{T}_z \le c_1^{T_z} \\ \vdots \\ k & \text{ if } c_{k - 1}^{T_z} < \tilde{T}_z \le c_k^{T_z} \\ \vdots \\ K_T & \text{ if } c_{K_{T} - 1}^{T_z} < \tilde{T}_z \le c_{K_{T}}^{T_z} = \infty, \\ \end{cases} $$ where \(\boldsymbol{c}^{T_z} = (c_1^{T_z}, \cdots, c_{K_T - 1}^{T_z})\). The latent counterpart of \(\boldsymbol{Y}\) is likewise denoted by a tilde; for example, \(\tilde{\boldsymbol{Y}} = (\tilde{T}_0, S_0, S_1, \tilde{T}_1)'\) if \(T_z\) is ordinal and \(S_z\) is continuous.
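The mapping from the latent variable to the ordinal categories can be sketched in base R as follows. The cut-points, the latent values, and the helper latent_to_ordinal() are hypothetical and chosen only for illustration; they are not part of the package.

```r
# Hypothetical cut-points c_1 < ... < c_{K_T - 1} for K_T = 4 categories.
cutpoints <- c(-1, 0, 1.2)
K_T <- length(cutpoints) + 1

# Map latent values to categories 1, ..., K_T. With left.open = TRUE,
# findInterval() uses the intervals (c_{k-1}, c_k], matching the definition
# of g_{T_z} in which the upper cut-point belongs to category k.
latent_to_ordinal <- function(latent, cutpoints) {
  findInterval(latent, cutpoints, left.open = TRUE) + 1L
}

latent <- c(-2.5, -1, -0.3, 0, 0.7, 1.2, 3)
categories <- latent_to_ordinal(latent, cutpoints)

# Since the latent variable is standard normal, the category probabilities
# are differences of the standard normal distribution function.
category_probs <- diff(pnorm(c(-Inf, cutpoints, Inf)))
```

Here a latent value equal to a cut-point, such as -1 or 1.2, falls in the lower of the two adjacent categories, because the intervals are closed on the right.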
The vector of latent potential outcomes \(\tilde{\boldsymbol{Y}}\) is modeled with a D-vine copula as follows: $$ f_{\tilde{\boldsymbol{Y}}} = f_{\tilde{T}_0} \, f_{S_0} \, f_{S_1} \, f_{\tilde{T}_1} \cdot c_{\tilde{T}_0, S_0 } \, c_{S_0, S_1} \, c_{S_1, \tilde{T}_1} \cdot c_{\tilde{T}_0, S_1; S_0} \, c_{S_0, \tilde{T}_1; S_1} \cdot c_{\tilde{T}_0, \tilde{T}_1; S_0, S_1}, $$ where (i) \(f_{\tilde{T}_0}\), \(f_{S_0}\), \(f_{S_1}\), and \(f_{\tilde{T}_1}\) are univariate density functions, (ii) \(c_{\tilde{T}_0, S_0}\), \(c_{S_0, S_1}\), and \(c_{S_1, \tilde{T}_1}\) are unconditional bivariate copula densities, and (iii) \(c_{\tilde{T}_0, S_1; S_0}\), \(c_{S_0, \tilde{T}_1; S_1}\), and \(c_{\tilde{T}_0, \tilde{T}_1; S_0, S_1}\) are conditional bivariate copula densities (e.g., \(c_{\tilde{T}_0, S_1; S_0}\) is the copula density of \((\tilde{T}_0, S_1)' \mid S_0\)). We also make the simplifying assumption for all copulas; that is, the conditional copulas do not depend on the values of the conditioning variables.
In practice, we only observe \((S_0, T_0)'\) or \((S_1, T_1)'\). Hence, to
estimate the (identifiable) parameters of the D-vine copula model, we need
to derive the observed-data likelihood. The observed-data likelihood
contribution for an observation \((S_z, T_z)' = (s, t)'\) is as follows:
$$
f_{\boldsymbol{Y_z}}(s, t; \boldsymbol{\beta}) =
\int_{c^{T_z}_{t - 1}}^{+ \infty} f_{\boldsymbol{\tilde{Y}_z}}(s, x; \boldsymbol{\beta}) \, dx - \int_{c^{T_z}_{t}}^{+ \infty} f_{\boldsymbol{\tilde{Y}_z}}(s, x; \boldsymbol{\beta}) \, dx.
$$
The above expression is used in ordinal_continuous_loglik() to compute the
loglikelihood for the observed values for \(Z = 0\) or \(Z = 1\). In this
function, X and Y correspond to \(T_z\) and \(S_z\) if \(T_z\) is
ordinal and \(S_z\) continuous. Otherwise, X and Y correspond to
\(S_z\) and \(T_z\).
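The difference of upper-tail integrals in the observed-data likelihood can be checked numerically in a simple special case. The sketch below assumes a normal marginal for \(S_z\), a standard normal latent \(\tilde{T}_z\), and a Gaussian copula, so that the latent joint density is bivariate normal; all parameter values and helper names are hypothetical. It does not reproduce the internals of ordinal_continuous_loglik().

```r
# Assumed setting: S_z ~ N(mu, sigma^2), latent T_z ~ N(0, 1), joined by a
# Gaussian copula with correlation rho (all values chosen for illustration).
mu <- 1
sigma <- 2
rho <- 0.6
cutpoints <- c(-Inf, -0.5, 0.8, Inf)  # c_0, c_1, c_2, c_3 for K_T = 3

# Joint density of (S_z, latent T_z): marginal density of S_z times the
# conditional normal density of the latent variable given S_z = s.
joint_density <- function(s, x) {
  z <- (s - mu) / sigma
  dnorm(s, mu, sigma) * dnorm(x, mean = rho * z, sd = sqrt(1 - rho^2))
}

# Integral of the joint density over the latent variable from lower to +Inf.
upper_tail <- function(s, lower) {
  if (lower == Inf) return(0)
  integrate(function(x) joint_density(s, x), lower, Inf)$value
}

# Likelihood contribution for S_z = s and T_z = t: difference of the two
# upper-tail integrals, with cutpoints[t] = c_{t-1} and cutpoints[t + 1] = c_t.
lik_numeric <- function(s, t) {
  upper_tail(s, cutpoints[t]) - upper_tail(s, cutpoints[t + 1])
}

# Closed form available in this Gaussian special case, used as a check.
lik_closed <- function(s, t) {
  z <- (s - mu) / sigma
  cond_sd <- sqrt(1 - rho^2)
  dnorm(s, mu, sigma) *
    (pnorm(cutpoints[t + 1], rho * z, cond_sd) -
       pnorm(cutpoints[t], rho * z, cond_sd))
}

# Loglikelihood of a few hypothetical observations (s, t).
s_obs <- c(0.3, 2.1, -1.4)
t_obs <- c(2L, 3L, 1L)
loglik <- sum(log(mapply(lik_numeric, s_obs, t_obs)))
```

Summing the contributions over all categories \(t = 1, \dots, K_T\) at a fixed \(s\) recovers the marginal density of \(S_z\) at \(s\), which is a quick sanity check for any implementation of this expression.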
sensitivity_analysis_copula(), print.vine_copula_fit(),
plot.vine_copula_fit()