FixedDiscrDiscrIT: Investigates surrogacy for binary or ordinal outcomes using the Information Theoretic framework

Description

The function FixedDiscrDiscrIT uses the information-theoretic approach (Alonso and Molenberghs 2007) to estimate trial- and individual-level surrogacy based on fixed-effects models for three combinations of discrete outcomes: a binary surrogate with an ordinal true outcome, an ordinal surrogate with a binary true outcome, or an ordinal surrogate with an ordinal true outcome (the user specifies which applies via the Setting argument). The user can specify whether a weighted or unweighted analysis is required at the trial level. The penalized likelihood approach of Firth (1993) is applied to resolve the problem of separation that can arise with discrete outcomes in particular trials. Requires the packages OrdinalLogisticBiplot and logistf.

Usage

FixedDiscrDiscrIT(Dataset, Surr, True, Treat, Trial.ID,
Weighted = TRUE, Setting = c("binord"))

Value

An object of class FixedDiscrDiscrIT with components:

Trial.Spec.Results: A data.frame that contains the trial-specific intercepts and treatment effects for the surrogate and the true endpoints, together with the number of observations per trial, whether the trial could be included in the analysis for $R^2_{h}$ and for $R^2_{ht}$, and whether separation occurred (and hence whether the penalized likelihood approach was used) for the surrogate or the true outcome.
R2ht: A data.frame that contains the trial-level surrogacy estimate and its confidence interval.
R2h: A data.frame that contains the individual-level surrogacy estimate and its confidence interval.

Arguments

Dataset: A data.frame that should consist of one line per patient. Each line contains (at least) a surrogate value, a true outcome value, a treatment indicator and a trial ID.
Surr: The name of the variable in Dataset that contains the surrogate outcome values.
True: The name of the variable in Dataset that contains the true outcome values.
Treat: The name of the variable in Dataset that contains the treatment group values; 0/1 or -1/+1 coding is recommended.
Trial.ID: The name of the variable in Dataset that contains the trial ID to which the patient belongs.
Weighted: Logical. In practice it is often the case that different trials (or other clustering units) have different sample sizes. Univariate models are used to assess surrogacy in the information-theoretic approach, so it can be useful to adjust for heterogeneity in information content between the trial-specific contributions (particularly when trial-level surrogacy measures are of primary interest and when the heterogeneity in sample sizes is large). If Weighted=TRUE, weighted regression models are fitted. If Weighted=FALSE, unweighted regression analyses are conducted. See the Details section below. Default TRUE.
Setting: Specifies the type (binary or ordinal) of the surrogate and true outcomes in Dataset. Use Setting=c("binord") for a binary surrogate and an ordinal true outcome, Setting=c("ordbin") for an ordinal surrogate and a binary true outcome, and Setting=c("ordord") when both outcomes are ordinal.

Author

Hannah M. Ensor & Christopher J. Weir

Details

Individual-level surrogacy

The following univariate logistic regression models are fitted when Setting=c("ordbin"):

$$logit(P(T_{ij}=1))=\mu_{Ti}+\beta_{i}Z_{ij}, \qquad (1)$$

$$logit(P(T_{ij}=1|S_{ij}=s))=\gamma_{0i}+\gamma_{1i}Z_{ij}+\gamma_{2i}S_{ij}, \qquad (1)$$

where: $i$ and $j$ are the trial and subject indicators; $S_{ij}$ and $T_{ij}$ are the surrogate and true outcome values of subject $j$ in trial $i$; $Z_{ij}$ is the treatment indicator for subject $j$ in trial $i$; $\mu_{Ti}$ and $\beta_{i}$ are the trial-specific intercepts and treatment effects on the true endpoint in trial $i$; $\gamma_{0i}$ and $\gamma_{1i}$ are the trial-specific intercepts and treatment effects on the true endpoint in trial $i$ after accounting for the effect of the surrogate endpoint; and $\gamma_{2i}$ is the trial-specific effect of the surrogate on the true endpoint. The $-2$ log likelihood values of these two models in each of the $N$ trials ($L_{1i}$ and $L_{2i}$, respectively) are subsequently used to compute individual-level surrogacy based on the so-called Likelihood Reduction Factor (LRF; for details, see Alonso et al., 2006). Because adding $S_{ij}$ can only lower the $-2$ log likelihood, the reduction for trial $i$ is $L_{1i}-L_{2i} \geq 0$:

$$R^2_{h}= 1 - \frac{1}{N} \sum_{i} exp \left(-\frac{L_{1i}-L_{2i}}{n_{i}} \right),$$

where $N$ is the number of trials and $n_{i}$ is the number of patients within trial $i$.
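The per-trial computation behind $R^2_{h}$ can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the FixedDiscrDiscrIT implementation: the simulated data are hypothetical, both models in (1) are fitted with `glm()`, and the ordinal surrogate is entered in the second model as a numeric score 1, 2, 3 (an assumption; the text above does not say how $S_{ij}$ is coded). The likelihood reduction for trial $i$ is taken as $L_{1i}-L_{2i}$, the drop in $-2$ log likelihood when $S$ is added, so every term $\exp(-(L_{1i}-L_{2i})/n_i)$ lies in $(0, 1]$. The discrete-case rescaling of $R^2_{h}$ is not applied here.

```r
set.seed(1)
N <- 10                                   # number of trials (hypothetical)
n_per <- 50                               # patients per trial (hypothetical)
dat <- data.frame(trial = rep(seq_len(N), each = n_per),
                  Z = rep(c(0, 1), length.out = N * n_per))
latent <- 0.8 * dat$Z + rnorm(nrow(dat))
# Ordinal surrogate with 3 categories, used in model (1) as a numeric score
dat$S <- as.integer(cut(latent, c(-Inf, -0.5, 0.5, Inf)))
# Binary true outcome that depends on treatment and on the surrogate
dat$y <- rbinom(nrow(dat), 1, plogis(-2 + 0.3 * dat$Z + 0.9 * dat$S))

# For each trial: -2 log likelihood without (L1) and with (L2) the surrogate,
# then the per-trial term exp(-(L1 - L2) / n_i) of the likelihood reduction factor
lrf_terms <- sapply(split(dat, dat$trial), function(d) {
  L1 <- -2 * as.numeric(logLik(glm(y ~ Z, family = binomial, data = d)))
  L2 <- -2 * as.numeric(logLik(glm(y ~ Z + S, family = binomial, data = d)))
  exp(-(L1 - L2) / nrow(d))
})
R2h <- 1 - mean(lrf_terms)                # average over the N trials
print(round(R2h, 3))
```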

In the discrete case, $R^2_{h}$ is bounded above by a number strictly less than one and is therefore rescaled (see Alonso & Molenberghs, 2007): $$\widehat{R^2_{h}}= \frac{R^2_{h}}{1-e^{-2L_{0}}},$$ where $L_{0}$ is the log-likelihood of the intercept-only model of the true outcome ($logit(P(T_{ij}=1))=\gamma_{3}$).

When Setting=c("binord") or Setting=c("ordord"), proportional odds models are used in (1) to accommodate the ordinal true outcome; in all other respects $R^2_{h}$ is calculated in the same manner.

Trial-level surrogacy

When Setting=c("ordbin"), trial-level surrogacy is assessed by fitting, separately in each trial $i$, a univariate proportional odds model for the ordinal surrogate and a univariate logistic regression model for the binary true outcome, each regressed on treatment:

$$logit(P(S_{ij} \leq w))=\mu_{S_{wi}}+\alpha_{i}Z_{ij}, \qquad (2)$$

$$logit(P(T_{ij}=1))=\mu_{Ti}+\beta_{i}Z_{ij}, \qquad (2)$$

where: $i$ and $j$ are the trial and subject indicators; $S_{ij}$ and $T_{ij}$ are the surrogate and true outcome values of subject $j$ in trial $i$; $Z_{ij}$ is the treatment indicator for subject $j$ in trial $i$; $\mu_{S_{wi}}$ are the trial-specific intercepts for each cut point $w$, where $w=1,\ldots,W-1$, of the ordinal surrogate outcome with $W$ categories; $\mu_{Ti}$ are the fixed trial-specific intercepts for T; and $\alpha_{i}$ and $\beta_{i}$ are the fixed trial-specific treatment effects on S and T, respectively. For each trial, the mean of the surrogate intercepts over the cut points, $\overline{\mu}_{S_{wi}}$, is then calculated. The following model is subsequently fitted:

$$\widehat{\beta}_{i}=\lambda_{0}+\lambda_{1}\widehat{\overline{\mu}}_{S_{wi}}+\lambda_{2}\widehat{\alpha}_{i}+\varepsilon_{i}, (3)$$

where the parameter estimates for $\beta_i$, $\overline{\mu}_{S_{wi}}$, and $\alpha_i$ are based on models (2) (see above). When a weighted model is requested (using the argument Weighted=TRUE in the function call), model (3) is a weighted regression model (with weights based on the number of observations in trial $i$). The $-2$ log likelihood value of the (weighted or unweighted) model (3) ($L_1$) is subsequently compared to the $-2$ log likelihood value of an intercept-only model ($\widehat{\beta}_{i}=\lambda_{3}$; $L_0$), and $R^2_{ht}$ is computed based on the Likelihood Reduction Factor (for details, see Alonso et al., 2006):

$$R^2_{ht}= 1 - exp \left(-\frac{L_0-L_1}{N} \right),$$

where $N$ is the number of trials.
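The trial-level steps for Setting=c("ordbin") can be sketched as below, using `MASS::polr()` for the proportional odds model of the surrogate and `glm()` for the binary true outcome. This is an illustration under assumptions, not the package code: the simulated data are hypothetical, the weights are assumed to be the raw trial sizes $n_i$, and separation handling is omitted. `polr()` parameterizes the model as $logit(P(S \leq w)) = \zeta_w - \eta$, so its treatment coefficient is negated to obtain $\alpha_i$ on the scale of model (2). The reduction in $-2$ log likelihood is taken as $L_0 - L_1$, which is non-negative for these nested least-squares fits.

```r
library(MASS)  # polr(): proportional odds model (a recommended package shipped with R)

set.seed(3)
N <- 12                                        # number of trials (hypothetical)
n_i <- sample(60:200, N, replace = TRUE)       # unequal, hypothetical trial sizes
dat <- data.frame(trial = rep(seq_len(N), times = n_i))
dat$Z <- rbinom(nrow(dat), 1, 0.5)             # treatment indicator
eff <- rnorm(N, 0.8, 0.4)[dat$trial]           # hypothetical trial-specific effect
lat <- eff * dat$Z + rlogis(nrow(dat))
# Ordinal surrogate with W = 3 categories (cut at the tertiles)
dat$S <- cut(lat, quantile(lat, c(0, 1/3, 2/3, 1)),
             include.lowest = TRUE, ordered_result = TRUE)
# Binary true outcome whose treatment effect tracks the surrogate's
dat$y <- rbinom(nrow(dat), 1, plogis(-0.5 + 0.9 * eff * dat$Z))

# Models (2), fitted separately in each trial
trial_est <- do.call(rbind, lapply(split(dat, dat$trial), function(d) {
  fit_S <- polr(S ~ Z, data = d)
  fit_T <- glm(y ~ Z, family = binomial, data = d)
  data.frame(mu_S_bar = mean(fit_S$zeta),      # mean of the W - 1 cut-point intercepts
             alpha = -coef(fit_S)[["Z"]],      # sign flip: polr uses zeta_w - eta
             beta = coef(fit_T)[["Z"]],
             n = nrow(d))
}))

# Model (3) and the intercept-only model, both weighted by trial size
fit_1 <- lm(beta ~ mu_S_bar + alpha, data = trial_est, weights = n)
fit_0 <- lm(beta ~ 1, data = trial_est, weights = n)
L1 <- -2 * as.numeric(logLik(fit_1))
L0 <- -2 * as.numeric(logLik(fit_0))
R2ht <- 1 - exp(-(L0 - L1) / N)
print(round(R2ht, 3))
```

Dropping the `weights = n` arguments gives the unweighted analysis corresponding to Weighted=FALSE.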

When separation (a zero cell in the cross-tabulation of treatment against the true or surrogate outcome) occurs for a particular trial in models (2), $R^2_{ht}$ can be severely biased. Under separation, finite maximum likelihood estimates of $\beta_i$, $\overline{\mu}_{S_{wi}}$ and $\alpha_i$ in (2) do not exist for the affected trial $i$. The estimates returned by the fitting algorithm are then typically extreme, producing outlying, influential points in model (3), and bias in $R^2_{ht}$ inevitably follows.
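The effect of a zero cell can be seen on a single toy trial (hypothetical data, eight patients, binary outcome). Because the maximum likelihood estimate of the treatment effect is infinite, `glm()` cannot reach it; it stops at a large estimate with an enormous standard error, which is exactly the kind of outlying $\widehat{\beta}_i$ that distorts model (3).

```r
# A toy trial with a zero cell: no treated patient (Z = 1) has y = 0.
Z <- c(0, 0, 0, 0, 1, 1, 1, 1)
y <- c(0, 1, 0, 1, 1, 1, 1, 1)
tab <- table(Z, y)
print(tab)
has_zero_cell <- any(tab == 0)   # TRUE: (quasi-complete) separation

# Ordinary maximum likelihood: glm() usually warns that fitted probabilities
# of 0 or 1 occurred and stops at a very large, unstable treatment effect.
ml_fit <- suppressWarnings(glm(y ~ Z, family = binomial))
beta_ml <- coef(ml_fit)[["Z"]]
se_ml <- sqrt(diag(vcov(ml_fit)))[["Z"]]
print(c(estimate = beta_ml, std_error = se_ml))
```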

To resolve the issue of separation, the penalized likelihood approach of Firth (1993) is applied. This approach adds an asymptotically negligible component to the score function, which yields finite, bias-reduced estimates of $\beta_i$, $\overline{\mu}_{S_{wi}}$, and $\alpha_i$, and in turn of $R^2_{ht}$. The penalized likelihood function logistf from the R package of the same name is applied in the case of binary separation (Heinze and Schemper, 2002). The function pordlogistf from the package OrdinalLogisticBiplot is applied in the case of ordinal separation (Hernández, 2013). All instances of separation are reported.
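To show what the Firth correction does in the binary case, the sketch below implements Firth-penalized logistic regression directly in base R, iterating with the modified score $U^*(\beta)=\sum_i \{y_i - \pi_i + h_i(1/2-\pi_i)\}x_i$, where $h_i$ are the diagonal elements of the hat matrix. The helper `firth_logit()` is hypothetical and written for illustration only; the function itself relies on logistf (and pordlogistf for ordinal outcomes). On a toy trial with a zero cell the penalized estimate is finite: for this saturated two-group model it equals the log odds ratio after adding 1/2 to every cell, $\log(9) \approx 2.197$.

```r
# Hypothetical helper (not part of logistf or Surrogate): Firth-penalized
# logistic regression fitted by iterating with the modified score function.
firth_logit <- function(X, y, tol = 1e-10, max_iter = 200) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(max_iter)) {
    p <- plogis(drop(X %*% beta))
    w <- p * (1 - p)
    info_inv <- solve(crossprod(X, X * w))          # (X'WX)^{-1}
    h <- rowSums((X %*% info_inv) * X) * w          # hat-matrix diagonal
    # Firth's modified score: sum_i (y_i - p_i + h_i * (1/2 - p_i)) x_i
    score <- crossprod(X, y - p + h * (0.5 - p))
    step <- drop(info_inv %*% score)
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  setNames(beta, colnames(X))
}

# A toy trial in which no treated patient (Z = 1) has y = 0 (a zero cell).
Z <- c(0, 0, 0, 0, 1, 1, 1, 1)
y <- c(0, 1, 0, 1, 1, 1, 1, 1)
X <- model.matrix(~ Z)                               # columns "(Intercept)", "Z"
fit <- firth_logit(X, y)
print(round(fit, 4))                                 # Z estimate is finite (log 9)
```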

When Setting=c("binord") or Setting=c("ordord"), the appropriate models (logistic regression or proportional odds models) are fitted in (2) to accommodate the form (binary or ordinal) of the true and surrogate outcomes. The rest of the analysis proceeds as described above.

References

Alonso, A. & Molenberghs, G. (2007). Surrogate marker evaluation from an information theory perspective. Biometrics, 63, 180-186.

Alonso, A., Molenberghs, G., Geys, H., Buyse, M. & Vangeneugden, T. (2006). A unifying approach for surrogate marker validation based on Prentice's criteria. Statistics in Medicine, 25, 205-221.

Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80, 27-38.

Heinze, G. & Schemper, M. (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine, 21, 2409-2419.

Hernández, J. C. V.-V. O., J. L. (2013). OrdinalLogisticBiplot: Biplot representations of ordinal variables. R.

Examples



if (FALSE) {  # Time consuming (>5sec) code part
library(Surrogate)

# Example 1
# Conduct an analysis based on a simulated dataset with 2000 patients, 100 trials,
# and Rindiv=Rtrial=.8

# Simulate the data (creates Data.Observed.MTS in the workspace):
Sim.Data.MTS(N.Total=2000, N.Trial=100, R.Trial.Target=.8, R.Indiv.Target=.8,
Seed=123, Model="Full")

# Create a binary true outcome (split at the median) and an ordinal
# surrogate outcome with three categories (split at the tertiles)
Data.Observed.MTS$True<-findInterval(Data.Observed.MTS$True,
c(quantile(Data.Observed.MTS$True,0.5)))
Data.Observed.MTS$Surr<-findInterval(Data.Observed.MTS$Surr,
c(quantile(Data.Observed.MTS$Surr,0.333),quantile(Data.Observed.MTS$Surr,0.666)))

# Assess surrogacy based on a full fixed-effect model in the
# information-theoretic framework for an ordinal surrogate and binary true outcome:
SurEval <- FixedDiscrDiscrIT(Dataset=Data.Observed.MTS, Surr=Surr, True=True, Treat=Treat,
Trial.ID=Trial.ID, Setting="ordbin")

# Show a summary of the results:
summary(SurEval)
SurEval$Trial.Spec.Results
SurEval$R2h
SurEval$R2ht
}
