Score.list: Score risk predictions

Description

Methods to score the predictive performance of risk markers and risk prediction models

Usage

# S3 method for list
Score(object, formula, data, metrics = c("auc", "brier"),
  summary = NULL, plots = NULL, cause = 1, times, landmarks,
  useEventTimes = FALSE, nullModel = TRUE, conf.int = 0.95,
  contrasts = TRUE, probs = c(0, 0.25, 0.5, 0.75, 1),
  censMethod = "outside.ipcw", censModel = "cox", splitMethod, B, M, seed,
  trainseeds, ...)

Arguments

object

List of risk predictions (see details and examples).

formula

A formula which identifies the outcome (left hand side). E.g., Y ~ 1 for binary and Hist(time,status) ~ 1 for time-to-event outcome. In right censored data, the right hand side of the formula is used to estimate the inverse probability of censoring weights (IPCW) model.

data

data.frame or data.table in which the formula can be interpreted.

metrics

Character vector specifying which metrics to apply. Case does not matter. Choices are "AUC" and "Brier".

summary

Character vector specifying which summary statistics to apply to the predicted risks. The only choice is "riskQuantile" which enables (time-point specific) boxplots of predicted risks conditional on the outcome (at the time-point). Set to NULL to avoid estimation of retrospective risk quantiles.

plots

Character vector specifying which plots to prepare. Choices

cause

Event of interest. Used for binary outcome Y to specify that risks are risks of the event Y=event and for competing risks outcome to specify the cause of interest.

times

For survival and competing risks outcome: list of prediction horizons. All times which are greater than the maximal observed time in the data set are automatically removed. Note that the object returned by the function may become huge when the prediction performance is estimated at many prediction horizons.

landmarks

Not yet implemented.

useEventTimes

If TRUE merge all unique event times with the vector given by argument times.

nullModel

If TRUE fit a risk prediction model which ignores the covariates and predicts the same value for all subjects. The model is fitted using data and the left hand side of formula. For binary outcome this is just the empirical prevalence. For (right censored) time to event outcome, the null models are equal to the Kaplan-Meier estimator (no competing risks) and the Aalen-Johansen estimator (with competing risks).

conf.int

Either logical or a numeric value between 0 and 1. In right censored data, confidence intervals are based on Blanche et al (see references). Setting FALSE prevents the computation confidence intervals. TRUE means compute 95 percent confidence intervals and corresponding p-values for AUC and Brier score. If set to 0.87, the level of significance is 13 percent. So, do not set it to 0.87.

contrasts

Either logical or a list of contrasts. A list of contrasts defines which risk prediction models (markers) should be contrasted with respect to their prediction performance. If TRUE do all possible comparisons. For example, when object is a list with two risk prediction models and nullModel=TRUE setting TRUE is equivalent to list(c(0,1,2),c(1,2)) where c(0,1,2) codes for the two comparisons: 1 vs 0 and 2 vs 0 (positive integers refer to elements of object, 0 refers to the benchmark null model which ignores the covariates). This again is equivalent to explicitly setting list(c(0,1),c(0,2),c(1,2)). A more complex example: Suppose object has 7 elements and you want to do the following 3 comparisons: 6 vs 3, 2 vs 5 and 2 vs 3, you should set contrasts=c(6,3),c(2,5,3).

probs

Quantiles for retrospective summary statistics of the predicted risks. This affects the result of the function boxplot.Score.

censMethod

Method for dealing with right censored data. Either "outside.ipcw" or "inside.ipcw". "outside.ipcw" Here IPCW refers to inverse probability of censoring weights. The attributes inside and outside are only relevant for cross-validation (splitMethod) where the inside method computes the IPCW model in the the loop in each of the current test data sets whereas the outside method computes the IPCW model only once in the full data. Also, jackknife pseudo-values may become another option for all statistics. Right now they are only used for calibration curves.

censModel

Model for estimating inverse probability of censored weights. Implemented are the Kaplan-Meier method ("km") and Cox regression ("cox") both applied to the censored times. If the right hand side of formula does not specify covariates, the Kaplan-Meier method is used even if this argument is set to "cox".

splitMethod

Method for cross-validation. Right now the only choice is bootcv in which case bootstrap learning sets are drawn with our without replacement (argument M) from data. The data not included in the current bootstrap learning set are used as validation set to compute the prediction performance.

Number of bootstrap sets for cross-validation.

Size of subsamples for bootstrap cross-validation. If specified it has to be an integer smaller than the size of data.

seed

Super seed for setting training data seeds when randomly splitting (bootstrapping) the data during cross-validation.

trainseeds

Seeds for training models during cross-validation.

...

Not used

Value

List with scores and assessments of contrasts, i.e., tests and confidence limits for performance and difference in performance (AUC and Brier), summaries and plots. Most elements are indata.table format.

Details

The function implements toolbox for the risk prediction modeller: all tools work for the three outcomes: (1) binary (uncensored), (2) right censored time to event without competing risks, (3) right censored time to event with competing risks

Computed are the (time-dependent) Brier score and the (time-dependent) area under the ROC curve for a list of risk prediction models either in external validation data or in the learning data using bootstrap cross-validation. The function optionally provides results for plotting (time-point specific) ROC curves, for (time-point specific) calibration curves and for (time-point specific) retrospective boxplots.

For uncensored binary outcome the Delong-Delong test is used to contrast AUC of rival models in all other cases the p-values correspond to Wald tests based on standard errors obtained with an estimate of the influence function as described in detain in the appendix of Blanche et al. (2015).

References

Ulla B. Mogensen, Hemant Ishwaran, Thomas A. Gerds (2012). Evaluating Random Forests for Survival Analysis Using Prediction Error Curves. Journal of Statistical Software, 50(11), 1-23. URL http://www.jstatsoft.org/v50/i11/.

Paul Blanche, Cecile Proust-Lima, Lucie Loubere, Claudine Berr, Jean- Francois Dartigues, and Helene Jacqmin-Gadda. Quantifying and comparing dynamic predictive accuracy of joint models for longitudinal marker and time-to-event in presence of censoring and competing risks. Biometrics, 71 (1):102--113, 2015. P. Blanche, J-F Dartigues, and H. Jacqmin-Gadda. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Statistics in Medicine, 32(30): 5381--5397, 2013.

E. Graf et al. (1999), Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine, vol 18, pp= 2529--2545.

Efron, Tibshirani (1997) Journal of the American Statistical Association 92, 548--560 Improvement On Cross-Validation: The .632+ Bootstrap Method.

Gerds, Schumacher (2006), Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biometrical Journal, vol 48, 1029--1040.

Thomas A. Gerds, Martin Schumacher (2007) Efron-Type Measures of Prediction Error for Survival Analysis Biometrics, 63(4), 1283--1287 doi:10.1111/j.1541-0420.2007.00832.x

Martin Schumacher, Harald Binder, and Thomas Gerds. Assessment of survival prediction models based on microarray data. Bioinformatics, 23(14):1768-74, 2007.

Mark A. van de Wiel, Johannes Berkhof, and Wessel N. van Wieringen Testing the prediction error difference between 2 predictors Biostatistics (2009) 10(3): 550-560 doi:10.1093/biostatistics/kxp011

Examples

Run this code

# binary outcome
library(lava)
set.seed(18)
learndat <- sampleData(48,outcome="binary")
testdat <- sampleData(40,outcome="binary")

# score logistic regression models
lr1 = glm(Y~X1+X2+X7+X9,data=learndat,family=binomial)
lr2 = glm(Y~X3+X5+X6,data=learndat,family=binomial)
Score(list("LR(X1+X2+X7+X9)"=lr1,"LR(X3+X5+X6)"=lr2),formula=Y~1,data=testdat)

xb=Score(list("LR(X1+X2+X7+X9)"=lr1,"LR(X3+X5+X6)"=lr2),formula=Y~1,
        data=testdat,plots=c("calibration","ROC"))
plotROC(xb)

# compute AUC for a list of continuous markers
markers = as.list(testdat[,.(X6,X7,X8,X9,X10)])
u=Score(markers,formula=Y~1,data=testdat,metrics=c("auc"))

# cross-validation

lr1a = glm(Y~X6,data=learndat,family=binomial)
lr2a = glm(Y~X7+X8+X9,data=learndat,family=binomial)
Score(list("LR1"=lr1a,"LR2"=lr2a),formula=Y~1,data=learndat,splitMethod="bootcv",B=3)

# survival outcome

# Score Cox regression models
library(survival)
library(rms)
library(prodlim)
set.seed(18)
trainSurv <- sampleData(100,outcome="survival")
testSurv <- sampleData(40,outcome="survival")
cox1 = coxph(Surv(time,event)~X1+X2+X7+X9,data=trainSurv, y=TRUE, x = TRUE)
cox2 = coxph(Surv(time,event)~X3+X5+X6,data=trainSurv, y=TRUE, x = TRUE)
xs=Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
      formula=Surv(time,event)~1,data=testSurv,conf.int=FALSE,times=c(5,8))
xs

# time-dependent AUC for list of markers
survmarkers = as.list(testSurv[,.(X6,X7,X8,X9,X10)])
Score(survmarkers,
      formula=Surv(time,event)~1,metrics="auc",data=testSurv,
      conf.int=TRUE,times=c(5,8))

# compare models on test data
Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
      formula=Surv(time,event)~1,data=testSurv,conf.int=TRUE,times=c(5,8))

# crossvalidation models in traindata

Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
      formula=Surv(time,event)~1,data=trainSurv,conf.int=TRUE,times=c(5,8),
      splitMethod="bootcv",B=3)


# restrict number of comparisons

Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
      formula=Surv(time,event)~1,data=trainSurv,contrasts=TRUE,
nullModel=FALSE,conf.int=TRUE,times=c(5,8),splitMethod="bootcv",B=3)

# competing risks outcome
set.seed(18)
trainCR <- sampleData(40,outcome="competing.risks")
testCR <- sampleData(40,outcome="competing.risks")
library(riskRegression)
library(cmprsk)
# Cause-specific Cox regression
csc1 = CSC(Hist(time,event)~X1+X2+X7+X9,data=trainCR)
csc2 = CSC(Hist(time,event)~X3+X5+X6,data=trainCR)
# Fine-Gray regression
fgr1 = FGR(Hist(time,event)~X1+X2+X7+X9,data=trainCR,cause=1)
fgr2 = FGR(Hist(time,event)~X3+X5+X6,data=trainCR,cause=1)
Score(list("CSC(X1+X2+X7+X9)"=csc1,"CSC(X3+X5+X6)"=csc2,
           "FGR(X1+X2+X7+X9)"=fgr1,"FGR(X3+X5+X6)"=fgr2),
      formula=Hist(time,event)~1,data=testCR,se.fit=TRUE,times=c(5,8))

Run the code above in your browser using DataLab