FRESA.CAD (version 2.0.2)

bootstrapValidation: Bootstrap validation of binary classification models

Description

This function bootstraps the model a number of times set by loops to estimate, for each variable, the empirical distribution of the model coefficients, the area under the ROC curve (AUC), the integrated discrimination improvement (IDI), and the net reclassification improvement (NRI). In each bootstrap loop, the trained model predicts the non-observed data, and the statistics of these test predictions are stored and reported. The method keeps track of the predictions and plots the bootstrap-validated ROC. It may also plot the blind test accuracy, sensitivity, and specificity, contrasted with the bootstrapped train distributions.

Usage

bootstrapValidation(fraction = 1,
	                    loops = 200,
	                    model.formula,
	                    Outcome,
	                    data,
	                    type = c("LM", "LOGIT", "COX"),
	                    plots = TRUE)

Arguments

fraction
The fraction of the data (sampled with replacement) to be used as the train set
loops
The number of bootstrap loops
model.formula
An object of class formula with the formula to be used
Outcome
The name of the column in data that stores the variable to be predicted by the model
data
A data frame where all variables are stored in different columns
type
Fit type: Logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX")
plots
Logical. If TRUE, density distribution plots are displayed
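
The role of fraction can be pictured with a few lines of base R. This is only a sketch under stated assumptions, not the package's internal code: the row count n is hypothetical, and it is assumed that fraction scales the number of rows drawn with replacement, with the rows never drawn forming the non-observed (blind test) set.

```r
# Illustrative base-R sketch; not FRESA.CAD internals.
set.seed(42)
n <- 100        # hypothetical number of rows in the data frame
fraction <- 1   # assumed to scale the number of rows drawn

# Draw round(fraction * n) row indices with replacement: the train set
trainIdx <- sample(n, size = round(fraction * n), replace = TRUE)

# Rows that were never drawn are left for blind testing
testIdx <- setdiff(seq_len(n), trainIdx)

length(trainIdx)          # number of draws
length(unique(trainIdx))  # distinct rows that ended up in the train set
length(testIdx)           # rows available as the blind test set
```

Because sampling is done with replacement, some rows are drawn more than once and others are not drawn at all, which is what leaves a test set even when fraction is 1.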

Value

  • data: The data frame used to bootstrap and validate the model
  • outcome: A vector with the predictions made by the model
  • blind.accuracy: The accuracy of the model in the blind test set
  • blind.sensitivity: The sensitivity of the model in the blind test set
  • blind.specificity: The specificity of the model in the blind test set
  • train.ROCAUC: A vector with the AUC in the bootstrap train sets
  • blind.ROCAUC: An object of class roc containing the AUC in the bootstrap blind test set
  • boot.ROCAUC: An object of class roc containing the AUC using the mean of the bootstrapped coefficients
  • fraction: The fraction of data that was sampled with replacement
  • loops: The number of loops it took for the model to stabilize
  • base.Accuracy: The accuracy of the original model
  • base.sensitivity: The sensitivity of the original model
  • base.specificity: The specificity of the original model
  • accuracy: A vector with the accuracies in the bootstrap test sets
  • sensitivities: A vector with the sensitivities in the bootstrap test sets
  • specificities: A vector with the specificities in the bootstrap test sets
  • train.accuracy: A vector with the accuracies in the bootstrap train sets
  • train.sensitivity: A vector with the sensitivities in the bootstrap train sets
  • train.specificity: A vector with the specificities in the bootstrap train sets
  • s.coef: A matrix with the coefficients in the bootstrap train sets
  • boot.model: An object of class lm, glm, or coxph containing a model whose coefficients are the median of the coefficients of the bootstrapped models
  • mboot.model: An object of class lm, glm, or coxph containing a model whose coefficients are the IDI-weighted mean of the coefficients of the bootstrapped models
  • boot.accuracy: The accuracy of the mboot.model model
  • boot.sensitivity: The sensitivity of the mboot.model model
  • boot.specificity: The specificity of the mboot.model model
  • z.NRIs: A matrix with the z-score of the NRI for each model term, estimated using the bootstrap train sets
  • z.IDIs: A matrix with the z-score of the IDI for each model term, estimated using the bootstrap train sets
  • test.z.NRIs: A matrix with the z-score of the NRI for each model term, estimated using the bootstrap test sets
  • test.z.IDIs: A matrix with the z-score of the IDI for each model term, estimated using the bootstrap test sets
  • NRIs: A matrix with the NRI for each model term, estimated using the bootstrap test sets
  • IDIs: A matrix with the IDI for each model term, estimated using the bootstrap test sets
  • testOutcome: A vector that contains all the individual outcomes used to validate the model in the bootstrap test sets
  • testPrediction: A vector that contains all the individual predictions used to validate the model in the bootstrap test sets
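
As a reminder of what the accuracy, sensitivity, and specificity entries measure, the sketch below computes them from a pair of outcome and prediction vectors shaped like testOutcome and testPrediction. The helper classMetrics, the toy vectors, and the 0.5 cut-off are all assumptions made for illustration. The package's own thresholding may differ, particularly for Cox models.

```r
# Hypothetical helper; the 0.5 cut-off is an assumption, not the package's rule.
classMetrics <- function(outcome, prediction, threshold = 0.5) {
  predicted <- as.integer(prediction >= threshold)
  tp <- sum(predicted == 1 & outcome == 1)  # true positives
  tn <- sum(predicted == 0 & outcome == 0)  # true negatives
  fp <- sum(predicted == 1 & outcome == 0)  # false positives
  fn <- sum(predicted == 0 & outcome == 1)  # false negatives
  c(accuracy    = (tp + tn) / length(outcome),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}

# Toy vectors standing in for pooled test outcomes and predictions
outcome    <- c(1, 1, 1, 0, 0, 0, 0, 1)
prediction <- c(0.9, 0.8, 0.4, 0.2, 0.2, 0.1, 0.3, 0.7)

m <- classMetrics(outcome, prediction)
print(m)  # accuracy 0.875, sensitivity 0.75, specificity 1
```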

Details

The bootstrap validation estimates the confidence intervals of the model coefficients and of the NRI and IDI. The non-sampled values are used to estimate the blind accuracy, sensitivity, and specificity. If plots is set to TRUE, a plot that monitors the evolution of the bootstrap procedure is displayed. The plot shows the train and blind test ROC. The density distributions of the train accuracy, sensitivity, and specificity are also shown, with the blind test results drawn along the y-axis.
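
The procedure described here can be approximated in base R. The sketch below is a simplified stand-in and not the FRESA.CAD implementation: it uses simulated data (simData), a plain logistic glm, a 0.5 classification cut-off, and percentile intervals, all of which are assumptions chosen for illustration. NRI and IDI are not computed.

```r
# Simplified bootstrap validation sketch in base R (not FRESA.CAD code).
set.seed(1)
n <- 200
simData <- data.frame(x = rnorm(n))                 # simulated predictor
simData$y <- rbinom(n, 1, plogis(0.5 + simData$x))  # simulated binary outcome

loops <- 50
coefs <- matrix(NA_real_, nrow = loops, ncol = 2,
                dimnames = list(NULL, c("(Intercept)", "x")))
blindAccuracy <- numeric(loops)

for (i in seq_len(loops)) {
  trainIdx <- sample(n, replace = TRUE)     # bootstrap train rows
  testIdx  <- setdiff(seq_len(n), trainIdx) # non-sampled (blind) rows
  fit <- glm(y ~ x, family = binomial, data = simData[trainIdx, ])
  coefs[i, ] <- coef(fit)
  p <- predict(fit, newdata = simData[testIdx, ], type = "response")
  blindAccuracy[i] <- mean((p >= 0.5) == (simData$y[testIdx] == 1))
}

# Percentile 95% intervals of the bootstrapped coefficients
ci <- apply(coefs, 2, quantile, probs = c(0.025, 0.975))
print(ci)
mean(blindAccuracy)  # average accuracy on the non-sampled rows
```

Each loop fits on the resampled rows and scores only the rows that were left out, so the blind accuracy is not inflated by data the model has already seen.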

See Also

bootstrapValidationNeRI, plot.bootstrapValidation, summary.bootstrapValidation

Examples

# Start the graphics device driver to save all plots in a pdf format
	pdf(file = "Example.pdf")
	# Get the stage C prostate cancer data from the rpart package
	library(rpart)
	data(stagec)
	# Split the stages into several columns
	dataCancer <- cbind(stagec[,c(1:3,5:6)],
	                    gleason4 = 1*(stagec[,7] == 4),
	                    gleason5 = 1*(stagec[,7] == 5),
	                    gleason6 = 1*(stagec[,7] == 6),
	                    gleason7 = 1*(stagec[,7] == 7),
	                    gleason8 = 1*(stagec[,7] == 8),
	                    gleason910 = 1*(stagec[,7] >= 9),
	                    eet = 1*(stagec[,4] == 2),
	                    diploid = 1*(stagec[,8] == "diploid"),
	                    tetraploid = 1*(stagec[,8] == "tetraploid"),
	                    notAneuploid = 1-1*(stagec[,8] == "aneuploid"))
	# Remove the incomplete cases
	dataCancer <- dataCancer[complete.cases(dataCancer),]
	# Load a pre-established data frame with the names and descriptions of all variables
	data(cancerVarNames)
	# Get a Cox proportional hazards model using:
	# - 10 bootstrap loops
	# - Age as a covariate
	# - zIDI as the feature inclusion criterion
	cancerModel <- ReclassificationFRESA.Model(loops = 10,
	                                           covariates = "1 + age",
	                                           Outcome = "pgstat",
	                                           variableList = cancerVarNames,
	                                           data = dataCancer,
	                                           type = "COX",
	                                           timeOutcome = "pgtime",
	                                           selectionType = "zIDI")
	# Validate the previous model:
	# - Using 50 bootstrap loops
	bootCancerModel <- bootstrapValidation(loops = 50,
	                                       model.formula = cancerModel$formula,
	                                       Outcome = "pgstat",
	                                       data = dataCancer,
	                                       type = "COX")
	# Shut down the graphics device driver
	dev.off()
