bootstrapValidationNeRI: Bootstrap validation of regression models

Description

This function bootstraps the model repeatedly (loops times) to estimate, for each variable, the empirical bootstrapped distribution of the model coefficients and of the net residual improvement (NeRI). At each bootstrap loop, the non-observed data are predicted by the trained model, and the statistics of the test predictions are stored and reported.
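The description refers to the NeRI without defining it. The base R sketch below illustrates one common reading of the term, assumed here rather than taken from this page: the fraction of observations whose absolute residual shrinks when a term is added to the model, minus the fraction whose absolute residual grows. The mtcars data, the two formulas, and all variable names are illustrative choices and are not part of the package.

```r
# Illustrative sketch only: NeRI is ASSUMED to be
# (fraction of improved residuals) - (fraction of worsened residuals).
data(mtcars)

# Reduced model without the candidate term, full model with it
reducedFit <- lm(mpg ~ wt, data = mtcars)
fullFit    <- lm(mpg ~ wt + hp, data = mtcars)

# Absolute residuals of each model on the same observations
absReduced <- abs(residuals(reducedFit))
absFull    <- abs(residuals(fullFit))

# Count the observations that improved or worsened after adding hp
nImproved <- sum(absFull < absReduced)
nWorsened <- sum(absFull > absReduced)

# Net residual improvement under the assumed definition
neri <- (nImproved - nWorsened) / length(absFull)

# One-sided binomial (sign) test: do improvements outnumber worsenings?
signTest <- binom.test(nImproved, nImproved + nWorsened,
                       p = 0.5, alternative = "greater")

print(neri)
print(signTest$p.value)
```

Under this reading the NeRI lies between -1 and 1, and a positive value means that the added term improves more residuals than it worsens.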

Usage

bootstrapValidationNeRI(fraction = 1,
	                        loops = 200,
	                        model.formula,
	                        Outcome,
	                        data,
	                        type = c("LM", "LOGIT", "COX"),
	                        plots = TRUE)

Arguments

fraction

The fraction of data (sampled with replacement) to be used as the training set

loops

The number of bootstrap loops

model.formula

An object of class formula with the formula to be used

Outcome

The name of the column in data that stores the variable to be predicted by the model

data

A data frame where all variables are stored in different columns

type

Fit type: Logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX")

plots

Logical. If TRUE, density distribution plots are displayed

Value

data

The data frame used to bootstrap and validate the model

outcome

A vector with the predictions made by the model

boot.model

An object of class lm, glm, or coxph containing a model whose coefficients are the median of the coefficients of the bootstrapped models

NeRIs

A matrix with the NeRI for each model term, estimated using the bootstrap test sets

tStudent.pvalues

A matrix with the t-test p-value of the NeRI for each model term, estimated using the bootstrap train sets

wilcox.pvalues

A matrix with the Wilcoxon rank-sum test p-value of the NeRI for each model term, estimated using the bootstrap train sets

bin.pvlaues

A matrix with the binomial test p-value of the NeRI for each model term, estimated using the bootstrap train sets

F.pvlaues

A matrix with the F-test p-value of the NeRI for each model term, estimated using the bootstrap train sets

test.tStudent.pvalues

A matrix with the t-test p-value of the NeRI for each model term, estimated using the bootstrap test sets

test.wilcox.pvalues

A matrix with the Wilcoxon rank-sum test p-value of the NeRI for each model term, estimated using the bootstrap test sets

test.bin.pvlaues

A matrix with the binomial test p-value of the NeRI for each model term, estimated using the bootstrap test sets

test.F.pvlaues

A matrix with the F-test p-value of the NeRI for each model term, estimated using the bootstrap test sets

testPrediction

A vector that contains all the individual predictions used to validate the model in the bootstrap test sets

testOutcome

A vector that contains all the individual outcomes used to validate the model in the bootstrap test sets

testResiduals

A vector that contains all the residuals used to validate the model in the bootstrap test sets

trainPrediction

A vector that contains all the individual predictions used to validate the model in the bootstrap train sets

trainOutcome

A vector that contains all the individual outcomes used to validate the model in the bootstrap train sets

trainResiduals

A vector that contains all the residuals used to validate the model in the bootstrap train sets

testRMSE

The global RMSE, estimated using the bootstrap test sets

trainRMSE

The global RMSE, estimated using the bootstrap train sets

trainSampleRMSE

A vector with the RMSEs in the bootstrap train sets

testSampledRMSE

A vector with the RMSEs in the bootstrap test sets

Details

The bootstrap validation estimates the confidence intervals of the model coefficients and of the NeRI. It also computes the train and blind-test root-mean-square error (RMSE), as well as the distribution of the NeRI p-values.
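The general idea of the procedure can be sketched in base R. This is a minimal illustration for a linear model, not the package's implementation: it assumes that each loop draws a training sample with replacement, treats the rows that were never drawn as the blind test set, and pools the squared errors across loops to obtain global RMSEs. The mtcars data, the formula, and every variable name are illustrative.

```r
# Illustrative sketch only; it does not reproduce the package internals.
set.seed(42)
data(mtcars)

loops    <- 200   # number of bootstrap loops
fraction <- 1     # fraction of the data drawn (with replacement) for training
n        <- nrow(mtcars)

coefs <- matrix(NA_real_, nrow = loops, ncol = 2,
                dimnames = list(NULL, c("(Intercept)", "wt")))
trainSqErr <- numeric(0)
testSqErr  <- numeric(0)

for (i in seq_len(loops)) {
  # Sample the training rows with replacement
  trainIdx <- sample(n, size = round(fraction * n), replace = TRUE)
  # Rows never drawn form the blind (out-of-sample) test set
  testIdx  <- setdiff(seq_len(n), trainIdx)

  fit <- lm(mpg ~ wt, data = mtcars[trainIdx, ])
  coefs[i, ] <- coef(fit)

  # Accumulate squared errors on the train and test sets
  trainSqErr <- c(trainSqErr, residuals(fit)^2)
  if (length(testIdx) > 0) {
    testPred  <- predict(fit, newdata = mtcars[testIdx, ])
    testSqErr <- c(testSqErr, (mtcars$mpg[testIdx] - testPred)^2)
  }
}

# Percentile confidence interval and median of each coefficient
coefCI <- apply(coefs, 2, quantile, probs = c(0.025, 0.5, 0.975))

# Global RMSE over all bootstrap train and test predictions
trainRMSE <- sqrt(mean(trainSqErr))
testRMSE  <- sqrt(mean(testSqErr))

print(coefCI)
print(c(train = trainRMSE, test = testRMSE))
```

The middle row of coefCI holds the median coefficients, which mirrors how boot.model is described in the Value section. The blind-test RMSE is typically somewhat larger than the train RMSE, and the size of that gap is a rough indication of overfitting.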

Examples

	# Load the FRESA.CAD package, which provides the functions used below
	library(FRESA.CAD)
	# Start the graphics device driver to save all plots in PDF format
	pdf(file = "Example.pdf")
	# Get the stage C prostate cancer data from the rpart package
	library(rpart)
	data(stagec)
	# Split the stages into several columns
	dataCancer <- cbind(stagec[,c(1:3,5:6)],
	                    gleason4 = 1*(stagec[,7] == 4),
	                    gleason5 = 1*(stagec[,7] == 5),
	                    gleason6 = 1*(stagec[,7] == 6),
	                    gleason7 = 1*(stagec[,7] == 7),
	                    gleason8 = 1*(stagec[,7] == 8),
	                    gleason910 = 1*(stagec[,7] >= 9),
	                    eet = 1*(stagec[,4] == 2),
	                    diploid = 1*(stagec[,8] == "diploid"),
	                    tetraploid = 1*(stagec[,8] == "tetraploid"),
	                    notAneuploid = 1-1*(stagec[,8] == "aneuploid"))
	# Remove the incomplete cases
	dataCancer <- dataCancer[complete.cases(dataCancer),]
	# Load a pre-established data frame with the names and descriptions of all variables
	data(cancerVarNames)
	# Get a Cox proportional hazards model using:
	# - 10 bootstrap loops
	# - Age as a covariate
	# - The Wilcoxon rank-sum test as the feature inclusion criterion
	cancerModel <- NeRIBasedFRESA.Model(loops = 10,
	                                    covariates = "1 + age",
	                                    Outcome = "pgstat",
	                                    variableList = cancerVarNames,
	                                    data = dataCancer,
	                                    type = "COX",
	                                    testType = "Wilcox",
	                                    timeOutcome = "pgtime")
	# Validate the previous model:
	# - Using 50 bootstrap loops
	bootCancerModel <- bootstrapValidationNeRI(loops = 50,
	                                           model.formula = cancerModel$formula,
	                                           Outcome = "pgstat",
	                                           data = dataCancer,
	                                           type = "COX")
	# Shut down the graphics device driver
	dev.off()