FRESA.CAD (version 2.0.2)

bootstrapValidationNeRI: Bootstrap validation of regression models

Description

This function bootstraps the model the number of times given by loops to estimate, for each variable, the empirical bootstrapped distribution of the model coefficients and of the net residual improvement (NeRI). At each bootstrap loop, the non-observed data (the rows not drawn into the training sample) are predicted by the trained model, and statistics of the test predictions are stored and reported.
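The description does not define the NeRI. As a rough illustration, the sketch below assumes the usual reading of net residual improvement: the proportion of subjects whose absolute residual shrinks when a term is added to the model, minus the proportion whose absolute residual grows. The helper neriSketch and the use of lm on mtcars are hypothetical choices made for this illustration; they are not part of FRESA.CAD and may differ from how the package computes the statistic.

```r
# Hypothetical helper for illustration only; not a FRESA.CAD function.
# Assumes NeRI = P(residual improved) - P(residual worsened).
neriSketch <- function(oldResiduals, newResiduals) {
  improved <- mean(abs(newResiduals) < abs(oldResiduals))
  worsened <- mean(abs(newResiduals) > abs(oldResiduals))
  improved - worsened
}

# Compare a model without and with the term "wt"
data(mtcars)
baseFit <- lm(mpg ~ hp, data = mtcars)
fullFit <- lm(mpg ~ hp + wt, data = mtcars)
neri <- neriSketch(residuals(baseFit), residuals(fullFit))
print(neri)
```

Under this definition the value lies between -1 and 1, and a positive value means that adding the term improved more residuals than it worsened.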

Usage

bootstrapValidationNeRI(fraction = 1,
	                        loops = 200,
	                        model.formula,
	                        Outcome,
	                        data,
	                        type = c("LM", "LOGIT", "COX"),
	                        plots = TRUE)

Arguments

fraction
The fraction of the data (sampled with replacement) to be used as the training set
loops
The number of bootstrap loops
model.formula
An object of class formula with the formula to be used
Outcome
The name of the column in data that stores the variable to be predicted by the model
data
A data frame where all variables are stored in different columns
type
Fit type: Logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX")
plots
Logical. If TRUE, density distribution plots are displayed

Value

  • data: The data frame used to bootstrap and validate the model
  • outcome: A vector with the predictions made by the model
  • boot.model: An object of class lm, glm, or coxph containing a model whose coefficients are the median of the coefficients of the bootstrapped models
  • NeRIs: A matrix with the NeRI for each model term, estimated using the bootstrap test sets
  • tStudent.pvalues: A matrix with the t-test p-value of the NeRI for each model term, estimated using the bootstrap train sets
  • wilcox.pvalues: A matrix with the Wilcoxon rank-sum test p-value of the NeRI for each model term, estimated using the bootstrap train sets
  • bin.pvlaues: A matrix with the binomial test p-value of the NeRI for each model term, estimated using the bootstrap train sets
  • F.pvlaues: A matrix with the F-test p-value of the NeRI for each model term, estimated using the bootstrap train sets
  • test.tStudent.pvalues: A matrix with the t-test p-value of the NeRI for each model term, estimated using the bootstrap test sets
  • test.wilcox.pvalues: A matrix with the Wilcoxon rank-sum test p-value of the NeRI for each model term, estimated using the bootstrap test sets
  • test.bin.pvlaues: A matrix with the binomial test p-value of the NeRI for each model term, estimated using the bootstrap test sets
  • test.F.pvlaues: A matrix with the F-test p-value of the NeRI for each model term, estimated using the bootstrap test sets
  • testPrediction: A vector that contains all the individual predictions used to validate the model in the bootstrap test sets
  • testOutcome: A vector that contains all the individual outcomes used to validate the model in the bootstrap test sets
  • testResiduals: A vector that contains all the residuals used to validate the model in the bootstrap test sets
  • trainPrediction: A vector that contains all the individual predictions used to validate the model in the bootstrap train sets
  • trainOutcome: A vector that contains all the individual outcomes used to validate the model in the bootstrap train sets
  • trainResiduals: A vector that contains all the residuals used to validate the model in the bootstrap train sets
  • testRMSE: The global RMSE, estimated using the bootstrap test sets
  • trainRMSE: The global RMSE, estimated using the bootstrap train sets
  • trainSampleRMSE: A vector with the RMSEs in the bootstrap train sets
  • testSampledRMSE: A vector with the RMSEs in the bootstrap test sets

Details

The bootstrap validation will estimate the confidence interval of the model coefficients and the NeRI. It will also compute the train and blind test root-mean-square error (RMSE), as well as the distribution of the NeRI p-values.
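The general idea can be sketched in base R. The example below is only an illustration of bootstrap validation with out-of-bag testing, not the FRESA.CAD implementation: the mtcars data, the formula mpg ~ wt + hp, and all variable names are assumptions chosen for the sketch. Each loop draws a training sample with replacement, fits the model, and scores it on the rows that were never drawn.

```r
# Illustrative sketch only: base R, not the FRESA.CAD implementation.
set.seed(42)                        # reproducible resampling
data(mtcars)
loops <- 50                         # number of bootstrap loops
fraction <- 1                       # fraction of rows drawn with replacement
n <- nrow(mtcars)

trainRMSE <- numeric(loops)
testRMSE <- rep(NA_real_, loops)
coefs <- matrix(NA_real_, nrow = loops, ncol = 3,
                dimnames = list(NULL, c("(Intercept)", "wt", "hp")))

for (i in seq_len(loops)) {
  idx <- sample(n, size = round(fraction * n), replace = TRUE)
  train <- mtcars[idx, ]
  test <- mtcars[-unique(idx), ]    # rows never drawn: the blind test set
  fit <- lm(mpg ~ wt + hp, data = train)
  coefs[i, ] <- coef(fit)
  trainRMSE[i] <- sqrt(mean(residuals(fit)^2))
  if (nrow(test) > 0) {
    testRMSE[i] <- sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
  }
}

# Empirical 95% interval and median of each bootstrapped coefficient
coefCI <- apply(coefs, 2, quantile, probs = c(0.025, 0.5, 0.975))
print(coefCI)
print(c(train = mean(trainRMSE), test = mean(testRMSE, na.rm = TRUE)))
```

The quantiles of the bootstrapped coefficients give an empirical confidence interval, and comparing the train and test RMSE gives a sense of how optimistic the in-sample fit is.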

See Also

bootstrapValidation, plot.bootstrapValidationNeRI

Examples

# Load the package that provides bootstrapValidationNeRI and cancerVarNames
library(FRESA.CAD)
# Start the graphics device driver to save all plots in PDF format
pdf(file = "Example.pdf")
# Get the stage C prostate cancer data from the rpart package
library(rpart)
data(stagec)
# Expand the Gleason score, eet, and ploidy into indicator columns
dataCancer <- cbind(stagec[, c(1:3, 5:6)],
                    gleason4 = 1 * (stagec[, 7] == 4),
                    gleason5 = 1 * (stagec[, 7] == 5),
                    gleason6 = 1 * (stagec[, 7] == 6),
                    gleason7 = 1 * (stagec[, 7] == 7),
                    gleason8 = 1 * (stagec[, 7] == 8),
                    gleason910 = 1 * (stagec[, 7] >= 9),
                    eet = 1 * (stagec[, 4] == 2),
                    diploid = 1 * (stagec[, 8] == "diploid"),
                    tetraploid = 1 * (stagec[, 8] == "tetraploid"),
                    notAneuploid = 1 - 1 * (stagec[, 8] == "aneuploid"))
# Remove the incomplete cases
dataCancer <- dataCancer[complete.cases(dataCancer), ]
# Load a pre-established data frame with the names and descriptions of all variables
data(cancerVarNames)
# Get a Cox proportional hazards model using:
# - 10 bootstrap loops
# - Age as a covariate
# - The Wilcoxon rank-sum test as the feature inclusion criterion
cancerModel <- NeRIBasedFRESA.Model(loops = 10,
                                    covariates = "1 + age",
                                    Outcome = "pgstat",
                                    variableList = cancerVarNames,
                                    data = dataCancer,
                                    type = "COX",
                                    testType = "Wilcox",
                                    timeOutcome = "pgtime")
# Validate the previous model:
# - Using 50 bootstrap loops
bootCancerModel <- bootstrapValidationNeRI(loops = 50,
                                           model.formula = cancerModel$formula,
                                           Outcome = "pgstat",
                                           data = dataCancer,
                                           type = "COX")
# Shut down the graphics device driver
dev.off()