FRESA.CAD (version 2.0.2)

bootstrapVarNeRIElimination: NeRI-based backwards variable elimination with bootstrapping

Description

This function removes model terms that do not significantly improve the bootstrapped net residual improvement (NeRI).

Usage

bootstrapVarNeRIElimination(object,
	                            pvalue = 0.05,
	                            Outcome = "Class",
	                            data,
	                            startOffset = 0, 
	                            type = c("LOGIT", "LM", "COX"),
	                            testType = c("Binomial",
	                                         "Wilcox",
	                                         "tStudent",
	                                         "Ftest"),
	                            loops = 250,
	                            fraction = 1.0,
	                            setIntersect = 1,
	                            print=TRUE,
	                            plots=TRUE)

Arguments

object
An object of class lm, glm, or coxph containing the model to be analyzed
pvalue
The maximum p-value, associated with the NeRI, allowed for a term in the model
Outcome
The name of the column in data that stores the variable to be predicted by the model
data
A data frame where all variables are stored in different columns
startOffset
Only terms whose position in the model is larger than the startOffset are candidates to be removed
type
Fit type: Logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX")
testType
Type of statistical test to be evaluated by the improvedResiduals function: Binomial test ("Binomial"), Wilcoxon rank-sum test ("Wilcox"), Student's t-test ("tStudent"), or F-test ("Ftest")
loops
The number of bootstrap loops
fraction
The fraction of the data (sampled with replacement) to be used as the training set
setIntersect
The intercept of the model (to force a zero intercept, set this value to 0)
print
Logical. If TRUE, information will be displayed
plots
Logical. If TRUE, plots are displayed
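
As a rough illustration of what the loops and fraction arguments describe, the base R sketch below draws one bootstrap training sample with replacement. It is not FRESA.CAD code: the variable names (n, train_idx, oob_idx) are hypothetical, and treating the rows that were never drawn as a held-out set is an assumption made for illustration only.

```r
# Illustrative sketch only; not taken from the FRESA.CAD sources.
set.seed(1)

n        <- 1000   # hypothetical number of rows in the data frame
fraction <- 1.0    # same meaning as the 'fraction' argument described above

# Draw round(fraction * n) row indices with replacement (one bootstrap loop)
train_idx <- sample(n, size = round(fraction * n), replace = TRUE)

# Rows never drawn are "out-of-bag"; using them for testing is an assumption here
oob_idx <- setdiff(seq_len(n), train_idx)

# Share of distinct rows that ended up in the training sample.
# For fraction = 1 this is expected to be close to 1 - exp(-1), about 0.63.
unique_share <- length(unique(train_idx)) / n

print(unique_share)
print(length(oob_idx))
```

Repeating the draw loops times gives the set of bootstrap samples over which the residual comparison is accumulated.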

Value

  • back.model: An object of the same class as object containing the reduced model
  • loops: The number of loops it took for the model to stabilize
  • reclas.info: A list with the NeRI statistics of the reduced model, as given by the getVarNeRI function
  • bootCV: An object of class bootstrapValidationNeRI containing the results of the bootstrap validation in the reduced model
  • back.formula: An object of class formula with the formula used to fit the reduced model
  • lastRemoved: The name of the last term that was removed (-1 if all terms were removed)

Details

For each model term $x_i$, the residuals are computed for the full model and for the reduced model (the model with the term $x_i$ removed). The term whose removal results in the smallest drop in bootstrapped residual improvement is selected. The hypothesis that the term improves the residuals is tested by checking the p-value of the average improvement. If $p(\text{full-model residuals better than reduced-model residuals}) > \mathit{pvalue}$, the term is removed. In other words, only model terms that significantly improve the residuals are kept. The procedure is repeated until no term fulfils the removal criterion. The p-values of improvement can be computed via a sign test (Binomial), a paired Wilcoxon test, a paired t-test, or an F-test. The first three tests compare the absolute values of the residuals, while the F-test tests whether the variance of the residuals is significantly improved.
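
The comparison described above can be sketched with base R alone. The example below is a minimal illustration under stated assumptions, not the package's internal implementation: it uses the built-in mtcars data, a hypothetical candidate term (qsec), ordinary lm fits, and out-of-bag rows as the test set. Pooling residuals across bootstrap loops also ignores their dependence, so the p-values are only indicative.

```r
# Illustrative sketch only; this is NOT FRESA.CAD's internal code.
set.seed(42)
data(mtcars)

full_formula    <- mpg ~ wt + hp + qsec
reduced_formula <- mpg ~ wt + hp          # hypothetical candidate term qsec removed

loops       <- 50
full_res    <- numeric(0)
reduced_res <- numeric(0)

for (i in seq_len(loops)) {
  # Bootstrap training sample; rows never drawn form the test set (assumption)
  train_idx <- sample(nrow(mtcars), replace = TRUE)
  test_idx  <- setdiff(seq_len(nrow(mtcars)), train_idx)
  if (length(test_idx) == 0) next
  train <- mtcars[train_idx, ]
  test  <- mtcars[test_idx, ]

  full_fit    <- lm(full_formula, data = train)
  reduced_fit <- lm(reduced_formula, data = train)

  # Test-set residuals of both models, pooled over all loops
  full_res    <- c(full_res,    test$mpg - predict(full_fit,    newdata = test))
  reduced_res <- c(reduced_res, test$mpg - predict(reduced_fit, newdata = test))
}

abs_full    <- abs(full_res)
abs_reduced <- abs(reduced_res)

# Sign (binomial) test: is the full model closer to the outcome more often?
improved   <- sum(abs_full < abs_reduced)
worsened   <- sum(abs_full > abs_reduced)
p_binomial <- binom.test(improved, improved + worsened,
                         p = 0.5, alternative = "greater")$p.value

# Paired Wilcoxon and paired t-test on the absolute residuals
p_wilcox <- wilcox.test(abs_reduced, abs_full,
                        paired = TRUE, alternative = "greater")$p.value
p_t      <- t.test(abs_reduced, abs_full,
                   paired = TRUE, alternative = "greater")$p.value

# F-test: is the residual variance of the reduced model larger?
p_f <- var.test(reduced_res, full_res, alternative = "greater")$p.value

p_values <- c(Binomial = p_binomial, Wilcox = p_wilcox,
              tStudent = p_t, Ftest = p_f)

# Removal rule from the text: the term is dropped when its p-value exceeds pvalue
pvalue      <- 0.05
remove_term <- p_values > pvalue

print(p_values)
print(remove_term)
```

Each entry of remove_term shows whether the candidate term would be dropped under the corresponding test at the chosen threshold. In the function itself this check is applied to every removable term and repeated until no term meets the removal criterion.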

See Also

bootstrapVarElimination, backVarNeRIElimination, bootstrapValidationNeRI

Examples

# Start the graphics device driver to save all plots in a pdf format
	pdf(file = "Example.pdf")
	# Load FRESA.CAD (provides the modeling functions and cancerVarNames)
	library(FRESA.CAD)
	# Get the stage C prostate cancer data from the rpart package
	library(rpart)
	data(stagec)
	# Split the stages into several columns
	dataCancer <- cbind(stagec[,c(1:3,5:6)],
	                    gleason4 = 1*(stagec[,7] == 4),
	                    gleason5 = 1*(stagec[,7] == 5),
	                    gleason6 = 1*(stagec[,7] == 6),
	                    gleason7 = 1*(stagec[,7] == 7),
	                    gleason8 = 1*(stagec[,7] == 8),
	                    gleason910 = 1*(stagec[,7] >= 9),
	                    eet = 1*(stagec[,4] == 2),
	                    diploid = 1*(stagec[,8] == "diploid"),
	                    tetraploid = 1*(stagec[,8] == "tetraploid"),
	                    notAneuploid = 1-1*(stagec[,8] == "aneuploid"))
	# Remove the incomplete cases
	dataCancer <- dataCancer[complete.cases(dataCancer),]
	# Load a pre-established data frame with the names and descriptions of all variables
	data(cancerVarNames)
	# Get a Cox proportional hazards model using:
	# - A lax p-value
	# - 10 bootstrap loops
	# - Age as a covariate
	# - The Wilcoxon rank-sum test as the feature inclusion criterion
	cancerModel <- NeRIBasedFRESA.Model(pvalue = 0.1,
	                                    loops = 10,
	                                    covariates = "1 + age",
	                                    Outcome = "pgstat",
	                                    variableList = cancerVarNames,
	                                    data = dataCancer,
	                                    type = "COX",
	                                    testType= "Wilcox",
	                                    timeOutcome = "pgtime")
	# Remove non-significant variables from the previous model:
	# - Using a strict p-value
	# - Excluding the covariate as a candidate for feature removal 
	# - Using the Wilcoxon rank-sum test as the feature removal criterion
	# - Using 50 bootstrap loops
	reducedCancerModel <- bootstrapVarNeRIElimination(object = cancerModel$final.model,
	                                                  pvalue = 0.005,
	                                                  Outcome = "pgstat",
	                                                  data = dataCancer,
	                                                  startOffset = 1,
	                                                  type = "COX",
	                                                  testType = "Wilcox",
	                                                  loops = 50,
	                                                  fraction = 1)
	# Shut down the graphics device driver
	dev.off()