getVarNeRI: Analysis of the effect of each term of a linear regression model by analyzing its residuals

Description

This function analyzes the effect of each model term by comparing the residuals of the full model with those of the model without that term. The model is fitted on the training data set (data), but the residual improvement is evaluated on both the training and the test data sets (testData). Residuals are compared with a paired t-test, a paired Wilcoxon signed-rank test, a binomial sign test, and an F-test on the residual variances. The net residual improvement (NeRI) of each model term is also reported.
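The residual comparison described above can be sketched in base R for a single term. This is a minimal illustration under stated assumptions, not the FRESA.CAD implementation. compareResiduals() is a hypothetical helper. The t-test, Wilcoxon and sign tests are applied to absolute residuals, following the descriptions of the returned p-values. The NeRI formula used here, (number of observations improved minus number worsened) / n, is an assumption. var.test() treats the two residual sets as independent samples, so this F-test is only an approximation for paired residuals.

```r
# Minimal sketch (hypothetical helper, not the FRESA.CAD source): compare the
# residuals of a full model with those of a reduced model without one term.
compareResiduals <- function(fullRes, reducedRes) {
  stopifnot(length(fullRes) == length(reducedRes))
  full <- abs(fullRes)
  reduced <- abs(reducedRes)
  improved <- sum(reduced > full)  # full model closer to the observed value
  worse <- sum(reduced < full)     # full model farther from the observed value
  n <- length(full)
  list(
    # One-sided paired t-test: are the reduced model's absolute residuals larger?
    tP.value = t.test(reduced, full, paired = TRUE,
                      alternative = "greater")$p.value,
    # Binomial sign test on improved vs. worsened observations (ties dropped)
    BinP.value = if (improved + worse > 0)
      binom.test(improved, improved + worse, alternative = "greater")$p.value
      else NA_real_,
    # One-sided paired Wilcoxon signed-rank test (normal approximation)
    WilcoxP.value = wilcox.test(reduced, full, paired = TRUE,
                                alternative = "greater", exact = FALSE)$p.value,
    # One-sided F-test: does the reduced model leave more residual variance?
    FP.value = var.test(reducedRes, fullRes, alternative = "greater")$p.value,
    # Assumed NeRI definition: net fraction of observations that improved
    NeRI = (improved - worse) / n
  )
}

# Usage: x1 drives the outcome, so dropping it should worsen the residuals
set.seed(1)
train <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
train$y <- 2 * train$x1 + rnorm(100)
fullModel <- lm(y ~ x1 + x2, data = train)
reducedModel <- lm(y ~ x2, data = train)  # model without the x1 term
res <- compareResiduals(residuals(fullModel), residuals(reducedModel))
str(res)
```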

Usage

getVarNeRI(object,
	           data,
	           Outcome = "Class",
	           type = c("LM", "LOGIT", "COX"),
	           testData = NULL)

Arguments

object

An object of class lm, glm, or coxph containing the model to be analyzed

data

A data frame in which each variable is stored in its own column

Outcome

The name of the column in data that stores the variable to be predicted by the model

type

Fit type: Logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX")

testData

A data frame with the same columns as data, containing an independent data set on which the model is tested. If NULL, data is used.

Value

tP.value

A vector in which each element is the one-sided p-value of the paired t-test comparing the absolute residuals of the full model with those of the model without one term

BinP.value

A vector in which each element is the p-value of the binomial sign test for a significant improvement in the residuals when the term is included

WilcoxP.value

A vector in which each element is the one-sided p-value of the paired Wilcoxon signed-rank test comparing the absolute residuals of the full model with those of the model without one term

FP.value

A vector in which each element is the one-sided p-value of the F-test comparing the residual variance of the full model with that of the model without one term

NeRIs

A vector in which each element is the net residual improvement (NeRI) of the full model over the model without one term

testData.tP.value

Same as tP.value, but estimated on testData

testData.BinP.value

Same as BinP.value, but estimated on testData

testData.WilcoxP.value

Same as WilcoxP.value, but estimated on testData

testData.FP.value

Same as FP.value, but estimated on testData

testData.NeRIs

Same as NeRIs, but estimated on testData
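The per-term vectors listed above can be approximated for a linear model with a leave-one-term-out loop. This is a minimal sketch under stated assumptions, not the package's implementation. leaveOneTermOut() is a hypothetical helper and it only handles lm fits. Each reduced model is refitted on the data argument (which should be the same training data used for fit) by rebuilding its formula with reformulate(). To keep it short, only tP.value, NeRIs and their testData counterparts are computed. The NeRI formula (improved minus worsened, divided by n) is an assumption.

```r
# Minimal sketch for the LM case only; leaveOneTermOut() is a hypothetical
# helper, not part of FRESA.CAD.
leaveOneTermOut <- function(fit, data, Outcome, testData = NULL) {
  # Documented fallback: evaluate on data when no test set is given
  if (is.null(testData)) testData <- data
  termNames <- attr(terms(fit), "term.labels")
  # Observed minus predicted, so residuals can also be computed on new data
  obsMinusPred <- function(model, d) d[[Outcome]] - predict(model, newdata = d)
  # One-sided paired t-test: are the reduced model's absolute residuals larger?
  tP <- function(full, reduced) {
    t.test(abs(reduced), abs(full), paired = TRUE,
           alternative = "greater")$p.value
  }
  # Assumed NeRI definition: (number improved - number worsened) / n
  neri <- function(full, reduced) {
    (sum(abs(reduced) > abs(full)) - sum(abs(reduced) < abs(full))) /
      length(full)
  }
  # Refit the model once per term, with that term removed
  reducedFits <- lapply(termNames, function(term) {
    rest <- setdiff(termNames, term)
    rhs <- if (length(rest) > 0) rest else "1"  # intercept-only if nothing left
    lm(reformulate(rhs, response = Outcome), data = data)
  })
  names(reducedFits) <- termNames
  fullTrain <- obsMinusPred(fit, data)
  fullTest <- obsMinusPred(fit, testData)
  # sapply() over the named list returns one named element per term
  list(
    tP.value = sapply(reducedFits, function(m)
      tP(fullTrain, obsMinusPred(m, data))),
    NeRIs = sapply(reducedFits, function(m)
      neri(fullTrain, obsMinusPred(m, data))),
    testData.tP.value = sapply(reducedFits, function(m)
      tP(fullTest, obsMinusPred(m, testData))),
    testData.NeRIs = sapply(reducedFits, function(m)
      neri(fullTest, obsMinusPred(m, testData)))
  )
}

# Usage: x1 drives the outcome, x2 is pure noise
set.seed(3)
df <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
df$y <- 2 * df$x1 + rnorm(200)
train <- df[1:100, ]
test <- df[101:200, ]
fit <- lm(y ~ x1 + x2, data = train)
result <- leaveOneTermOut(fit, train, "y", testData = test)
result$NeRIs
```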

Examples

# Open a PDF graphics device so that all plots are saved to Example.pdf
pdf(file = "Example.pdf")
# Load FRESA.CAD (getVarNeRI, NeRIBasedFRESA.Model, cancerVarNames)
# and rpart (stage C prostate cancer data)
library(FRESA.CAD)
library(rpart)
data(stagec)
# Expand the Gleason score, eet and ploidy columns into binary indicator columns
dataCancer <- cbind(stagec[,c(1:3,5:6)],
                    gleason4 = 1*(stagec[,7] == 4),
                    gleason5 = 1*(stagec[,7] == 5),
                    gleason6 = 1*(stagec[,7] == 6),
                    gleason7 = 1*(stagec[,7] == 7),
                    gleason8 = 1*(stagec[,7] == 8),
                    gleason910 = 1*(stagec[,7] >= 9),
                    eet = 1*(stagec[,4] == 2),
                    diploid = 1*(stagec[,8] == "diploid"),
                    tetraploid = 1*(stagec[,8] == "tetraploid"),
                    notAneuploid = 1-1*(stagec[,8] == "aneuploid"))
# Remove the incomplete cases
dataCancer <- dataCancer[complete.cases(dataCancer),]
# Load a pre-established data frame with the names and descriptions of all variables
data(cancerVarNames)
# Split the data set into train and test halves (floor() keeps the row indices whole)
half <- floor(nrow(dataCancer)/2)
trainDataCancer <- dataCancer[1:half,]
testDataCancer <- dataCancer[(half+1):nrow(dataCancer),]
# Get a Cox proportional hazards model using:
# - 10 bootstrap loops
# - Train data
# - Age as a covariate
# - The paired Wilcoxon test as the feature inclusion criterion
cancerModel <- NeRIBasedFRESA.Model(loops = 10,
                                    covariates = "1 + age",
                                    Outcome = "pgstat",
                                    variableList = cancerVarNames,
                                    data = trainDataCancer,
                                    type = "COX",
                                    testType = "Wilcox",
                                    timeOutcome = "pgtime")
# Get the NeRI of each model term in the train data set and in the independent test data set
cancerModelNeRI <- getVarNeRI(object = cancerModel$final.model,
                              data = trainDataCancer,
                              Outcome = "pgstat",
                              type = "COX",
                              testData = testDataCancer)
# Shut down the graphics device driver
dev.off()
