
FRESA.CAD (version 2.0.2)

getKNNpredictionFromFormula: Predict classification using KNN

Description

This function returns the predicted class of each sample in a test set using a k-nearest neighbors (KNN) algorithm with Euclidean distances, given a formula and a train set.
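To make the idea concrete, the following is a minimal base-R sketch of KNN classification with Euclidean distances. It is not the FRESA.CAD implementation: `knn_sketch` is a hypothetical helper written only for illustration, and the toy data are made up.

```r
# Illustrative KNN classifier in base R.
# knn_sketch is a hypothetical helper, not a FRESA.CAD function.
knn_sketch <- function(train_x, train_y, test_x, k = 3) {
  train_x <- as.matrix(train_x)
  test_x <- as.matrix(test_x)
  predictions <- character(nrow(test_x))
  vote_share <- numeric(nrow(test_x))
  for (i in seq_len(nrow(test_x))) {
    # Euclidean distance from test sample i to every training sample
    diffs <- sweep(train_x, 2, test_x[i, ])
    distances <- sqrt(rowSums(diffs^2))
    # Labels of the k closest training samples
    nearest <- order(distances)[seq_len(k)]
    votes <- table(train_y[nearest])
    # Majority vote; ties fall to the first class in table order (arbitrary)
    predictions[i] <- names(votes)[which.max(votes)]
    vote_share[i] <- max(votes) / k
  }
  list(prediction = predictions, prob = vote_share)
}

# Toy data: two well-separated clusters labelled 0 and 1
train_x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(6, 5))
train_y <- c(0, 0, 0, 1, 1, 1)
test_x <- rbind(c(0.5, 0.5), c(5.5, 5.5))

result <- knn_sketch(train_x, train_y, test_x, k = 3)
print(result$prediction)
```

Because Euclidean distance is sensitive to scale, features measured in very different units may need to be standardized before a procedure like this gives sensible neighbors.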

Usage

getKNNpredictionFromFormula(model.formula,
	                            trainData,
	                            testData,
	                            Outcome = "CLASS",
	                            nk = 3)

Arguments

model.formula
An object of class formula with the formula to be used
trainData
A data frame with the data to train the model, where all variables are stored in different columns
testData
A data frame similar to trainData, but with the data set to be predicted
Outcome
The name of the column in trainData that stores the variable to be predicted by the model
nk
The number of neighbors used to generate the KNN classification
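As an illustration of how a formula can determine which columns take part in the distance computation, the sketch below extracts the right-hand-side term names from a formula with base R. The formula is a made-up example shaped like a Cox model formula, and whether FRESA.CAD builds its feature list exactly this way is an assumption.

```r
# Hypothetical formula; Surv() is only parsed symbolically here, not evaluated
model_formula <- Surv(pgtime, pgstat) ~ 1 + age + g2 + grade

# Right-hand-side terms, excluding the intercept
feature_names <- attr(terms(model_formula), "term.labels")
print(feature_names)

# Toy data frame (made-up values) restricted to those feature columns
toy_data <- data.frame(pgtime = c(6.1, 9.4), pgstat = c(0, 1),
                       age = c(64, 59), g2 = c(10.3, 13.0),
                       grade = c(2, 3), eet = c(2, 1))
feature_data <- toy_data[, feature_names, drop = FALSE]
print(colnames(feature_data))
```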

Value

  • prediction: A vector with the predicted outcome for the testData data set
  • prob: The proportion of k neighbours that predicted the class to be the one being reported in prediction
  • binProb: The proportion of k neighbours that predicted the class of the outcome to be equal to 1
  • featureList: A vector with the names of the features used by the KNN procedure
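The difference between prob and binProb can be illustrated with a small sketch. It assumes a binary outcome coded 0/1 and an odd number of neighbors, and it follows one reading of the descriptions above rather than the package source. The neighbor labels are made-up values.

```r
# Made-up labels of the k = 5 nearest neighbors for three test samples
neighbor_labels <- rbind(c(1, 1, 1, 0, 1),
                         c(0, 0, 1, 0, 0),
                         c(1, 0, 1, 1, 0))
k <- ncol(neighbor_labels)

# Share of neighbors with class 1 (analogous to binProb)
bin_prob <- rowSums(neighbor_labels == 1) / k

# Majority-vote class for each sample (analogous to prediction)
prediction <- as.integer(bin_prob > 0.5)

# Share of neighbors that agree with the predicted class (analogous to prob)
prob <- ifelse(prediction == 1, bin_prob, 1 - bin_prob)

print(data.frame(prediction, prob, bin_prob))
```

Under this reading, prob is never below 0.5 for a binary outcome, while binProb ranges from 0 to 1 and can be used directly as a score for class 1.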

See Also

predictForFresa, knn

Examples

	# Start the graphics device driver to save all plots in PDF format
	pdf(file = "Example.pdf")
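	# Load FRESA.CAD, which provides cancerVarNames and the modeling functions used below
	library(FRESA.CAD)
	# Get the stage C prostate cancer data from the rpart package
	library(rpart)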
	data(stagec)
	# Split the stages into several columns
	dataCancer <- cbind(stagec[,c(1:3,5:6)],
	                    gleason4 = 1*(stagec[,7] == 4),
	                    gleason5 = 1*(stagec[,7] == 5),
	                    gleason6 = 1*(stagec[,7] == 6),
	                    gleason7 = 1*(stagec[,7] == 7),
	                    gleason8 = 1*(stagec[,7] == 8),
	                    gleason910 = 1*(stagec[,7] >= 9),
	                    eet = 1*(stagec[,4] == 2),
	                    diploid = 1*(stagec[,8] == "diploid"),
	                    tetraploid = 1*(stagec[,8] == "tetraploid"),
	                    notAneuploid = 1-1*(stagec[,8] == "aneuploid"))
	# Remove the incomplete cases
	dataCancer <- dataCancer[complete.cases(dataCancer),]
	# Load a pre-established data frame with the names and descriptions of all variables
	data(cancerVarNames)
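	# Split the data set into train and test samples (first half / second half)
	# Integer division keeps the row indices whole when the row count is odd
	nTrain <- nrow(dataCancer) %/% 2
	trainDataCancer <- dataCancer[1:nTrain,]
	testDataCancer <- dataCancer[(nTrain+1):nrow(dataCancer),]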
	# Get a Cox proportional hazards model using:
	# - 10 bootstrap loops
	# - Train data
	# - Age as a covariate
	# - zIDI as the feature inclusion criterion
	cancerModel <- ReclassificationFRESA.Model(loops = 10,
	                                           covariates = "1 + age",
	                                           Outcome = "pgstat",
	                                           variableList = cancerVarNames,
	                                           data = trainDataCancer,
	                                           type = "COX",
	                                           timeOutcome = "pgtime",
	                                           selectionType = "zIDI")
	# Predict the outcome of the test data sample using KNN
	KNNPrediction <- getKNNpredictionFromFormula(model.formula = cancerModel$formula,
	                                             trainData = trainDataCancer,
	                                             testData = testDataCancer,
	                                             Outcome = "pgstat",
	                                             nk = 5)
	# Shut down the graphics device driver
	dev.off()
