The Neighborhood Cleaning Rule (NCL) modifies the Edited Nearest Neighbor (ENN) method by increasing the role of data cleaning.
First, NCL removes negative (majority-class) examples that are misclassified by their k nearest neighbors (k = 3 by default).
Second, the nearest neighbors of each positive (minority-class) example are found, and those belonging to the majority class are removed.
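The two steps can be sketched in base R as follows. This is a minimal illustration of the description above, not the package's implementation: `ncl_sketch` is a hypothetical helper, Euclidean distance is assumed, and both steps are assumed to use neighbors computed on the original data, with all removals applied together at the end.

```r
# Illustrative sketch only; ncl_sketch is a hypothetical helper, not part of the package.
# X: numeric matrix or data frame; Y: factor coded "0" (majority) / "1" (minority).
ncl_sketch <- function(X, Y, k = 3) {
  X <- as.matrix(X)
  y <- as.character(Y)

  D <- as.matrix(dist(X))   # pairwise Euclidean distances (assumed metric)
  diag(D) <- Inf            # an example is never its own neighbor

  # Indices of the k nearest neighbors of every example.
  nn <- lapply(seq_len(nrow(D)), function(i) order(D[i, ])[seq_len(k)])

  # Step 1: negatives whose neighbors vote for the positive class.
  pos_votes <- vapply(nn, function(idx) sum(y[idx] == "1"), numeric(1))
  step1 <- which(y == "0" & pos_votes > k / 2)

  # Step 2: majority-class examples among the neighbors of any positive.
  pos_nn <- unique(unlist(nn[y == "1"]))
  step2 <- pos_nn[y[pos_nn] == "0"]

  keep <- setdiff(seq_len(nrow(X)), union(step1, step2))
  list(X = X[keep, , drop = FALSE],
       Y = factor(y[keep], levels = c("0", "1")))
}

# Toy one-dimensional data: the negative at 6.3 sits among the positives
# (removed by step 1); the negative at 3 borders the positive at 4.6
# (removed by step 2).
X <- matrix(c(0, 1, 2, 3, 6.3, 4.6, 5.1, 5.6), ncol = 1)
Y <- factor(c(0, 0, 0, 0, 0, 1, 1, 1), levels = c(0, 1))

res <- ncl_sketch(X, Y, k = 3)
table(res$Y)
```

In this toy data two of the five negatives are removed and all three positives are kept, so the cleaned set is less skewed and the class boundary is clearer.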
Usage
ubNCL(X, Y, k = 3, verbose = TRUE)
Arguments
X
the input variables of the unbalanced dataset.
Y
the response variable of the unbalanced dataset.
It must be a binary factor where the majority class is coded as 0 and the minority as 1.
k
the number of nearest neighbors to use (default 3).
verbose
whether to print extra information (TRUE/FALSE).
Value
The function returns a list:
X
the input variables of the dataset after cleaning.
Y
the response variable of the dataset after cleaning.
Details
Because nearest neighbors are computed from distances between examples, X may contain only numeric features.
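A minimal sketch of preparing data to meet these requirements, using base R only. The data frame `raw` and its column names are hypothetical. One-hot encoding of categorical predictors with `model.matrix()` is one possible choice, not something the package prescribes.

```r
# Hypothetical raw data: a numeric feature, a categorical feature, text labels.
raw <- data.frame(
  age   = c(23, 45, 31, 52, 38, 27),
  group = factor(c("a", "b", "a", "c", "b", "a")),
  label = c("no", "no", "yes", "no", "no", "no")
)

# Recode the response as a binary factor: majority class -> 0, minority -> 1.
counts   <- table(raw$label)
minority <- names(counts)[which.min(counts)]
Y <- factor(as.integer(raw$label == minority), levels = c(0, 1))

# Expand categorical predictors into 0/1 indicator columns so that every
# column of X is numeric; "- 1" drops the intercept column.
X <- as.data.frame(model.matrix(~ . - 1, data = raw[, c("age", "group")]))

# Check the two requirements before cleaning the data.
stopifnot(all(sapply(X, is.numeric)))
stopifnot(identical(levels(Y), c("0", "1")))
```

Distance-based neighbors are sensitive to feature scale, so standardizing the numeric columns (for example with `scale()`) before cleaning may also be worth considering.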
References
J. Laurikkala. Improving identification of difficult small classes by balancing class distribution. Artificial Intelligence in Medicine, pages 63-66, 2001.