SMOTEWB (version 1.2.0)

BLSMOTE: Borderline Synthetic Minority Oversampling Technique

Description

BLSMOTE() applies Borderline-SMOTE (BLSMOTE), a variation of the SMOTE algorithm that generates synthetic samples only in the vicinity of borderline instances in imbalanced datasets.

Usage

BLSMOTE(x, y, k1 = 5, k2 = 5, type = "type1")

Value

A list containing the resampled dataset, with the following components:

x_new

Resampled feature matrix.

y_new

Resampled target variable.

x_syn

Generated synthetic data.

C

Number of synthetic samples generated for each positive class sample.

Arguments

x

feature matrix or data.frame.

y

a factor class variable with two classes.

k1

number of neighbors to link. Default is 5.

k2

number of neighbors to determine safe levels. Default is 5.

type

"type1" or "type2". Default is "type1".

Author

Fatih Saglam, saglamf89@gmail.com

Details

BLSMOTE works by focusing on the instances that are near the decision boundary between the minority and majority classes, known as borderline instances. These instances are more informative and potentially more challenging for classification, and thus generating synthetic samples in their vicinity can be more effective than generating them randomly.
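The idea described above can be sketched in base R. This is only an illustration of the Borderline-SMOTE1 scheme from Han et al. (2005), not the SMOTEWB implementation: the names `k`, `n_maj`, `status`, and `danger` are hypothetical, a single neighborhood size `k` is assumed for both steps (how the package maps `k1` and `k2` onto these steps is not stated here), and only one synthetic point is generated per borderline instance.

```r
# Illustrative sketch of Borderline-SMOTE1; not the SMOTEWB code.
set.seed(1)
x <- rbind(matrix(rnorm(400, 3, 1), ncol = 2),   # 200 majority points
           matrix(rnorm(40, 5, 1), ncol = 2))    # 20 minority points
y <- factor(c(rep("negative", 200), rep("positive", 20)))

k <- 5                        # assumed neighborhood size (hypothetical name)
d <- as.matrix(dist(x))       # pairwise Euclidean distances
diag(d) <- Inf                # a point is never its own neighbor
pos <- which(y == "positive")

# Count majority-class points among the k nearest neighbors of each minority point
n_maj <- sapply(pos, function(i) {
  nn <- order(d[i, ])[1:k]
  sum(y[nn] == "negative")
})

# Categories as in the paper: all neighbors majority -> noise;
# at least half but not all majority -> danger (borderline); otherwise safe
status <- ifelse(n_maj == k, "noise",
                 ifelse(n_maj >= k / 2, "danger", "safe"))
danger <- pos[status == "danger"]

# One synthetic point per borderline instance, placed at a random position
# on the segment toward one of its k nearest minority-class neighbors
x_syn <- matrix(NA_real_, nrow = length(danger), ncol = ncol(x))
for (j in seq_along(danger)) {
  i <- danger[j]
  nn <- pos[order(d[i, pos])[1:k]]    # self is excluded because d[i, i] is Inf
  target <- nn[sample.int(k, 1)]
  x_syn[j, ] <- x[i, ] + runif(1) * (x[target, ] - x[i, ])
}

table(status)
```

Minority points labeled "safe" or "noise" are left alone in this sketch; only the "danger" points seed new samples, which is what concentrates the synthetic data near the class boundary.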

Note: This implementation is much faster than smotefamily::BLSMOTE().

References

Han, H., Wang, W. Y., & Mao, B. H. (2005). Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In Advances in Intelligent Computing: International Conference on Intelligent Computing, ICIC 2005, Hefei, China, August 23-26, 2005, Proceedings, Part I (pp. 878-887). Springer Berlin Heidelberg.

Examples

library(SMOTEWB)

set.seed(1)
x <- rbind(matrix(rnorm(2000, 3, 1), ncol = 2, nrow = 1000),
           matrix(rnorm(100, 5, 1), ncol = 2, nrow = 50))
y <- as.factor(c(rep("negative", 1000), rep("positive", 50)))

plot(x, col = y)

# resampling
m <- BLSMOTE(x = x, y = y, k1 = 5, k2 = 5)

plot(m$x_new, col = m$y_new)
