ADASYN: Adaptive Synthetic Sampling

Description

Generates synthetic data for the minority class to balance an imbalanced dataset using ADASYN.

Usage

ADASYN(x, y, k = 5)

Value

A list containing the resampled dataset:

x_new: Resampled feature matrix.
y_new: Resampled target variable.
x_syn: Generated synthetic data.
C: Number of synthetic samples generated for each positive-class sample.

Arguments

x: feature matrix or data.frame.
y: a factor class variable with two classes.
k: number of neighbors. Default is 5.

Author

Fatih Saglam, saglamf89@gmail.com

Details

Adaptive Synthetic Sampling (ADASYN) is an extension of the Synthetic Minority Over-sampling Technique (SMOTE), an algorithm that generates synthetic examples for the minority class (He et al., 2008). Whereas SMOTE generates roughly the same number of synthetic examples for every minority example, ADASYN adaptively generates more of them for the minority examples that are harder to learn, that is, those with a larger share of majority-class examples among their k nearest neighbors, which typically lie closer to the decision boundary.
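The adaptive weighting described in He et al. (2008) can be illustrated with a minimal base-R sketch. It is not this package's implementation: the variable names (`r`, `w`, `G`, `n_syn`) and the toy data are assumptions made for illustration, and the brute-force distance matrix is chosen for readability rather than speed. It only computes how many synthetic points each minority example would receive, in the spirit of the `C` element listed under Value.

```r
# Toy data (hypothetical): 20 majority and 5 minority points in 2-D.
set.seed(1)
x <- rbind(matrix(rnorm(40, 3, 1), ncol = 2),
           matrix(rnorm(10, 5, 1), ncol = 2))
y <- factor(c(rep("negative", 20), rep("positive", 5)))
k <- 3

is_pos <- y == "positive"
d <- as.matrix(dist(x))   # pairwise Euclidean distances
diag(d) <- Inf            # a point is not its own neighbor

# r: share of majority points among the k nearest neighbors
# of each minority point (higher means harder to learn)
r <- sapply(which(is_pos), function(i) {
  nn <- order(d[i, ])[1:k]
  mean(!is_pos[nn])
})

# Normalize r to weights that sum to 1; fall back to uniform weights
# if no minority point has a majority neighbor (an assumption of this sketch)
w <- if (sum(r) > 0) r / sum(r) else rep(1 / length(r), length(r))

# G: synthetic points needed for full balance
G <- sum(!is_pos) - sum(is_pos)

# Per-point counts; because of rounding, their sum may differ slightly from G
n_syn <- round(w * G)
```

Minority points surrounded mostly by majority points get a larger entry in `n_syn`, while points deep inside the minority region may get zero.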

Note: This implementation is much faster than smotefamily::ADAS().

References

He, H., Bai, Y., Garcia, E. A., & Li, S. (2008, June). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence) (pp. 1322-1328). IEEE.

Examples

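# simulate an imbalanced two-class dataset:
# 1000 negative points centered at 3 and 50 positive points centered at 5
set.seed(1)
x <- rbind(matrix(rnorm(2000, 3, 1), ncol = 2, nrow = 1000),
           matrix(rnorm(100, 5, 1), ncol = 2, nrow = 50))
y <- as.factor(c(rep("negative", 1000), rep("positive", 50)))

# original data, colored by class
plot(x, col = y)

# resampling with k = 3 nearest neighbors
m <- ADASYN(x = x, y = y, k = 3)

# resampled data, colored by class
plot(m$x_new, col = m$y_new)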
