
UBL (version 0.0.9)

ImbR: Synthetic Regression Data Set

Description

Simulated data set for an imbalanced regression domain. The rare cases correspond to the higher extreme values of the target variable and are described by a circle with white noise. The normal cases follow a normal distribution with elliptical contours, centred at the same point as the circle.

Usage

data(ImbR)


Format

The data set has 2 continuous features (X1 and X2) and a continuous target variable (denoted Tgt). The rare examples, i.e., cases with higher values of the target variable, make up 5% of the data. Data set ImbR has 1000 examples.

ImbR data has been simulated as follows:

- lower Tgt values: \((X1, X2) \sim \mathbf{N}_{2}\left(\mathbf{10}_{2}, \mathbf{2.5}_{2}\right)\) and \(Tgt \sim \mathbf{\Gamma}\left(0.5, 1\right) + 10\);
- higher Tgt values: \((X1, X2) \sim \left(\rho \cos(\theta) + 10, \rho \sin(\theta) + 10\right)\), where \(\rho \sim \mathbf{9}_{2} + \mathbf{N}_{2}\left(\mathbf{0}_{2}, \mathbf{I}_{2}\right)\) and \(\theta \sim \mathbf{U}_{2}\left(\mathbf{0}_{2}, 2\pi \mathbf{I}_{2}\right)\), and \(Tgt \sim \mathbf{\Gamma}\left(1, 1\right) + 20\).
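The following is a minimal R sketch that draws a data set with the same structure from the distributions listed above. It is not the code used to build ImbR. It assumes that the covariance of the normal cases is 2.5 times the identity (independent coordinates with variance 2.5), that each rare case uses a single radius and a single angle, and that exactly 50 of the 1000 cases are rare. Because the second Gamma parameter is 1 in both formulas, reading it as a rate or as a scale gives the same distribution. The function name `simulate_imbr` is hypothetical.

```r
# Hypothetical helper: simulates a data set shaped like ImbR.
# Assumptions (not taken from UBL): normal-case covariance is 2.5 * I,
# one radius and one angle per rare case, and round(n * rare_frac) rare cases.
simulate_imbr <- function(n = 1000, rare_frac = 0.05, seed = 1234) {
  set.seed(seed)
  n_rare <- round(n * rare_frac)
  n_norm <- n - n_rare

  # Normal cases: bivariate normal centred at (10, 10),
  # target = Gamma(shape 0.5, rate 1) shifted by 10
  norm <- data.frame(
    X1  = rnorm(n_norm, mean = 10, sd = sqrt(2.5)),
    X2  = rnorm(n_norm, mean = 10, sd = sqrt(2.5)),
    Tgt = rgamma(n_norm, shape = 0.5, rate = 1) + 10
  )

  # Rare cases: noisy circle of radius about 9 around (10, 10),
  # target = Gamma(shape 1, rate 1) shifted by 20
  rho   <- 9 + rnorm(n_rare, mean = 0, sd = 1)
  theta <- runif(n_rare, min = 0, max = 2 * pi)
  rare <- data.frame(
    X1  = rho * cos(theta) + 10,
    X2  = rho * sin(theta) + 10,
    Tgt = rgamma(n_rare, shape = 1, rate = 1) + 20
  )

  # Normal cases first, rare cases last
  rbind(norm, rare)
}

sim <- simulate_imbr()
summary(sim)
mean(sim$Tgt >= 20)  # proportion of cases in the rare target region
```

Since the seed is fixed, repeated calls return the same data. Under these assumptions a normal case almost never reaches Tgt = 20, so the last line essentially counts the rare circle cases.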

Author

Paula Branco paobranco@gmail.com, Rita Ribeiro rpribeiro@dcc.fc.up.pt and Luis Torgo ltorgo@dcc.fc.up.pt

Examples

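library(UBL)       # ImbR ships with the UBL package
data(ImbR)         # load the ImbR data set into the workspace
summary(ImbR)      # summary statistics of X1, X2 and Tgt

boxplot(ImbR$Tgt)  # the rare, high Tgt values appear at the upper end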
