Oversampling: For a given class (usually the smaller one), all existing observations are kept, and extra observations are added by randomly sampling with replacement from this class.
Undersampling: For a given class (usually the larger one), the number of observations is reduced (downsampled) by randomly sampling without replacement from this class.
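The two definitions above can be illustrated with a small base R sketch. This is not mlr's implementation. The toy labels `y`, the index vectors, and the rounding of the target sizes are assumptions made only for illustration.

```r
set.seed(1)

# Toy labels (made-up data): 8 observations of class "a", 3 of class "b"
y <- factor(c(rep("a", 8), rep("b", 3)))
idx.a <- which(y == "a")
idx.b <- which(y == "b")

# Oversampling "b" with rate 2: keep all 3 observations and add
# 3 more, drawn with replacement from the same class
rate.over <- 2
n.extra <- round(length(idx.b) * rate.over) - length(idx.b)
extra <- idx.b[sample.int(length(idx.b), n.extra, replace = TRUE)]
over.idx <- c(idx.a, idx.b, extra)

# Undersampling "a" with rate 0.5: keep 4 of the 8 observations,
# drawn without replacement
rate.under <- 0.5
n.keep <- round(length(idx.a) * rate.under)
under.idx <- c(idx.a[sample.int(length(idx.a), n.keep)], idx.b)

table(y[over.idx])   # a: 8, b: 6
table(y[under.idx])  # a: 4, b: 3
```

Oversampling can repeat the same observation several times, whereas undersampling never duplicates a row.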
oversample(task, rate, cl = NULL)
undersample(task, rate, cl = NULL)
task: (Task) The task.
rate: (numeric(1)) Factor to upsample or downsample a class.
For undersampling: Must be between 0 and 1, where 1 means no downsampling, 0.5 implies reduction to 50 percent, and 0 would imply reduction to 0 observations.
For oversampling: Must be between 1 and Inf, where 1 means no oversampling and 2 would mean doubling the class size.
cl: (character(1)) Which class should be over- or undersampled. If NULL, oversample will select the smaller class and undersample the larger class.
Value: Task.
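A minimal usage sketch, assuming the mlr package is installed and using its bundled `sonar.task` (a binary classification task with classes M and R). The rate values are arbitrary choices for illustration.

```r
library(mlr)

# sonar.task ships with mlr; inspect the original class counts
task <- sonar.task
table(getTaskTargets(task))

# cl = NULL (the default): oversample picks the smaller class
# and doubles its size
task.over <- oversample(task, rate = 2)
table(getTaskTargets(task.over))

# cl = NULL (the default): undersample picks the larger class
# and reduces it to about half its size
task.under <- undersample(task, rate = 0.5)
table(getTaskTargets(task.under))

# Name the class explicitly instead of relying on the default
task.over.m <- oversample(task, rate = 1.5, cl = "M")
table(getTaskTargets(task.over.m))
```

Both functions return a new Task, so the result can be passed directly to `train()` or `resample()` in place of the original task.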
Other imbalancy: makeOverBaggingWrapper(), makeUndersampleWrapper(), smote()