Classification task. These measures capture differences in the number of examples per class in the dataset. When these differences are severe, ML classification techniques may generalize poorly because of the class imbalance.
balance(...)
# S3 method for default
balance(x, y, measures = "all", ...)
# S3 method for formula
balance(formula, data, measures = "all", ...)
...: Not used.
x: A data.frame containing only the input attributes.
y: A factor response vector with one label for each row/component of x.
measures: A list of measure names, or "all" to include all of them.
formula: A formula to define the class column.
data: A data.frame containing the input attributes and the class.
A list named by the requested class balance measures.
The following measures are allowed for this method:
The entropy of class proportions (C1) captures the imbalance in a dataset based on the proportions of examples per class.
The imbalance ratio (C2) is an index computed for measuring class balance. This is a version of the measure that is also suited for multiclass classification problems.
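The two measures above can be illustrated with a minimal base R sketch. It rests on assumptions: the formulas are the commonly cited definitions (entropy normalized by the log of the number of classes, and a multiclass imbalance ratio rescaled as 1 - 1/IR), and the helper names `class_entropy` and `imbalance_ratio` are hypothetical. This is not the package's own code, whose normalization may differ.

```r
# Hypothetical helpers for illustration only (not the package's internals).
# Both expect a factor of class labels with at least two observed classes.

# Assumed C1-style measure: entropy of class proportions, normalized by
# log(number of classes). 1 means perfectly balanced; lower means more skew.
class_entropy <- function(y) {
  y <- droplevels(y)              # ignore levels with no examples
  p <- table(y) / length(y)       # class proportions
  as.numeric(-sum(p * log(p)) / log(nlevels(y)))
}

# Assumed C2-style measure: multiclass imbalance ratio IR, rescaled as
# 1 - 1/IR. 0 means perfectly balanced; values near 1 mean severe imbalance.
imbalance_ratio <- function(y) {
  y <- droplevels(y)
  n_i <- as.numeric(table(y))     # number of examples per class
  nc <- length(n_i)               # number of classes
  ir <- (nc - 1) / nc * sum(n_i / (length(y) - n_i))
  1 - 1 / ir
}

# iris has 50 examples of each of 3 species, so it is perfectly balanced.
data(iris)
class_entropy(iris$Species)
imbalance_ratio(iris$Species)

# A skewed two-class factor for comparison (90 vs. 10 examples).
skewed <- factor(c(rep("a", 90), rep("b", 10)))
class_entropy(skewed)
imbalance_ratio(skewed)
```

Under these assumed definitions the balanced iris labels should give an entropy close to 1 and an imbalance ratio close to 0, while the skewed factor moves both values toward the opposite end of their ranges.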
Ana C Lorena, Ivan G Costa, Newton Spolaor and Marcilio C P Souto. (2012). Analysis of complexity indices for classification problems: Cancer gene expression data. Neurocomputing 75, 1, 33--42.
Ajay K Tanwani and Muddassar Farooq. (2010). Classification potential vs. classification accuracy: a comprehensive study of evolutionary algorithms with biomedical datasets. Learning Classifier Systems 6471, 127--144.
Other complexity-measures: correlation, dimensionality, linearity, neighborhood, network, overlapping, smoothness
# NOT RUN {
## Extract all balance measures for classification task
data(iris)
balance(Species ~ ., iris)
# }