Description

NoiseFiltersR contains an extensive collection of state-of-the-art and classical label noise preprocessing algorithms for classification problems. Such a collection was previously missing from the R statistical software.

Specifically, NoiseFiltersR includes 30 label noise filters. Each one is documented with a general explanation of the method and the exact reference where it was first published. Moreover, they can be called in an R-user-friendly manner, and their results are unified by means of the filter class, which also provides adapted print and summary methods.

Installation

Use install.packages to install NoiseFiltersR and its dependencies from CRAN:

install.packages("NoiseFiltersR")

Once installed, use the command library to attach the package:

library("NoiseFiltersR")

Example of use

Once the package is installed and attached, the user can apply any of the implemented algorithms.

The following instruction shows how to use the well-known Iterative Partitioning Filter (IPF) (Khoshgoftaar & Rebours, 2007) to filter out class noise from the iris dataset. The formula indicates the classification variable, and the algorithm runs with its default parameters:

out <- IPF(Species~., data = iris)

Then, the variable out is an object of class filter. This is a list with seven elements:

  • cleanData: a data frame containing the filtered dataset.
  • remIdx: a vector of integers indicating the indexes of removed instances (i.e. their row number with respect to the original data frame).
  • repIdx: a vector of integers indicating the indexes of repaired/relabelled instances (i.e. their row number with respect to the original data frame).
  • repLab: a factor containing the new labels for repaired instances.
  • parameters: a list containing the tuning parameters used for the filter.
  • call: an expression that contains the original call to the filter.
  • extraInf: a character that includes additional relevant information not covered by previous items.
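The elements listed above can be read with the usual `$` accessor. The following is a minimal sketch, not taken from the package documentation: the `set.seed` call is only a precaution in case the filter involves random partitioning, and the row-count check assumes that a filter which only removes instances returns in cleanData exactly the rows not listed in remIdx.

```r
library(NoiseFiltersR)

set.seed(1)  # precaution: fix the seed in case the filter partitions the data at random
out <- IPF(Species ~ ., data = iris)

# Row numbers (with respect to iris) of the instances that were removed
out$remIdx

# The filtered data frame, ready to be passed on to a classifier
str(out$cleanData)

# Assumption: for a removal-only filter, cleanData holds the rows not in remIdx
nrow(out$cleanData) == nrow(iris) - length(out$remIdx)

# Tuning parameters that were actually used (the defaults in this call)
out$parameters
```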

To display the information contained in a filter object, the generic functions print and summary can be used (more details about their output can be found in the package vignette):

print(out)
summary(out)
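One way to get a feel for what a filter removes is to corrupt a few labels on purpose and compare them with remIdx. This is a hypothetical experiment rather than anything prescribed by the package: the names `noisy`, `flipped` and `out_noisy`, the seed, and the choice of 10 flipped labels are all illustrative assumptions.

```r
library(NoiseFiltersR)

set.seed(42)
noisy <- iris

# Flip the label of 10 randomly chosen instances to a different class
flipped <- sample(nrow(noisy), 10)
for (i in flipped) {
  other <- setdiff(levels(noisy$Species), as.character(noisy$Species[i]))
  noisy$Species[i] <- sample(other, 1)
}

out_noisy <- IPF(Species ~ ., data = noisy)

# How many of the deliberately corrupted rows did the filter remove?
sum(flipped %in% out_noisy$remIdx)

# Which removed rows were not among the corrupted ones?
setdiff(out_noisy$remIdx, flipped)
```

The counts depend on the random seed and on the filter's parameters, so they are best read as a rough sanity check, not as a benchmark.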

Finally, all the implemented algorithms can also be used without a formula argument, just indicating the dataset to be preprocessed and the column that contains the classification variable (last column is assumed by default):

out <- IPF(iris, classColumn = 5)
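When the classification variable is not the last column, classColumn has to be given explicitly. The sketch below builds a hypothetical copy of iris, named `iris_first` here, with Species moved to the first column; it assumes the non-formula interface accepts any valid column position for a factor column.

```r
library(NoiseFiltersR)

# Hypothetical variant of iris with the class variable in the first column
iris_first <- iris[, c(5, 1:4)]

set.seed(1)  # precaution in case the filter involves random partitioning
out_first <- IPF(iris_first, classColumn = 1)

# The result is a filter object, just as with the formula interface
class(out_first)
head(out_first$cleanData)
```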

For more specific information on how to use each filter, please refer to the documentation page of the corresponding function and the examples contained therein. For a general overview of the NoiseFiltersR package, please consult the associated vignette.

Version

0.1.0

License

GPL-3

Last Published

June 24th, 2016

Functions in NoiseFiltersR (0.1.0)

  • ENG: Editing with Neighbor Graphs
  • DROP: Decremental Reduction Optimization Procedures
  • edgeBoostFilter: Edge Boosting Filter
  • C45ensembles: Classical Filters based on C4.5
  • AENN: All-k Edited Nearest Neighbors
  • EF: Ensemble Filter
  • dynamicCF: Dynamic Classification Filter
  • CVCF: Cross-Validated Committees Filter
  • BBNR: Blame Based Noise Reduction
  • CNN: Condensed Nearest Neighbors
  • ModeFilter: Mode Filter
  • INFFC: Iterative Noise Filter based on the Fusion of Classifiers
  • hybridRepairFilter: Hybrid Repair-Remove Filter
  • HARF: High Agreement Random Forest
  • ORBoostFilter: Outlier Removal Boosting Filter
  • EWF: Edge Weight Filter
  • PF: Partitioning Filter
  • GE: Generalized Edition
  • ENN: Edited Nearest Neighbors
  • IPF: Iterative Partitioning Filter
  • saturationFilter: Saturation Filters
  • TomekLinks: TomekLinks
  • PRISM: PReprocessing Instances that Should be Misclassified
  • summary.filter: Summary method for class filter
  • RNN: Reduced Nearest Neighbors