## S3 method for class 'formula'
hybridRepairFilter(formula, data, ...)

## Default S3 method:
hybridRepairFilter(x, consensus = FALSE, noiseAction = "remove", classColumn = ncol(x), ...)
If consensus is TRUE, a consensus voting scheme is applied to identify noisy instances. Otherwise (default), a majority approach is used.

An object of class filter, which is a list with seven components:
cleanData is a data frame containing the filtered dataset.
remIdx is a vector of integers indicating the indexes of removed instances (i.e. their row numbers with respect to the original data frame).
repIdx is a vector of integers indicating the indexes of repaired/relabelled instances (i.e. their row numbers with respect to the original data frame).
repLab is a factor containing the new labels for repaired instances.
parameters is a list containing the argument values.
call contains the original call to the filter.
extraInf is a character that includes additional interesting information not covered by previous items.
hybridRepairFilter builds on the dataset an ensemble of four classifiers: SVM, Neural Network, CART and KNN (combining k=1,3,5). According to their predictions and the majority or consensus voting scheme, a subset of instances is labeled as noise. These instances are removed if noiseAction equals "remove"; their class is changed to the most voted class among the ensemble if noiseAction equals "repair"; and when noiseAction is set to "hybrid", the vote of KNN decides whether to remove or repair.

This whole procedure is repeated while the accuracy (over the original dataset) of the ensemble trained with the processed dataset increases.
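The voting step described above can be illustrated with a small base R sketch. It is not the package's implementation: the prediction matrix is made-up data standing in for the four classifiers, the names preds, labels, misses and most_voted are hypothetical, and "majority" is assumed to mean more than half of the ensemble members disagreeing with the given label. The "hybrid" action is left out, because the description does not spell out how the KNN vote is applied.

```r
# Made-up predictions from four ensemble members for six instances
# (illustrative values only, not the output of real classifiers).
preds <- cbind(
  svm  = c("a", "b", "b", "a", "b", "b"),
  nnet = c("a", "b", "b", "a", "a", "b"),
  cart = c("a", "b", "b", "a", "b", "a"),
  knn  = c("a", "b", "b", "b", "b", "b")
)
labels <- c("a", "a", "b", "b", "a", "b")

# Number of ensemble members that disagree with each given label
misses <- rowSums(preds != labels)

# Assumption: majority = more than half of the members disagree;
# consensus = every member disagrees.
noisy_majority  <- misses > ncol(preds) / 2
noisy_consensus <- misses == ncol(preds)

which(noisy_majority)   # 2 4 5
which(noisy_consensus)  # 2

# noiseAction = "remove": drop the flagged instances
kept_labels <- labels[!noisy_majority]

# noiseAction = "repair": relabel flagged instances with the most voted class
# (ties, if any, go to the first class in alphabetical order in this sketch)
most_voted <- apply(preds, 1, function(v) names(which.max(table(v))))
repaired <- labels
repaired[noisy_majority] <- most_voted[noisy_majority]
repaired                # "a" "b" "b" "a" "b" "b"
```

With these values, consensus voting flags fewer instances than majority voting, which matches the usual trade-off: consensus is the more conservative scheme.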
# Next example is not run in order to save time
## Not run:
# data(iris)
# out <- hybridRepairFilter(iris, noiseAction = "hybrid")
# summary(out, explicit = TRUE)
# ## End(Not run)