ModeFilter(formula, data, ...)
ModeFilter(x, type = "classical", noiseAction = "repair", epsilon = 0.05, maxIter = 100, alpha = 1, beta = 1, classColumn = ncol(x), ...)
An object of class filter, which is a list with seven components:
cleanData
is a data frame containing the filtered dataset.
remIdx
is a vector of integers indicating the indexes for
removed instances (i.e. their row number with respect to the original data frame).
repIdx
is a vector of integers indicating the indexes for
repaired/relabelled instances (i.e. their row number with respect to the original data frame).
repLab
is a factor containing the new labels for repaired instances.
parameters
is a list containing the argument values.
call
contains the original call to the filter.
extraInf
is a character vector with additional
information not covered by the previous items.
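As a rough illustration of how these components can be inspected, the sketch below runs the filter with noiseAction = "repair" and looks at the repair-related items. It assumes that ModeFilter is provided by the NoiseFiltersR package and that the package is installed; the guard simply skips the example otherwise.

```r
# Assumption: ModeFilter comes from the NoiseFiltersR package.
# The guard makes the snippet a no-op when the package is not installed.
if (requireNamespace("NoiseFiltersR", quietly = TRUE)) {
  library(NoiseFiltersR)
  data(iris)

  out <- ModeFilter(Species ~ ., data = iris,
                    type = "classical", noiseAction = "repair")

  # Row numbers (with respect to iris) of the relabelled instances
  print(out$repIdx)
  # New labels proposed for those instances
  print(out$repLab)
  # Argument values stored by the filter
  str(out$parameters)
  # The filtered dataset itself
  str(out$cleanData)
}
```

With noiseAction = "remove", the same inspection applies to remIdx instead of repIdx and repLab.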
ModeFilter
estimates the most appropriate class for each instance based on a similarity metric
and the provided label. This can be done in three different ways (argument 'type'):
In the classical approach, all labels are tried for all instances, and the one maximizing a metric based on similarity is chosen.
In the iterative approach, the same scheme is repeated until the proportion of modified instances is less than epsilon or the maximum number of iterations maxIter is reached.
The weighted approach extends the classical one by assigning a weight to each instance, which quantifies the reliability of its label. This weight is used in the computation of the metric to be maximized.
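The classical and iterative schemes described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the similarity (a Gaussian kernel on standardized features), the score (summed similarity to instances carrying a candidate label, plus an optional bonus for the provided label) and the names mode_filter_sketch, sim_scale and keep_bonus are all hypothetical. The weighted variant is not covered.

```r
# Illustrative sketch only; NOT the package's implementation.
# Assumed similarity: exp(-sim_scale * squared Euclidean distance) on scaled features.
# Assumed score of label k for instance i: sum of similarities to the other
# instances currently labelled k, plus keep_bonus if k is i's current label.
mode_filter_sketch <- function(x, y, sim_scale = 1, keep_bonus = 0,
                               iterative = FALSE, epsilon = 0.05, max_iter = 100) {
  x <- scale(as.matrix(x))
  s <- exp(-sim_scale * as.matrix(dist(x))^2)
  diag(s) <- 0                          # an instance does not vote for itself
  labs <- levels(y)
  n_pass <- if (iterative) max_iter else 1
  for (pass in seq_len(n_pass)) {
    onehot <- outer(as.character(y), labs, "==") * 1   # n x K indicator matrix
    score <- s %*% onehot + keep_bonus * onehot        # try every label for every instance
    new_y <- factor(labs[max.col(score, ties.method = "first")], levels = labs)
    changed <- mean(new_y != y)         # proportion of modified instances
    y <- new_y
    if (changed < epsilon) break        # stopping rule of the iterative scheme
  }
  y
}

# Usage: overwrite a few iris labels, then compare provided and estimated labels.
set.seed(1)
noisy <- iris$Species
noisy[sample(nrow(iris), 10)] <- "setosa"
repaired <- mode_filter_sketch(iris[, 1:4], noisy)
table(provided = noisy, estimated = repaired)
```

Setting iterative = TRUE repeats the relabelling pass until fewer than epsilon of the instances change or max_iter passes have run, mirroring the stopping rule stated in the description.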
# Next example is not run because in some cases it can be rather slow
## Not run:
# data(iris)
# out <- ModeFilter(Species~., data = iris, type = "classical", noiseAction = "remove")
# print(out)
# identical(out$cleanData, iris[setdiff(1:nrow(iris),out$remIdx),])
# ## End(Not run)