EWF(formula, data, ...)
EWF(x, threshold = 0.25, noiseAction = "remove", classColumn = ncol(x), ...)
An object of class filter, which is a list with seven components:
cleanData
is a data frame containing the filtered dataset.
remIdx
is a vector of integers indicating the indexes for
removed instances (i.e. their row number with respect to the original data frame).
repIdx
is a vector of integers indicating the indexes for
repaired/relabelled instances (i.e. their row number with respect to the original data frame).
repLab
is a factor containing the new labels for repaired instances.
parameters
is a list containing the argument values.
call
contains the original call to the filter.
extraInf
is a character that includes additional interesting
information not covered by previous items.
EWF
builds up a Relative Neighborhood Graph (RNG) from the dataset. Then, it identifies
as 'suspicious' those instances with a significant value of their local cut edge weight statistic, which
intuitively means that they are surrounded by examples from a different class. Namely, the aforementioned statistic is the sum of the weights of the edges joining
the instance (in the RNG) with instances from a different class.
Under the null hypothesis of the class label being independent of
the event 'being neighbors in the RNG', the distribution of this statistic can be approximated by a
Gaussian one. Then, the p-value for the observed value is computed and compared with the
provided threshold.
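The statistic and its p-value can be illustrated with a small base-R sketch. It is not the package's implementation: the helper names (rng_adjacency, cut_edge_stats), the unit edge weights, the Euclidean distance, the upper-tail p-value and the moment formulas (class proportions used as label probabilities, neighbor labels treated as independent) are all assumptions made for illustration.

```r
# Illustrative sketch only (base R); not the code used inside EWF.

# Hypothetical helper: adjacency matrix of the Relative Neighborhood Graph.
# i and j are neighbors if no third point k is closer to both of them
# than they are to each other.
rng_adjacency <- function(X) {
  D <- as.matrix(dist(X))            # Euclidean distances (assumption)
  n <- nrow(D)
  A <- matrix(FALSE, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      others <- setdiff(seq_len(n), c(i, j))
      blocked <- any(pmax(D[i, others], D[j, others]) < D[i, j])
      A[i, j] <- A[j, i] <- !blocked
    }
  }
  A
}

# Hypothetical helper: local cut edge weight statistic J and its
# upper-tail p-value under a Gaussian approximation.
cut_edge_stats <- function(A, y, W = A * 1) {   # unit weights (assumption)
  y <- as.factor(y)
  prior <- as.numeric(table(y)[as.character(y)]) / length(y)
  differs <- outer(as.character(y), as.character(y), "!=")
  J <- rowSums(W * A * differs)                 # weight of edges to other classes
  mu <- (1 - prior) * rowSums(W * A)            # assumed null mean
  sigma <- sqrt(prior * (1 - prior) * rowSums((W * A)^2))  # assumed null sd
  p <- ifelse(sigma > 0, pnorm(J, mean = mu, sd = sigma, lower.tail = FALSE), 1)
  data.frame(J = J, p = p)
}

# Toy data: two groups on a line; the second point carries the "wrong" label.
X <- matrix(c(0, 1, 2, 3, 10, 11, 12, 13), ncol = 1)
y <- c("a", "b", "a", "a", "b", "b", "b", "b")

stats <- cut_edge_stats(rng_adjacency(X), y)
suspicious <- stats$p < 0.25                    # compare with the threshold
print(cbind(stats, suspicious))
```

In this toy dataset the second instance has both of its RNG edges cut, so it receives the smallest p-value.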
To handle 'suspicious' instances there are two approaches ('remove' or 'hybrid'), and the argument 'noiseAction' determines which one to use. With 'remove', every suspect is removed from the dataset. With the 'hybrid' approach, an instance is removed if it does not have good (i.e. non-suspicious) RNG-neighbors. Otherwise, it is relabelled with the majority class among its good RNG-neighbors.
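The two approaches described above can be sketched in base R as follows. This is a simplified illustration rather than the package's code: the function handle_suspects, its arguments and the tie-breaking rule (first level with the highest count) are hypothetical, and the adjacency matrix and 'suspicious' flags are supplied by hand.

```r
# Illustrative sketch only (base R); not the code used inside EWF.
# A: logical adjacency matrix of the RNG, y: class labels,
# suspicious: logical vector flagging suspect instances.
handle_suspects <- function(A, y, suspicious,
                            noiseAction = c("remove", "hybrid")) {
  noiseAction <- match.arg(noiseAction)
  y <- as.factor(y)
  suspects <- which(suspicious)

  if (noiseAction == "remove") {
    # Every suspect is removed; nothing is relabelled.
    return(list(remIdx = suspects, repIdx = integer(0),
                repLab = factor(character(0), levels = levels(y))))
  }

  remIdx <- integer(0)
  repIdx <- integer(0)
  repLab <- character(0)
  for (i in suspects) {
    good <- which(A[i, ] & !suspicious)   # non-suspicious RNG neighbors
    if (length(good) == 0) {
      remIdx <- c(remIdx, i)              # no good neighbors: remove
    } else {
      counts <- table(y[good])            # majority class among good neighbors
      repIdx <- c(repIdx, i)
      repLab <- c(repLab, names(counts)[which.max(counts)])
    }
  }
  list(remIdx = remIdx, repIdx = repIdx,
       repLab = factor(repLab, levels = levels(y)))
}

# Toy chain graph with 8 nodes: node k is linked to nodes k - 1 and k + 1.
n <- 8
A <- abs(outer(seq_len(n), seq_len(n), "-")) == 1
y <- factor(c("a", "b", "a", "a", "b", "b", "b", "b"))
suspicious <- c(TRUE, TRUE, rep(FALSE, 6))

rem <- handle_suspects(A, y, suspicious, "remove")
hyb <- handle_suspects(A, y, suspicious, "hybrid")
print(hyb)
```

With these inputs, 'remove' drops instances 1 and 2. Under 'hybrid', instance 1 is removed because its only neighbor is itself a suspect, while instance 2 is relabelled with the class of its single good neighbor.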
# Next example is not run because EWF is time-consuming
## Not run:
# data(iris)
# trainData <- iris[c(1:20,51:70,101:120),]
# out <- EWF(Species~Petal.Length+Sepal.Length, data = trainData, noiseAction = "hybrid")
# print(out)
# ## End(Not run)