Takes a dataset and finds its outliers in the principal component space using a combination of different methods
PCOutlierDetection(x, k = 0.05 * nrow(x), cutoff = 0.95,
Method = "euclidean", rnames = FALSE, depth = FALSE,
dense = FALSE, distance = FALSE, dispersion = FALSE,
infocut = 0.9)
x: Dataset for which outliers are to be found
k: Number of nearest neighbours to be used for outlier detection using bootstrapping, default value is 0.05*nrow(x)
cutoff: Percentile threshold used for distance, default value is 0.95
Method: Distance method, default is "euclidean"
rnames: Logical value indicating whether the dataset has rownames, default value is FALSE
depth: Logical value indicating whether the depth based method should be used or not, default is FALSE
dense: Logical value indicating whether the density based method should be used or not, default is FALSE
distance: Logical value indicating whether distance based methods should be used or not, default is FALSE
dispersion: Logical value indicating whether dispersion based methods should be used or not, default is FALSE
infocut: Proportion of total variation used to decide the number of principal components retained in the analysis, default is 0.9
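The role of infocut can be illustrated with a short sketch. It is not the package's own code: it assumes the components are obtained with base R's prcomp() on unscaled data and that the retained components are the smallest set whose cumulative share of variance reaches infocut. The names pca, cum_var, n_pc and scores are chosen here for illustration only.

```r
# Sketch only: choose how many principal components to keep so that the
# retained components explain at least `infocut` of the total variance.
# Uses base R's prcomp(); this is an assumption about the approach,
# not the package's actual implementation.
x <- iris[, -5]
infocut <- 0.9

# Centred, unscaled PCA (whether the package scales the data is not stated)
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Share of variance explained by each component, then the running total
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of components whose cumulative share reaches infocut
n_pc <- which(cum_var >= infocut)[1]

# Scores of the observations in the reduced principal component space
scores <- pca$x[, seq_len(n_pc), drop = FALSE]
```

A lower infocut keeps fewer components, so any outlier rule applied afterwards works in a lower-dimensional space.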
Outlier Observations: A matrix of outlier observations
Location of Outlier: Vector of serial numbers (row positions) of the outliers
PCOutlierDetection finds outlier observations in the principal component space using different methods. An observation is labelled as an outlier only if every method considered flags it (the intersection of all the methods). For bivariate data, it also shows a scatterplot of the data with the outliers labelled.
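The intersection rule can be sketched in base R as follows. The two rules used here (distance from the centre of the scores, and mean distance to the k nearest neighbours, each thresholded at the cutoff percentile) are stand-ins chosen for illustration and are not claimed to be the methods the package implements. All variable names are hypothetical.

```r
# Sketch only: flag outliers with two simple rules on principal component
# scores and keep the observations flagged by both. The rules are
# illustrative stand-ins, not the package's methods.
x <- iris[, -5]
cutoff <- 0.95
k <- ceiling(0.05 * nrow(x))

# First two principal component scores (already centred by prcomp)
scores <- prcomp(x)$x[, 1:2]

# Rule 1: Euclidean distance of each observation from the centre
d_centre <- sqrt(rowSums(scale(scores, scale = FALSE)^2))
flag_centre <- which(d_centre > quantile(d_centre, cutoff))

# Rule 2: mean Euclidean distance to the k nearest neighbours
# (position 1 of each sorted row is the zero distance to itself)
d_mat <- as.matrix(dist(scores, method = "euclidean"))
knn_dist <- apply(d_mat, 1, function(row) mean(sort(row)[2:(k + 1)]))
flag_knn <- which(knn_dist > quantile(knn_dist, cutoff))

# An observation is labelled an outlier only if every rule flags it
outlier_idx <- intersect(flag_centre, flag_knn)
outlier_obs <- x[outlier_idx, , drop = FALSE]
```

Because the final set is an intersection, adding more rules can only shrink the set of labelled outliers, never enlarge it.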
# NOT RUN {
PCOutlierDetection(iris[,-5])
# }