dens: Outlier detection using the Robust Kernel-based Outlier Factor (RKOF) algorithm

Description

Takes a dataset and finds its outliers using the Robust Kernel-based Outlier Factor (RKOF) algorithm.

Usage

dens(x, k = 0.05 * nrow(x), C = 1, alpha = 1, sigma2 = 1,
  cutoff = 0.95, rnames = F, boottimes = 100)

Arguments

x

Dataset for which outliers are to be found

k

Number of nearest neighbours to be used, default value is 0.05*nrow(x)

C

Multiplication parameter for the k-distance of neighbouring observations. Acts as a bandwidth increaser. Default is 1, so that the k-distance itself is used as the bandwidth of the Gaussian kernel

alpha

Sensitivity parameter for the k-distance/bandwidth. A small alpha creates a small variance in RKOF and vice versa. Default is 1

sigma2

Variance parameter for the weighting of neighbouring observations. Default is 1

cutoff

Percentile threshold used to compute the bootstrap cutoff on the outlier score, default value is 0.95

rnames

Logical value indicating whether the dataset has row names, default value is FALSE

boottimes

Number of bootstrap samples used to find the cutoff, default is 100 samples

Value

Outlier Observations: A matrix of outlier observations

Location of Outlier: Vector of serial numbers (row indices) of the outliers

Outlier probability: Vector giving, for each outlier, the proportion of times its score exceeds the local bootstrap cutoff
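To illustrate how an outlier probability of this kind can be obtained, here is a minimal base-R sketch. It assumes that each bootstrap sample resamples the outlier scores with replacement and takes their `cutoff` quantile as that sample's cutoff, that the final threshold is the mean of these cutoffs, and that the probability of a flagged observation is the share of bootstrap cutoffs its score exceeds. The helper `boot_outliers()` and the toy scores are hypothetical; dens itself may resample and combine the cutoffs differently.

```r
# Hypothetical helper: bootstrap cutoff and outlier probability for a vector of
# outlier scores. Assumption: dens() may resample and combine cutoffs differently.
boot_outliers <- function(scores, cutoff = 0.95, boottimes = 100) {
  n <- length(scores)
  # One cutoff per bootstrap sample: the `cutoff` quantile of the resampled scores
  boot_cut <- vapply(seq_len(boottimes), function(b) {
    quantile(scores[sample.int(n, n, replace = TRUE)], probs = cutoff, names = FALSE)
  }, numeric(1))
  threshold <- mean(boot_cut)              # assumed: final cutoff = mean of bootstrap cutoffs
  loc <- which(scores > threshold)         # row indices of the flagged observations
  # Share of bootstrap cutoffs that each flagged score exceeds
  prob <- vapply(loc, function(i) mean(scores[i] > boot_cut), numeric(1))
  list(location = loc, probability = prob, threshold = threshold)
}

# Usage with toy scores (not real data): 95 ordinary values and 5 larger ones
set.seed(42)
scores <- c(rep(1, 95), 1.5, 2, 3, 5, 10)
res <- boot_outliers(scores, cutoff = 0.95, boottimes = 200)
res$location
res$probability
```

Averaging the per-sample quantiles gives a single threshold that is less sensitive to any one resample than a quantile of the raw scores alone.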

Details

dens computes the outlier score of each observation using the RKOF implementation in the DDoutlier package and, based on the bootstrapped cutoff, labels an observation as an outlier. The outlierness of each labelled outlier is also reported; it is the bootstrap estimate of the probability that the observation is an outlier. For bivariate data, dens also shows a scatterplot of the data with the outliers labelled.
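To show how `k`, `C`, `alpha` and `sigma2` enter an RKOF-style score, the following is a simplified base-R sketch; it is not the DDoutlier code. It assumes Euclidean distances, a Gaussian kernel whose bandwidth for neighbour o is C * kdist(o)^alpha, neighbour weights exp(-(kdist(o) / min kdist - 1)^2 / (2 * sigma2)), and a final score equal to the weighted density of the neighbourhood divided by the observation's own density, so larger values indicate outliers. The function name `rkof_sketch()` is hypothetical.

```r
# Hypothetical helper: an RKOF-style outlier score in base R.
# Simplified sketch; the kernel normalisation and weighting used by
# DDoutlier (and hence by dens) may differ.
rkof_sketch <- function(x, k, C = 1, alpha = 1, sigma2 = 1) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- ncol(x)
  k <- max(1L, as.integer(k))          # a fractional k such as 0.05 * nrow(x) is truncated here
  stopifnot(k < n)

  D <- as.matrix(dist(x))              # Euclidean distance matrix
  diag(D) <- Inf                       # an observation is not its own neighbour

  # Indices of the k nearest neighbours of each observation (one row each)
  nn <- do.call(rbind, lapply(seq_len(n), function(i) order(D[i, ])[seq_len(k)]))

  kdist <- D[cbind(seq_len(n), nn[, k])]   # k-distance of each observation
  h <- C * kdist^alpha                     # per-observation bandwidth (assumes no zero k-distances)

  # Kernel density estimate at each observation, using each neighbour's own bandwidth
  kde <- vapply(seq_len(n), function(i) {
    o <- nn[i, ]
    mean(exp(-D[i, o]^2 / (2 * h[o]^2)) / ((2 * pi)^(d / 2) * h[o]^d))
  }, numeric(1))

  # Weighted neighbourhood density: the neighbour with the smallest k-distance gets weight 1
  wde <- vapply(seq_len(n), function(i) {
    o <- nn[i, ]
    w <- exp(-(kdist[o] / min(kdist[o]) - 1)^2 / (2 * sigma2))
    sum(w * kde[o]) / sum(w)
  }, numeric(1))

  wde / kde                                # ratio > 1: less dense than its neighbourhood
}

# Usage: a 5 x 5 grid plus one distant point (row 26)
X <- rbind(as.matrix(expand.grid(1:5, 1:5)), c(10, 10))
scores <- rkof_sketch(X, k = 4)
which.max(scores)   # the distant point, row 26
```

Raising `C` or `alpha` widens every bandwidth and smooths the densities, while a smaller `sigma2` concentrates the weight on the neighbours with the smallest k-distance.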

References

Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. Int. Conf. on Knowledge Discovery and Data Mining (KDD), Portland, OR.

Examples


# Create dataset: the four numeric columns of iris
X <- iris[, 1:4]
# Outlier detection
dens(X, k = 4, C = 1)
