P3C: The P3C Algorithm for Projected Clustering

Description

The main idea of the P3C algorithm is to use statistical tests to find clusters. Each dimension is first split into 1+log_2(nrow(data)) bins, and a chi-square test is used to compute the probability that the bin sizes are uniformly distributed. If this probability is greater than 1-ChiSquareAlpha, the dimension is left as it is. Otherwise the largest bins are removed one at a time until the remaining bins pass the test. The bins removed in this way are then used to find clusters. First, adjacent removed bins are merged. A cluster is then formed by taking a bin from one dimension and computing, for each bin from another dimension, the probability of the two bins sharing as many points as they do. The bin is intersected with the bin from the other dimension for which this probability is lowest, provided that it is lower than 1.00*10^-PoissonThreshold, and this step is repeated until no such bin is found.
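The binning and uniformity test for a single dimension can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper `bin_uniformity_p` is hypothetical, the bins are assumed to be equal-width, and rounding the bin count down with `floor` is an assumption, since the description does not say how 1+log_2(nrow(data)) is rounded.

```r
# Hypothetical helper, not part of the package: returns the chi-square
# p-value for the hypothesis that the bin counts of one dimension are uniform.
bin_uniformity_p <- function(x) {
  n_bins <- floor(1 + log2(length(x)))   # assumption: round the bin count down
  breaks <- seq(min(x), max(x), length.out = n_bins + 1)
  counts <- table(cut(x, breaks = breaks, include.lowest = TRUE))
  # For a plain count vector, chisq.test() tests against equal frequencies
  suppressWarnings(chisq.test(as.vector(counts))$p.value)
}

set.seed(1)
uniform_dim   <- runif(500)                                     # no cluster
clustered_dim <- c(runif(250), rnorm(250, mean = 0.5, sd = 0.02))  # dense region

p_uniform   <- bin_uniformity_p(uniform_dim)
p_clustered <- bin_uniformity_p(clustered_dim)
```

A very small p-value, as expected for `clustered_dim`, suggests that the bin sizes are not uniform, so the dimension is a candidate for containing part of a cluster.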

Usage

P3C(data, ChiSquareAlpha = 0.005, PoissonThreshold = 19)

Arguments

data

A matrix of input data.

ChiSquareAlpha

The maximum probability of being non-uniformly distributed that the points in a dimension may have before the dimension is assumed to show a cluster.

PoissonThreshold

Exponent of the probability threshold used when combining bins. A bin is only intersected with a bin from another dimension if the probability of the two sharing as many points as they do is lower than 1.00*10^-PoissonThreshold.
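To give a feel for the scale of this threshold, the following sketch computes a Poisson tail probability for two bins and compares it with 1.00*10^-PoissonThreshold. All numbers are made up for illustration, and the way the expected number of shared points is derived here (points in the first bin times the relative width of the second bin) is an assumption about the underlying test, not something taken from the package.

```r
# Hypothetical numbers, for illustration only
n_in_bin  <- 200    # points in the bin from the first dimension
rel_width <- 0.1    # width of the second bin relative to its dimension's range
observed  <- 80     # points that fall into both bins

# Assumption: under independence, this many shared points are expected
expected <- n_in_bin * rel_width

# P(X >= observed) for X ~ Poisson(expected)
p_tail <- ppois(observed - 1, lambda = expected, lower.tail = FALSE)

PoissonThreshold <- 19
significant <- p_tail < 10^(-PoissonThreshold)
```

With the default of 19 the overlap has to be extremely unlikely under independence before two bins are combined. A smaller value such as 3 accepts much weaker evidence.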

References

Gabriela Moise, Jörg Sander and Martin Ester. P3C: A Robust Projected Clustering Algorithm. In Proc. 6th IEEE International Conference on Data Mining, 2006.

Examples

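library(subspace)
data("subspace_dataset")
# PoissonThreshold = 3 sets the threshold to 10^-3 instead of the default 10^-19
P3C(subspace_dataset, PoissonThreshold = 3)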
