Detects whether data points are noise or part of a cluster, based on a Poisson process model.
NNclean(data, k, distances = NULL, edge.correct = FALSE, wrap = 0.1,
        convergence = 0.001, plot = FALSE, quiet = TRUE)

# S3 method for nnclean
print(x, ...)
NNclean returns a list of class nnclean with components
z: 0-1-vector with one entry per data point. 1 means cluster, 0 means noise.
probs: vector of estimated a priori probabilities for each point to belong to the cluster component.
k: the value of the argument k.
lambda1: intensity parameter of cluster component.
lambda2: intensity parameter of noise component.
p: estimated probability of cluster component.
kthNND: distance to kth nearest neighbor.
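The distance to the kth nearest neighbor is the quantity the whole method is built on. The following base-R sketch shows one way to compute it from a full distance matrix. The helper name kth_nn_dist is hypothetical and is not part of the package; the package's internal computation may differ.

```r
# Hypothetical helper, not part of the package: distance from each point
# to its k-th nearest neighbor, computed from all pairwise distances.
kth_nn_dist <- function(data, k) {
  dm <- as.matrix(dist(data))  # pairwise Euclidean distances
  # After sorting a row, position 1 is the point itself (distance 0),
  # so the k-th nearest neighbor sits at position k + 1.
  apply(dm, 1, function(row) sort(row)[k + 1])
}

# Four points on a line at 0, 1, 3 and 7
pts <- matrix(c(0, 1, 3, 7), ncol = 1)
d1 <- kth_nn_dist(pts, 1)  # distance to the nearest neighbor
d2 <- kth_nn_dist(pts, 2)  # distance to the second nearest neighbor
```

Points in a dense cluster have small kth nearest neighbor distances, while isolated noise points have large ones, which is what allows the two groups to be separated.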
data: numerical matrix or data frame.
k: integer. Number of considered nearest neighbors per point.
distances: distance matrix object of class dist. If specified, it is used instead of computing distances from the data.
edge.correct: logical. If TRUE and the data is two-dimensional, neighbors for points at the edges of the parent region of the noise Poisson process are determined after wrapping the region onto a toroid.
wrap: numerical. If edge.correct=TRUE, points in a strip of size wrap*range along the edge for each variable are candidates for being neighbors of points from the opposite edge.
convergence: numerical. Convergence criterion for EM-algorithm.
plot: logical. If TRUE, a histogram of the distance to kth nearest neighbor and fit is plotted.
quiet: logical. If FALSE, the likelihood is printed during the iterations.
x: object of class nnclean.
...: necessary for print methods.
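The idea behind the toroidal edge correction can be illustrated in base R. The sketch below copies points lying in a strip of width wrap*range along each edge to the opposite side, so that they become available as neighbor candidates there. The function wrap_copies is hypothetical and only illustrates the wrapping step under these assumptions; it is not the package's implementation.

```r
# Hypothetical illustration, not the package's code: augment a
# two-dimensional point set with shifted copies of the points that lie
# in a strip of width wrap * range along each edge.
wrap_copies <- function(xy, wrap = 0.1) {
  xy <- as.matrix(xy)
  out <- xy
  for (j in 1:2) {
    lo <- min(xy[, j])
    hi <- max(xy[, j])
    rng <- hi - lo
    strip <- wrap * rng
    # Points near the lower edge reappear beyond the upper edge, and
    # vice versa. Using 'out' rather than 'xy' in the second pass also
    # wraps the copies made in the first pass, which covers the corners.
    near_lo <- out[out[, j] <= lo + strip, , drop = FALSE]
    near_hi <- out[out[, j] >= hi - strip, , drop = FALSE]
    near_lo[, j] <- near_lo[, j] + rng
    near_hi[, j] <- near_hi[, j] - rng
    out <- rbind(out, near_lo, near_hi)
  }
  out
}

xy <- rbind(c(0, 0), c(1, 1), c(0.5, 0.5), c(0.05, 0.5))
aug <- wrap_copies(xy, wrap = 0.1)
```

Nearest neighbor distances for the original points would then be computed against the augmented set, so that points close to an edge are not treated as artificially isolated.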
R-port by Christian Hennig (christian.hennig@unibo.it, https://www.unibo.it/sitoweb/christian.hennig/en), original Splus package by S. Byers and A. E. Raftery.
The assumption is that the noise is distributed as a homogeneous Poisson process on a certain region and the clusters are distributed as a homogeneous Poisson process with larger intensity on a subregion (which is disconnected if there is more than one cluster). The distances to the kth nearest neighbor are then distributed according to a mixture of two transformed Gamma distributions, and this mixture is estimated via the EM-algorithm. The points are assigned to the noise or cluster component by use of the estimated a posteriori probabilities.
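The mixture model can be sketched in base R. For a homogeneous Poisson process with intensity lambda in d dimensions, the dth power of the distance to the kth nearest neighbor follows a Gamma distribution with shape k and rate lambda * alpha_d, where alpha_d is the volume of the unit ball. The code below is a minimal sketch of an EM algorithm for a two-component mixture of such distributions, written from this description and the formulas in Byers and Raftery (1998). The functions dknn and em_knn, their arguments and the simulated data are hypothetical and do not reproduce the package's code.

```r
# Density of the distance to the k-th nearest neighbor under a
# homogeneous Poisson process with intensity lambda in d dimensions.
dknn <- function(x, lambda, k, d = 2) {
  alpha_d <- pi^(d / 2) / gamma(d / 2 + 1)  # volume of the unit ball
  # x^d is Gamma(shape = k, rate = lambda * alpha_d); the factor
  # d * x^(d - 1) is the Jacobian of the transformation x -> x^d.
  dgamma(x^d, shape = k, rate = lambda * alpha_d) * d * x^(d - 1)
}

# Hypothetical EM sketch for the two-component mixture.
em_knn <- function(dist_k, k, d = 2, tol = 0.001, max_iter = 500) {
  alpha_d <- pi^(d / 2) / gamma(d / 2 + 1)
  y <- dist_k^d
  # Crude start: treat the smaller half of the distances as cluster
  delta <- as.numeric(dist_k < median(dist_k))
  p <- NA
  for (iter in seq_len(max_iter)) {
    # M-step: weighted maximum likelihood estimates
    p_new <- mean(delta)
    lambda1 <- k * sum(delta) / (alpha_d * sum(y * delta))
    lambda2 <- k * sum(1 - delta) / (alpha_d * sum(y * (1 - delta)))
    # E-step: posterior probability of the cluster component
    f1 <- p_new * dknn(dist_k, lambda1, k, d)
    f2 <- (1 - p_new) * dknn(dist_k, lambda2, k, d)
    delta <- f1 / (f1 + f2)
    converged <- !is.na(p) && abs(p_new - p) < tol
    p <- p_new
    if (converged) break
  }
  list(cluster = round(delta), posterior = delta,
       lambda1 = lambda1, lambda2 = lambda2, p = p)
}

# Simulated k-th nearest neighbor distances drawn directly from the two
# transformed Gamma distributions (d = 2, so alpha_d = pi)
set.seed(1)
k <- 5
truth <- rep(c(1, 0), each = 300)  # 1 = cluster, 0 = noise
dist_k <- sqrt(c(rgamma(300, shape = k, rate = 100 * pi),
                 rgamma(300, shape = k, rate = 5 * pi)))
fit <- em_knn(dist_k, k)
```

In this simulation the distances are independent draws, which real nearest neighbor distances are not, so the sketch only illustrates the estimation step and the assignment by rounded posterior probabilities.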
Byers, S. and Raftery, A. E. (1998) Nearest-Neighbor Clutter Removal for Estimating Features in Spatial Point Processes, Journal of the American Statistical Association, 93, 577-584.
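library(prabclus)  # provides NNclean
library(mclust)    # provides the chevron data set
data(chevron)
# Columns 2 and 3 hold the point coordinates; use k = 15 nearest neighbors
nnc <- NNclean(chevron[, 2:3], 15, plot = TRUE)
# Noise (z = 0) is drawn in color 1, cluster points (z = 1) in color 2
plot(chevron[, 2:3], col = 1 + nnc$z)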