snipEM (version 1.0.1)

skmeans: Snipped k-means clustering with cellwise outliers

Description

Perform k-means clustering on a data matrix with cellwise outliers using a snipping algorithm.

Usage

skmeans(X, k, V, clust, s, itersmax = 10^5, D = 1e-1)

Arguments

X

Data matrix with n rows (observations).

k

Integer; number of clusters, k>1.

V

Binary matrix of the same size as X. Zeros correspond to initial snipped entries.

clust

Vector of size n containing values from 1 to k. Starting solution for class labels.

itersmax

Maximum number of iterations of the algorithm. Default is 10^5.

s

Binary vector of size n for trimming, starting solution. Number of zeros will be preserved and correspond to trimmed rows. If the vector is rep(1,n), it performs no trimming. Default is rep(1,n).
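As a small sketch of a starting value for s, the snippet below flags a fixed number of rows for trimming using base R only. The scoring rule (summed absolute deviation from the column medians) and the names n_trim and row_score are illustrative assumptions, not something the package prescribes.

```r
set.seed(1)
X <- matrix(rnorm(50 * 3), 50, 3)
X[c(4, 17), ] <- 50          # two grossly outlying rows (toy data)

n <- nrow(X)
n_trim <- 2                  # hypothetical choice: number of rows to trim

# Assumed heuristic: score each row by its summed absolute deviation
# from the column medians
row_score <- rowSums(abs(sweep(X, 2, apply(X, 2, median))))

# Start from "no trimming" and zero out the n_trim highest-scoring rows
s <- rep(1, n)
s[order(row_score, decreasing = TRUE)[seq_len(n_trim)]] <- 0

sum(1 - s)                   # number of trimmed rows in the starting solution
```

The resulting vector could then be passed as the s argument. Only the number of zeros is stated to be preserved, so which rows end up trimmed may change during fitting.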

D

Tuning parameter for the fitting algorithm. Corresponds approximately to the maximal change in loss obtained by switching two non-outlying entries. Comparing several choices is recommended. Default is 1e-1.
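One way to compare choices of D is to refit over a small grid and tabulate the loss. The helper below is a hypothetical sketch: compare_D and the grid values are not part of snipEM, and it assumes the fitted object exposes a scalar loss element as described under Value.

```r
# Hypothetical helper: `fit` is any function of D that returns a list
# with a numeric scalar element `loss`
compare_D <- function(fit, D_grid) {
  losses <- vapply(D_grid, function(D) fit(D)$loss, numeric(1))
  data.frame(D = D_grid, loss = losses)
}

# Intended use with skmeans (assumes X, k, Vinit and clustinit already exist
# and that the snipEM package is attached):
# compare_D(function(D) skmeans(X, k, Vinit, clustinit, D = D),
#           D_grid = c(0.01, 0.1, 1))

# Stand-in fit function, used here only to show the shape of the result
toy_fit <- function(D) list(loss = (D - 0.1)^2)
res <- compare_D(toy_fit, c(0.01, 0.1, 1))
res
```

Wrapping the call in a closure keeps the data and starting values fixed, so only D varies between fits.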

Value

A list with the following elements:

loss

Loss function (the total sum of squares) at convergence.

mu

Estimated locations.

s

Final (optimal) trimmed rows, as a vector of size n.

V

Final (optimal) V matrix.

clust

Final (optimal) class labels, as a vector of size n.
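To make the returned elements concrete, the following sketch recomputes a sum of squares from X, V, clust, mu and s. It assumes that mu is a k x p matrix of cluster centres and that snipped cells (V == 0) and trimmed rows (s == 0) contribute nothing; the helper snipped_ss is hypothetical and may not match the package's internal loss exactly.

```r
# Hypothetical helper: squared deviations from the assigned centre,
# summed over retained cells only
snipped_ss <- function(X, V, clust, mu, s = rep(1, nrow(X))) {
  keep <- V * s                              # s recycles down columns, zeroing trimmed rows
  sum(keep * (X - mu[clust, , drop = FALSE])^2)
}

# Toy data: two clusters, with one contaminated cell marked as snipped
X <- rbind(c(0, 0), c(2, 0), c(10, 10), c(12, 100))
V <- matrix(1, 4, 2)
V[4, 2] <- 0
clust <- c(1, 1, 2, 2)
mu <- rbind(c(1, 0), c(11, 10))

ss_all <- snipped_ss(X, V, clust, mu)                     # no trimming
ss_trim <- snipped_ss(X, V, clust, mu, s = c(1, 1, 1, 0)) # last row trimmed
```

With a fitted object, the same helper could be applied to skm$V, skm$clust, skm$mu and skm$s and compared against skm$loss to check whether these assumptions hold.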

Details

This function computes the skmeans estimator of Farcomeni (2014). It gives robust k-means clustering in the presence of cellwise (entry-wise) outliers and, through trimming, of outlying rows. The number of snipped entries, sum(1-V), and the number of trimmed rows, sum(1-s), are kept fixed throughout. Initial values for V, s and clust should be provided. Note that initializing with labels from classical (non-robust) clustering methods may harm the final performance of skmeans and may even cause an error due to empty clusters.
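Because the counts sum(1-V) and sum(1-s) stay fixed and empty clusters can cause an error, it may help to inspect the starting values before fitting. The check below is a hypothetical sketch in base R; check_init is not a snipEM function.

```r
# Hypothetical pre-flight check for starting values
check_init <- function(X, k, V, clust, s = rep(1, nrow(X))) {
  stopifnot(
    all(dim(V) == dim(X)), all(V %in% c(0, 1)),
    length(clust) == nrow(X), all(clust %in% seq_len(k)),
    length(s) == nrow(X), all(s %in% c(0, 1))
  )
  # Cluster sizes among rows that are not trimmed
  sizes <- tabulate(clust[s == 1], nbins = k)
  if (any(sizes == 0)) warning("starting solution has an empty cluster")
  list(n_snipped = sum(1 - V), n_trimmed = sum(1 - s), sizes = sizes)
}

# Toy starting values: 10 x 4 data, three snipped cells, no trimming
set.seed(1)
X <- matrix(rnorm(40), 10, 4)
V <- matrix(1, 10, 4)
V[1:3] <- 0
clust <- rep(1:2, each = 5)
init <- check_init(X, 2, V, clust)
init
```

The reported n_snipped and n_trimmed are the contamination levels the fit will keep, so they are worth choosing deliberately.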

References

Farcomeni, A. (2014) Snipping for robust k-means clustering under component-wise contamination, Statistics and Computing, 24, 909-917.

See Also

sclust, stEM, snipEM

Examples

# NOT RUN {
set.seed(1234)
X <- matrix(NA,200,5)
# two clusters
k <- 2
X[1:100,] <- rnorm(100*5)
X[101:200,] <- rnorm(100*5,15)
clust <- rep(c(1,2), each=100)

# 5% cellwise outliers
s <- sample(200*5,200*5*0.05)
X[s] <- runif(200*5*0.05,-100,100)
V <- X
V[s] <- 0
V[-s] <- 1

# Initial V and cluster labels
Vinit <- matrix(1, nrow(X), ncol(X))
Vinit[which(X > quantile(X,0.975) | X < quantile(X,0.025))] <- 0
km <- kmeans(X,k)
clustinit <- km$cluster

# Snipped robust clustering
skm <- skmeans(X, k, Vinit, clustinit)

table(clust,km$cluster)
table(clust,skm$clust)
# }