snipEM (version 1.0.1)

sclust: Snipping for robust model based clustering analysis with cellwise outliers

Description

Estimates a finite Gaussian mixture model optimized over a snipping set.

Usage

sclust(X, k, V, R, restr.fact=12, tol = 1e-04, maxiters = 100, 
          maxiters.S = 1000, print.it = FALSE)

Arguments

X

Data matrix, with one observation per row.

k

Number of clusters.

V

Binary matrix of the same size as X. Zeros correspond to initial snipped entries.

R

Initial guess for cluster labels, 1 to k.

restr.fact

Restriction factor, i.e., constraint on the condition number of all covariance matrices for each cluster. Default is 12.

tol

Tolerance for convergence. Default is 1e-4.

maxiters

Maximum number of iterations for the SM algorithm. Default is 100.

maxiters.S

Maximum number of iterations of the inner greedy snipping algorithm. Default is 1000.

print.it

Logical; if TRUE, partial results are printed. Default is FALSE.
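
The requirements on V and R described above can be checked before calling sclust. The following is a minimal sketch in base R; check_sclust_inputs is a hypothetical helper written for illustration and is not part of snipEM. It returns the number of snipped entries, sum(1 - V), and stops with a message if V is not a binary matrix of the same size as X or if the labels in R leave a cluster empty.

```r
# Hypothetical helper (not part of snipEM): sanity-check X, k, V and R
# before passing them to sclust().
check_sclust_inputs <- function(X, k, V, R) {
  X <- as.matrix(X)
  V <- as.matrix(V)
  if (!identical(dim(V), dim(X))) stop("V must have the same dimensions as X")
  if (!all(V %in% c(0, 1))) stop("V must contain only 0 and 1")
  if (length(R) != nrow(X)) stop("R must have one label per row of X")
  if (!all(R %in% seq_len(k))) stop("R must take values in 1..k")
  # Count observations per cluster; every cluster must be non-empty
  if (any(tabulate(as.integer(R), nbins = k) == 0)) stop("R leaves a cluster empty")
  invisible(sum(1 - V))  # number of snipped entries
}

# Small illustrative data set (values are arbitrary)
X <- matrix(rnorm(20), 10, 2)
V <- matrix(1, 10, 2)
V[c(3, 17)] <- 0
R <- rep(1:2, each = 5)
n_snipped <- check_sclust_inputs(X, 2, V, R)
```

Running such a check on labels produced by a non-robust method makes an empty cluster visible before the estimation starts.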

Value

A list with the following elements:

R

Final cluster labels.

mu

Estimated location matrix.

S

Array of estimated scatter matrices.

V

Final (optimal) V matrix.

lik

Gaussian log-likelihood at convergence.

iter

Number of outer iterations before convergence.
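
To see how a set of scatter matrices relates to restr.fact, one can compute the ratio between the largest and the smallest eigenvalue over all clusters. The sketch below assumes that this eigenvalue ratio is what the restriction factor bounds; that reading is an assumption, not taken from the package source. The helper eigen_ratio is hypothetical, and it takes a plain list of matrices because the layout of the returned S array is not stated on this page.

```r
# Hypothetical helper (not part of snipEM): ratio of the largest to the
# smallest eigenvalue across a list of covariance matrices.
eigen_ratio <- function(S_list) {
  ev <- unlist(lapply(S_list, function(S) {
    eigen(S, symmetric = TRUE, only.values = TRUE)$values
  }))
  max(ev) / min(ev)
}

# Two illustrative 2 x 2 scatter matrices (made-up values)
S_list <- list(diag(c(1, 2)), diag(c(4, 8)))
ratio <- eigen_ratio(S_list)
within_bound <- ratio <= 12  # compare with the default restr.fact
```

To apply this to a fitted object, the matrices in the S element would first have to be extracted into a list, with the indexing depending on how the array is laid out.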

Details

This function computes the sclust estimator of Farcomeni (2014), which gives robust mixture modeling in the presence of entry-wise outliers. It is based on a classification-expectation-snip-maximize (CESM) algorithm. At the S step, the likelihood is optimized over the set of snipped entries; at the M step, the location and scatter estimates are updated. The S step uses a greedy algorithm, unlike the one proposed in Farcomeni (2014, 2014a). The number of snipped entries, sum(1-V), is kept fixed throughout, so it is determined by the initial V. Note that initializing with labels from classical (non-robust) clustering methods may hurt the final performance of sclust and may even cause an error due to empty clusters.
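
The idea of snipping can be illustrated with the contribution of a single row to a Gaussian log-likelihood when some of its entries are snipped. The sketch below assumes that a snipped entry is simply left out, so that the row contributes the marginal Gaussian density of its remaining entries. The helper snipped_logdens is hypothetical and written in base R; it is not the package's internal code.

```r
# Hypothetical helper (not part of snipEM): Gaussian log-density of one
# row x, using only the entries with v == 1. Entries with v == 0 are
# treated as snipped and ignored.
snipped_logdens <- function(x, v, mu, S) {
  keep <- which(v == 1)
  if (length(keep) == 0) return(0)  # fully snipped row contributes log(1)
  d <- x[keep] - mu[keep]
  ch <- chol(S[keep, keep, drop = FALSE])   # upper triangular, t(ch) %*% ch = S_kk
  z <- backsolve(ch, d, transpose = TRUE)   # solves t(ch) %*% z = d
  -0.5 * (length(keep) * log(2 * pi) + 2 * sum(log(diag(ch))) + sum(z^2))
}

# The second entry is a gross outlier; snipping it removes its influence
x <- c(0.5, 100, -0.2)
v <- c(1, 0, 1)
ld_snipped <- snipped_logdens(x, v, mu = c(0, 0, 0), S = diag(3))
ld_full <- snipped_logdens(x, c(1, 1, 1), mu = c(0, 0, 0), S = diag(3))
```

With an identity scatter matrix, ld_snipped reduces to the sum of two univariate standard normal log-densities, while ld_full is dominated by the outlying entry.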

References

Farcomeni, A. (2014) Snipping for robust k-means clustering under component-wise contamination, Statistics and Computing, 24, 909-917

Farcomeni, A. (2014) Robust constrained clustering in presence of entry-wise outliers, Technometrics, 56, 102-111

See Also

snipEM, stEM, sumlog, ldmvnorm

Examples

# NOT RUN {
library(snipEM)
set.seed(1234)
X <- matrix(NA,200,5)
# two clusters
k <- 2
X[1:100,] <- rnorm(100*5)
X[101:200,] <- rnorm(100*5,15)
R <- rep(c(1,2), each=100)

# 5% cellwise outliers
s <- sample(200*5,200*5*0.05)
X[s] <- runif(200*5*0.05,-100,100)
# True contamination pattern: 0 marks a contaminated entry (kept for reference)
V <- X
V[s] <- 0
V[-s] <- 1

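# Initial V and R
# Snip the most extreme 5% of entries (outside the 2.5% and 97.5% quantiles)
Vinit <- matrix(1, nrow(X), ncol(X))
Vinit[which(X > quantile(X,0.975) | X < quantile(X,0.025))] <- 0
Rinit <- kmeans(X,k)$cluster

# Snipped robust clustering
sc <- sclust(X,k,Vinit,Rinit)
# Cross-tabulate true labels against initial and final labels
table(R,Rinit)
table(R,sc$R)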
# }
