snipEM (version 1.0.1)

snipEM: Snipping for location and scatter estimation with cellwise outliers

Description

Computes an estimator optimizing the Gaussian likelihood over a snipping set. The function snipEM.initialV can be used to perform some iterations to initialize V.

Usage

snipEM(X, V, tol = 1e-04, maxiters = 500, maxiters.S = 1000, print.it = FALSE) 

snipEM.initialV(X, V, mu0, S0, maxiters.S = 100, greedy = TRUE)

Arguments

X

Data.

V

Binary matrix of the same dimensions as X. Zeros mark the initially snipped entries.

tol

Tolerance for convergence. Default is 1e-4.

maxiters

Maximum number of iterations for the SM algorithm. Default is 500.

maxiters.S

Maximum number of iterations of the inner greedy snipping algorithm. Default is 1000.

print.it

Logical; if TRUE, partial results are printed. Default is FALSE.

mu0

Initial estimate for the mean vector that is used in the initialization stage.

S0

Initial estimate for the covariance matrix that is used in the initialization stage.

greedy

Logical; if TRUE, run the greedy snipping algorithm for maxiters.S iterations and keep the binary matrix that gives the largest likelihood value. If FALSE, stop as soon as the snipping algorithm finds a binary matrix that gives a larger likelihood value than the initial one. Default is TRUE.

Value

A list with the following elements:

mu Estimated location.
S Estimated scatter matrix.
V Final (optimal) V matrix.
lik Gaussian log-likelihood at convergence.
iter Number of outer iterations before convergence.

Details

This function computes the sclust estimator of Farcomeni (2014) with \(k=1\). It therefore provides a robust estimate of location and scatter in the presence of entry-wise outliers. It is based on a snip-maximize (SM) algorithm. At the S step, the likelihood is optimized over the set of snipped entries; at the M step, the location and scatter estimates are updated. The S step is based on a greedy algorithm, unlike the one proposed in Farcomeni (2014, 2014a). The number of snipped entries, sum(1-V), is kept fixed throughout.
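To make the objective concrete, the sketch below evaluates a Gaussian log-likelihood that uses only the unsnipped cells of each row (those with V equal to 1). It is an illustration in base R, not the package's internal code: the helper name snipped_loglik is hypothetical, and it assumes that snipped cells are simply dropped from each row's Gaussian density.

```r
# Hypothetical helper (not part of snipEM): Gaussian log-likelihood
# computed from the unsnipped cells of each row only.
snipped_loglik <- function(X, V, mu, S) {
  total <- 0
  for (i in seq_len(nrow(X))) {
    keep <- which(V[i, ] == 1)          # unsnipped coordinates of row i
    if (length(keep) == 0) next         # a fully snipped row contributes nothing
    d <- X[i, keep] - mu[keep]
    S_kk <- S[keep, keep, drop = FALSE] # scatter restricted to kept coordinates
    log_det <- as.numeric(determinant(S_kk, logarithm = TRUE)$modulus)
    maha <- sum(d * solve(S_kk, d))     # squared Mahalanobis distance
    total <- total - 0.5 * (length(keep) * log(2 * pi) + log_det + maha)
  }
  total
}

set.seed(1)
X <- matrix(rnorm(20 * 3), 20, 3)
V <- matrix(1, 20, 3)
V[1, 2] <- 0                            # snip a single cell
ll <- snipped_loglik(X, V, mu = rep(0, 3), S = diag(3))
```

Under this reading, the S step searches over V with sum(1-V) held fixed, and the M step updates mu and S for the current V.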

Results depend on good initialization of the V matrix. A boxplot rule (see examples) usually works well. The function snipEM.initialV can be used to improve the initial choice through some iterations updating only V from initial (robust) estimates mu0 and S0. In the example, the EMVE is used to obtain mu0 and S0.
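One possible way to prepare the inputs for the initialization stage is sketched below. It is a variant, not the documented recipe: the boxplot rule is applied column by column, and the column medians with a diagonal matrix of squared MADs stand in for a proper robust estimator such as the EMVE. The names V0, mu0 and S0 are illustrative, and since the return value of snipEM.initialV is not described on this page, the result is only inspected with str().

```r
set.seed(42)
n <- 100
p <- 5
X <- matrix(rnorm(n * p), n, p)
X[sample(n * p, 10)] <- 15              # plant ten cellwise outliers

# Column-wise boxplot rule: flag cells beyond the whiskers of their column
V0 <- matrix(1, n, p)
for (j in seq_len(p)) {
  out <- boxplot.stats(X[, j])$out
  V0[X[, j] %in% out, j] <- 0
}

# Simple robust stand-ins for the initial location and scatter (assumption:
# a diagonal S0 is acceptable as a starting point)
mu0 <- apply(X, 2, median)
S0 <- diag(apply(X, 2, mad)^2)

n_snipped <- sum(1 - V0)                # number of snipped entries

# Run only when the package is installed; signature as given under Usage
if (requireNamespace("snipEM", quietly = TRUE)) {
  init <- snipEM::snipEM.initialV(X, V0, mu0, S0)
  str(init)
}
```

Because the number of snipped entries stays fixed during fitting, the count in n_snipped is effectively a tuning choice made at this stage.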

References

Farcomeni, A. (2014) Snipping for robust k-means clustering under component-wise contamination, Statistics and Computing, 24, 909-917

Farcomeni, A. (2014) Robust constrained clustering in presence of entry-wise outliers, Technometrics, 56, 102-111

See Also

sclust, stEM, sumlog, ldmvnorm

Examples

# NOT RUN {
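library(snipEM)

n <- 100
p <- 5
Xc <- matrix(rnorm(n * p), n, p)

# initial V: snip the entries flagged as outliers by a boxplot rule on all values
V <- matrix(1, n, p)
V[!is.na(match(as.vector(Xc), boxplot(as.vector(Xc), plot = FALSE)$out))] <- 0

# copy of the data with the snipped entries set to NA (not used by snipEM below)
Xna <- Xc
Xna[which(V == 0)] <- NA

resSEM <- snipEM(Xc, V)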

# }