
robustbase (version 0.95-1)

r6pack: Robust Distance based observation orderings based on robust "Six pack"

Description

Compute six initial robust estimators of multivariate location and “scatter” (scale); then, for each, compute the distances \(d_{ij}\) and take the h (\(h > n/2\)) observations with smallest distances. Then compute the statistical distances based on these h observations.

Return the indices of the observations sorted in increasing order.
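The distance-and-reorder step described above can be illustrated with a minimal base-R sketch. It is not the robustbase implementation: the initial estimate used here (coordinatewise median with a diagonal scatter built from `mad`) is a stand-in assumption rather than one of the six estimators, and the simulated data and variable names are hypothetical.

```r
set.seed(42)
n <- 60; p <- 3
x <- matrix(rnorm(n * p), n, p)
x[1:5, ] <- x[1:5, ] + 10        # five planted outliers (illustrative data)
h <- 35                          # h > n/2

# Stand-in initial estimate (assumption, not one of the "six pack"):
mu0 <- apply(x, 2, median)       # coordinatewise median as location
S0  <- diag(apply(x, 2, mad)^2)  # diagonal scatter from squared MADs

# Squared distances from the initial estimate; keep the h smallest.
d0  <- mahalanobis(x, mu0, S0)
idx <- order(d0)[1:h]

# Re-estimate location and scatter from those h observations only,
# then compute the distances of all n observations.
mu1 <- colMeans(x[idx, ])
S1  <- cov(x[idx, ])
d1  <- mahalanobis(x, mu1, S1)

# Observation indices sorted by increasing distance.
ord <- order(d1)
head(ord, h)                     # roughly what one column holds when full.h = FALSE
```

In this sketch `ord` plays the role of a single column of the result; `r6pack()` produces one such ordering for each of its six initial estimators.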

Usage

r6pack(x, h, full.h, scaled = TRUE, scalefn = rrcov.control()$scalefn)

Value

An \(h' \times 6\) matrix of observation indices, i.e., with values from \(1,\dots,n\). If full.h is true, \(h' = n\), otherwise \(h' = h\).

Arguments

x

n x p data matrix

h

integer, typically around (and slightly larger than) \(n/2\).

full.h

logical specifying if the full (length n) observation ordering should be returned; otherwise only the first h are. For .detmcd(), full.h=FALSE is typical.

scaled

logical indicating if the data x is already scaled; if false, we apply x <- doScale(x, median, scalefn).

scalefn

a function(u) to compute a robust univariate scale of u.
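To make the scaled and scalefn arguments concrete, here is a rough base-R sketch of columnwise robust standardization: center each column by its median and divide by a robust scale. It does not call doScale(); `mad` stands in for scalefn purely as an assumption, and the data are simulated.

```r
set.seed(7)
x <- cbind(a = rnorm(30, mean = 100, sd = 15),
           b = rexp(30))

ctr <- apply(x, 2, median)   # robust center of each column
scl <- apply(x, 2, mad)      # robust scale; mad is an assumed stand-in for scalefn

# Subtract the column centers, then divide by the column scales.
z <- sweep(sweep(x, 2, ctr, "-"), 2, scl, "/")

apply(z, 2, median)          # medians of the standardized columns
apply(z, 2, mad)             # MADs of the standardized columns
```

Data standardized this way has the form that scaled = TRUE promises; with scaled = FALSE the function performs the corresponding step itself.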

Author

Valentin Todorov, based on the original Matlab code by Tim Verdonck and Mia Hubert. Martin Maechler for tweaks (performance etc), and full.h.

Details

The six initial estimators are

  1. Hyperbolic tangent of standardized data

  2. Spearman correlation matrix

  3. Tukey normal scores

  4. Spatial sign covariance matrix

  5. BACON

  6. Raw OGK estimate for scatter
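As a rough illustration of the first four estimators in the list above, the following base-R sketch computes them on robustly standardized data, following the formulas as I understand them from Hubert et al. (2012). It is a sketch under that assumption, not the code used inside r6pack(); BACON and the raw OGK estimate are omitted, and the data and names (`z`, `S1` to `S4`) are hypothetical.

```r
set.seed(1)
n <- 50; p <- 4
x <- matrix(rnorm(n * p), n, p)

# Robustly standardize each column (median / mad used as stand-ins).
z <- sweep(x, 2, apply(x, 2, median), "-")
z <- sweep(z, 2, apply(z, 2, mad), "/")

# 1. Correlation of the hyperbolic tangent of the standardized data.
S1 <- cor(tanh(z))

# 2. Spearman correlation matrix (correlation of columnwise ranks).
S2 <- cor(z, method = "spearman")

# 3. Normal scores: map columnwise ranks through the normal quantile function.
R  <- apply(z, 2, rank)
S3 <- cor(qnorm((R - 1/3) / (n + 1/3)))

# 4. Spatial sign covariance: project each row onto the unit sphere.
k  <- z / sqrt(rowSums(z^2))     # row i is divided by its own Euclidean norm
S4 <- crossprod(k) / n
```

Each result is a p x p matrix. Because every row of `k` has unit length, the trace of `S4` equals 1 by construction, which makes a handy sanity check.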

References

Hubert, M., Rousseeuw, P. J. and Verdonck, T. (2012) A deterministic algorithm for robust location and scatter. Journal of Computational and Graphical Statistics 21, 618–637.

See Also

covMcd(*, nsamp = "deterministic"); CovSest(*, nsamp = "sdet") from package rrcov.

Examples

data(pulpfiber)
dim(m.pulp <- data.matrix(pulpfiber)) #  62 x 8
dim(fr6  <- r6pack(m.pulp, h = 40, full.h= FALSE)) #  h x 6  = 40 x 6
dim(fr6F <- r6pack(m.pulp, h = 40, full.h= TRUE )) #  n x 6  = 62 x 6
stopifnot(identical(fr6, fr6F[1:40,]))
# Check: observations 1, 3, 6, 35, 36 and 38 are among the first 10 of each ordering
stopifnot(apply(fr6[1:10,], 2L,
   function(col) c(1,3,6,35,36,38) %in% col))
