rrcov (version 1.7-6)

CovSde: Stahel-Donoho Estimates of Multivariate Location and Scatter

Description

Computes a robust estimate of multivariate location and scatter using the Stahel-Donoho projection-based estimator.

Usage

CovSde(x, nsamp, maxres, tune = 0.95, eps = 0.5, prob = 0.99, 
seed = NULL, trace = FALSE, control)

Value

An S4 object of class CovSde-class which is a subclass of the virtual class CovRobust-class.
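
As a minimal sketch of how the returned object might be inspected, the snippet below fits the estimator and extracts the location vector and scatter matrix with the rrcov accessors getCenter() and getCov(). It assumes rrcov is installed and reuses the hbk data from the Examples; the variable names fit, center and scatter are illustrative only.

```r
library(rrcov)

data(hbk)
hbk.x <- data.matrix(hbk[, 1:3])

# Fit the Stahel-Donoho estimator with default settings
fit <- CovSde(hbk.x)

# The result is an S4 object that inherits from the virtual class CovRobust
is(fit, "CovRobust")

# Accessor functions return the robust location and scatter estimates
center  <- getCenter(fit)   # numeric vector of length p
scatter <- getCov(fit)      # p x p matrix
```

Because the subsamples are drawn at random, the numeric estimates may differ slightly between runs.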

Arguments

x

a matrix or data frame.

nsamp

a positive integer giving the number of resamples required; nsamp may not be reached if too many of the p-subsamples, chosen out of the observed vectors, are in a hyperplane. If nsamp = 0 all possible subsamples are taken. If nsamp is omitted, it is calculated to provide a breakdown point of eps with probability prob.

maxres

a positive integer specifying the maximum number of resamples to be performed including those that are discarded due to linearly dependent subsamples. If maxres is omitted it will be set to 2 times nsamp.

tune

a numeric value between 0 and 1 giving the fraction of the data to receive non-zero weight. Defaults to 0.95.

prob

a numeric value between 0 and 1 specifying the probability of high breakdown point; used to compute nsamp when nsamp is omitted. Defaults to 0.99.

eps

a numeric value between 0 and 0.5 specifying the breakdown point; used to compute nsamp when nsamp is omitted. Defaults to 0.5.

seed

starting value for random generator. Default is seed = NULL.

trace

whether to print intermediate results. Default is trace = FALSE.

control

a control object (S4) of class CovControlSde-class containing estimation options, the same as those provided in the function specification. If the control object is supplied, the parameters from it will be used. If parameters are also passed in the invocation statement, they will override the corresponding elements of the control object.

Author

Valentin Todorov valentin.todorov@chello.at and Kjell Konis kjell.konis@epfl.ch

Details

The projection-based Stahel-Donoho estimator possesses very good statistical properties, but it can be very slow if the number of variables is too large. It is recommended to use this estimator if n <= 1000 and p <= 10 or n <= 5000 and p <= 5. The number of subsamples required is calculated to provide a breakdown point of eps with probability prob and can exceed the largest representable integer value; in that case it is limited to .Machine$integer.max. You can of course provide nsamp in the call, e.g. nsamp=1000, but this will not guarantee the required breakdown point of the estimator. For larger data sets it is better to use CovMcd or CovOgk. If you use CovRobust, the estimator will be selected automatically according to the size of the data set.
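
To illustrate why the required number of subsamples can grow so quickly with p, here is a rough sketch of the standard resampling argument: draw enough random subsamples that, with probability prob, at least one of them contains no outliers when a fraction eps of the data is contaminated. The helper n_subsamples() and its size argument are hypothetical and not part of rrcov; the subsample size is assumed to be p here, and the expression used internally by CovSde may differ (for instance by using p + 1).

```r
# Hypothetical helper, not part of rrcov: number of random subsamples needed
# so that, with probability `prob`, at least one subsample is outlier-free
# when a fraction `eps` of the observations is contaminated.
n_subsamples <- function(p, eps = 0.5, prob = 0.99, size = p) {
  # Probability that a single subsample of `size` points has no outliers
  p_clean <- (1 - eps)^size
  # Solve 1 - (1 - p_clean)^n >= prob for n; log1p keeps precision
  # when p_clean is tiny
  n <- ceiling(log(1 - prob) / log1p(-p_clean))
  # Cap at the largest representable integer, as described above
  min(n, .Machine$integer.max)
}

n_subsamples(3)    # small p: a modest number of subsamples
n_subsamples(10)   # grows roughly like 2^p when eps = 0.5
n_subsamples(40)   # exceeds the integer range, so the cap applies
```

Under these assumptions the count roughly doubles with every additional variable at eps = 0.5, which is consistent with the advice to keep p small.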

References

R. A. Maronna and V.J. Yohai (1995) The Behavior of the Stahel-Donoho Robust Multivariate Estimator. Journal of the American Statistical Association 90 (429), 330--341.

R. A. Maronna, D. Martin and V. Yohai (2006). Robust Statistics: Theory and Methods. Wiley, New York.

V. Todorov and P. Filzmoser (2009). An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software 32 (3), 1--47. doi:10.18637/jss.v032.i03.

Examples

data(hbk)
hbk.x <- data.matrix(hbk[, 1:3])
CovSde(hbk.x)

## default call: nsamp is computed from eps and prob
c0 <- CovSde(hbk.x)

## the following three statements are equivalent: each sets nsamp = 2000
c1 <- CovSde(hbk.x, nsamp=2000)
c2 <- CovSde(hbk.x, control = CovControlSde(nsamp=2000))
c3 <- CovSde(hbk.x, control = new("CovControlSde", nsamp=2000))

## direct specification overrides control one:
c4 <- CovSde(hbk.x, nsamp=100,
             control = CovControlSde(nsamp=2000))
c1
summary(c1)
plot(c1)

## Use the function CovRobust() - if no estimation method is
##  specified, for small data sets CovSde() will be called
cr <- CovRobust(hbk.x)
cr
