sscor computes a robust correlation matrix estimate based on spatial signs, as described in Dürre et al. (2015).
sscor(X, location=c("2dim-median","1dim-median","pdim-median","mean"),
scale=c("mad","Qn","sd"), standardized=TRUE, pdim=FALSE, ...)
location: one of "2dim-median", "1dim-median", "pdim-median", "mean". The default is "2dim-median". See details below.
scale: one of "mad", "Qn", "sd". The default is "mad". See details below.
standardized: the default is TRUE.
...: passed to evSSCM2evShape if pdim=TRUE.
There are two ways to calculate this matrix: one can either estimate all pairwise correlations by the two-dimensional spatial sign correlation, or calculate the whole matrix at once by the p-dimensional spatial sign correlation. Both approaches have advantages and disadvantages. The first method should be more robust, especially if only some components of the observations are corrupted. Furthermore, the consistency transformation is explicitly known only for the bivariate spatial sign correlation, whereas an approximation procedure has to be applied for the p-dimensional one. Additional arguments can be passed to this algorithm using the ... argument; see the help page of SSCM2Shape for details. On the other hand, the p-dimensional spatial sign correlation is more efficient under the normal distribution and always yields a positive semidefinite estimate.
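The positive-semidefiniteness remark can be checked numerically. The following is a minimal sketch: is_psd is a hypothetical helper, not part of the sscor package, and the sscor calls run only if the package is installed.

```r
# Hypothetical helper (not part of sscor): TRUE if all eigenvalues of the
# symmetric matrix R are non-negative up to a small tolerance.
is_psd <- function(R, tol = 1e-8) {
  ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
  min(ev) >= -tol
}

# A symmetric matrix with unit diagonal and entries in [-1, 1] need not be
# positive semidefinite, which is why a matrix assembled from pairwise
# estimates carries no such guarantee.
R_bad <- matrix(c(1.0,  0.9,  0.9,
                  0.9,  1.0, -0.9,
                  0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)
is_psd(R_bad)
is_psd(diag(3))

# Compare both approaches on simulated data, if sscor is available.
if (requireNamespace("sscor", quietly = TRUE)) {
  set.seed(1)
  X <- matrix(rnorm(200), ncol = 4)
  R_pair <- sscor::sscor(X)               # pairwise, pdim=FALSE (default)
  R_pdim <- sscor::sscor(X, pdim = TRUE)  # p-dimensional
  print(c(pairwise = is_psd(R_pair), pdim = is_psd(R_pdim)))
}
```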
The correlation estimator is computed in three steps. First, the data are standardized marginally, i.e., each variable is divided by a scale estimate. (This step is optional, but recommended, and hence the default.)
Then, if pdim=FALSE, the 2x2 spatial sign covariance matrix (SSCM) is computed for each pair of variables, and a univariate correlation estimate is derived from it using formulas (5) and (6) in Dürre et al. (2015). These pairwise correlation estimates are the off-diagonal elements of the returned matrix estimate.
Otherwise, if pdim=TRUE, the pxp SSCM is computed, and an estimate of the correlation matrix is derived from it by the function SSCM2Shape; see its help page for details.
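The pairwise case can be illustrated in base R. This is a rough sketch under stated assumptions, not the package's implementation: pairwise_sscor_sketch is a hypothetical name, centring uses the component-wise median rather than the default two-dimensional spatial median, and the final step relies on the bivariate elliptical relation that the SSCM eigenvalues are proportional to the square roots of the shape matrix eigenvalues. Whether this matches formulas (5) and (6) term by term is an assumption.

```r
# Hypothetical helper, not part of the sscor package.
pairwise_sscor_sketch <- function(x, y) {
  # Step 1: marginal standardization by the MAD.
  x <- x / mad(x)
  y <- y / mad(y)
  # Centre by the component-wise median (a simplification of the default).
  Z <- cbind(x - median(x), y - median(y))
  # Step 2: spatial signs (observations projected onto the unit circle)
  # and their 2x2 covariance matrix, the SSCM.
  norms <- sqrt(rowSums(Z^2))
  keep <- norms > 0                     # drop points at the centre
  S <- Z[keep, , drop = FALSE] / norms[keep]
  sscm <- crossprod(S) / nrow(S)
  # Step 3: same eigenvectors, squared eigenvalues give a shape matrix
  # (up to scale); its standardized off-diagonal entry is the estimate.
  e <- eigen(sscm, symmetric = TRUE)
  V <- e$vectors %*% diag(e$values^2) %*% t(e$vectors)
  V[1, 2] / sqrt(V[1, 1] * V[2, 2])
}

# Simulated bivariate normal data with true correlation 0.5.
set.seed(1)
n <- 2000
z1 <- rnorm(n)
z2 <- rnorm(n)
r_hat <- pairwise_sscor_sketch(z1, 0.5 * z1 + sqrt(0.75) * z2)
r_hat
```

With a sample of this size the estimate should land near 0.5; the exact value depends on the random draw.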
Scale estimation:
The scale estimates may either be computed outside the function sscor and passed on to sscor as a p-variate numeric vector, or they may be computed by sscor, using one of the following options:
"mad": applies mad from the standard package stats. This is the default.
"Qn": applies Qn from the package robustbase.
"sd": applies the standard deviation sd.
Standardizing the data is recommended (and is hence done by default), particularly if the marginal scales differ largely. In this case, estimation without prior marginal standardization may become inefficient.
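Both ways of supplying the scale can be sketched as follows. The data and the variable names s and X_std are illustrative assumptions; the sscor calls run only if the package is installed.

```r
# Three variables on very different marginal scales.
set.seed(1)
X <- cbind(rnorm(50), 100 * rnorm(50), 0.01 * rnorm(50))

# Scale estimates computed outside sscor: a p-variate numeric vector.
s <- apply(X, 2, mad)

# Marginal standardization: divide each column by its scale estimate.
X_std <- sweep(X, 2, s, "/")
apply(X_std, 2, mad)   # each column now has MAD 1

if (requireNamespace("sscor", quietly = TRUE)) {
  R_vec <- sscor::sscor(X, scale = s)      # externally computed scales
  R_opt <- sscor::sscor(X, scale = "mad")  # computed inside sscor
  print(R_vec)
  print(R_opt)
}
```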
Location estimation:
The SSCM requires a multivariate location estimate. The location may be computed outside the function sscor and the result passed on to sscor as a p-variate numeric vector. Alternatively, it may be computed by sscor, using one of the following options:
"2dim-median": two-dimensional spatial median, individually for every 2x2 SSCM. This is the default if pdim=FALSE.
"1dim-median": the usual, one-dimensional median applied component-wise.
"pdim-median": the p-dimensional spatial median for all variables. This is the default if pdim=TRUE.
"mean": the p-dimensional mean. In light of robustness, it is not recommended to use the mean.
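For readers unfamiliar with the spatial median, the idea can be sketched with a Weiszfeld-type iteration in base R. spatial_median_sketch is a hypothetical helper and is not claimed to be the algorithm sscor uses; it starts from the component-wise median and may stall if an iterate coincides exactly with an observation.

```r
# Hypothetical helper, not part of the sscor package.
spatial_median_sketch <- function(X, tol = 1e-8, max_iter = 200) {
  X <- as.matrix(X)
  m <- apply(X, 2, median)                 # start: component-wise median
  for (i in seq_len(max_iter)) {
    d <- sqrt(rowSums(sweep(X, 2, m)^2))   # distances to current estimate
    w <- 1 / pmax(d, 1e-12)                # guard against division by zero
    m_new <- colSums(X * w) / sum(w)       # inverse-distance weighted mean
    converged <- sqrt(sum((m_new - m)^2)) < tol
    m <- m_new
    if (converged) break
  }
  m
}

# Four points on the corners of a square: the spatial median is the centre.
corners <- rbind(c(1, 1), c(1, -1), c(-1, 1), c(-1, -1))
spatial_median_sketch(corners)
```

A vector computed this way has length p and could, by the description above, be passed to sscor through the location argument.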
There is no handling of missing values.
Dürre, A., Vogel, D. (2016): Asymptotics of the two-stage spatial sign correlation, Journal of Multivariate Analysis, vol. 144, 54-67. arXiv 1506.02578
cor
A number of other robust correlation estimators are provided by the package rrcov.
Testing for spatial sign correlation: sscor.test.
library(sscor)

set.seed(5)
X <- cbind(rnorm(25), rnorm(25))
# X is a 25x2 matrix

# cor() and sscor() behave similarly under normality
sscor(X)
cor(X)

# but behave differently in the presence of outliers.
X[1, ] <- c(10, 10)
sscor(X)
cor(X)