sscor: Spatial sign correlation

Description

sscor computes a robust correlation matrix estimate based on spatial signs, as described in Dürre et al. (2015).

Usage

sscor(X, location=c("2dim-median","1dim-median","pdim-median","mean"),
scale=c("mad","Qn","sd"), standardized=TRUE, pdim=FALSE, ...)

Value

A symmetric p x p numeric matrix. The diagonal entries are 1 and the off-diagonal entries are the pairwise spatial sign correlation estimates.

Arguments

X: (required) n x p data matrix; the number of columns is the dimension p and the number of rows is the number of observations n.
location: (optional) either a p-dimensional numeric vector specifying the location or a character string indicating the location estimator to be used. Possible values are "2dim-median", "1dim-median", "pdim-median", "mean". The default is "2dim-median" if pdim=FALSE and "pdim-median" if pdim=TRUE. See details below.
scale: (optional) either a p-dimensional numeric vector specifying the p marginal scales or a character string indicating the scale estimator to be used. Possible values are "mad","Qn","sd". The default is "mad". See details below.
standardized: (optional) logical; indicating whether the data should be standardized by marginal scale estimates prior to computing the spatial sign correlation. The default is TRUE.
pdim: (optional) logical; indicating whether the correlation matrix is assembled from pairwise correlation estimates (FALSE) or estimated at once by the p-dimensional spatial sign correlation (TRUE). The default is FALSE. See details below.
...: (optional) arguments passed to evSSCM2evShape if pdim=TRUE.

Details

The spatial sign correlation is a highly robust estimator of the correlation matrix. Under elliptical distributions it is consistent for the generalized correlation matrix, which is derived from the shape matrix instead of the covariance matrix and is therefore also defined when second moments are not finite.

There are two ways to calculate this matrix: one can either estimate all pairwise correlations by the two-dimensional spatial sign correlation, or calculate the whole matrix at once by the p-dimensional spatial sign correlation. Both approaches have advantages and disadvantages. The first method should be more robust, especially if only some components of the observations are corrupted. Furthermore, the consistency transformation is explicitly known only for the bivariate spatial sign correlation, whereas the p-dimensional one requires an approximation procedure. Additional arguments can be passed to this algorithm using the ... argument; see the help page of SSCM2Shape for details. On the other hand, the p-dimensional spatial sign correlation is more efficient under the normal distribution and always yields a positive semidefinite estimate.

The correlation estimator is computed in three steps. First, the data are standardized marginally, i.e., each variable is divided by a scale estimate. (This step is optional, but recommended, and hence the default.) Second, the spatial sign covariance matrix (SSCM) is computed. Third, a correlation estimate is derived from the SSCM. If pdim=FALSE, the second and third steps are carried out for each pair of variables: the 2x2 SSCM is computed, and from it a univariate correlation estimate is obtained by formulas (5) and (6) in Dürre et al. (2015). These pairwise correlation estimates are the off-diagonal elements of the returned matrix. If pdim=TRUE, the p x p SSCM is computed, and an estimate of the correlation matrix is derived from it by the function SSCM2Shape; see there for details.
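The three steps for a single pair of variables can be sketched in base R as follows. This is an illustration only, not the code used by sscor: the helper name ss_cor_2d is hypothetical, the data are centred by the component-wise median for brevity, and the last step assumes the bivariate relation under elliptical distributions that the SSCM eigenvalues are proportional to the square roots of the shape matrix eigenvalues, rather than transcribing formulas (5) and (6) of the paper.

```r
# Illustrative sketch of a bivariate spatial sign correlation.
# ss_cor_2d is a hypothetical helper; use sscor() for actual estimation.
ss_cor_2d <- function(x, y) {
  # Step 1: marginal standardization by the MAD
  Z <- cbind(x / mad(x), y / mad(y))
  # Centring (assumption: component-wise median instead of a spatial median)
  Z <- sweep(Z, 2, apply(Z, 2, median))
  # Spatial signs: project each observation onto the unit circle,
  # dropping observations that coincide with the centre
  norms <- sqrt(rowSums(Z^2))
  keep <- norms > 0
  S <- Z[keep, , drop = FALSE] / norms[keep]
  # Step 2: 2x2 spatial sign covariance matrix
  sscm <- crossprod(S) / nrow(S)
  # Step 3 (assumed relation): shape eigenvalues are proportional to the
  # squared SSCM eigenvalues; eigenvectors are shared
  e <- eigen(sscm, symmetric = TRUE)
  V <- e$vectors %*% diag(e$values^2) %*% t(e$vectors)
  V[1, 2] / sqrt(V[1, 1] * V[2, 2])
}

set.seed(1)
x <- rnorm(200)
y <- 0.8 * x + 0.6 * rnorm(200)
r <- ss_cor_2d(x, y)
r
```

The result is a single number in [-1, 1]; placing such values in the off-diagonal positions of a matrix with unit diagonal gives the structure described under Value.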

Scale estimation:

The scale estimates may either be computed outside the function sscor and passed on to sscor as a p-variate numeric vector, or they may be computed by sscor, using one of the following options:

"mad": applies mad from the standard package stats. This is the default.

"Qn": applies Qn from the package robustbase.

"sd": applies the standard deviation sd.

Standardizing the data is recommended (and is hence done by default), particularly if the marginal scales differ largely. In that case, estimation without prior marginal standardization may become inefficient.
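As a small illustration of marginal standardization, the following base R sketch divides each column by its MAD. The data and the object names (X, scales, X_std) are made up for the example; the commented-out call shows how precomputed scales could be handed to sscor through the scale argument described above.

```r
set.seed(1)
# Hypothetical data: two columns on very different scales
X <- cbind(rnorm(50, sd = 1), rnorm(50, sd = 1000))

scales <- apply(X, 2, mad)          # marginal MAD of each column
X_std  <- sweep(X, 2, scales, "/")  # divide each column by its scale

apply(X_std, 2, mad)                # both columns now have MAD 1

# Precomputed scales as a p-variate numeric vector (requires the sscor package):
# sscor(X, scale = scales)
```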

Location estimation:

The SSCM requires a multivariate location estimate. The location may be computed outside the function sscor and the result passed on to sscor as a p-variate numeric vector. Alternatively it may be computed by sscor, using one of the following options:

"2dim-median": two-dimensional spatial median, individually for every 2x2 SSCM. This is the default if pdim=FALSE.

"1dim-median": the usual, one-dimensional median applied component-wise.

"pdim-median": the p-dimensional spatial median for all variables. This is the default if pdim=TRUE.

"mean": the p-dimensional mean. In light of robustness, it is not recommended to use the mean.
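To illustrate why the mean is discouraged, the sketch below computes a spatial median with a basic Weiszfeld-type iteration and compares it with the mean on contaminated data. The function spatial_median is a hypothetical helper written for this example; the algorithm used inside sscor may differ.

```r
# Hypothetical helper: spatial median by a simple Weiszfeld-type iteration
spatial_median <- function(X, tol = 1e-8, max_iter = 500) {
  m <- apply(X, 2, median)                  # start at the component-wise median
  for (i in seq_len(max_iter)) {
    d <- sqrt(rowSums(sweep(X, 2, m)^2))    # distances to the current estimate
    w <- 1 / pmax(d, tol)                   # guard against division by zero
    m_new <- colSums(X * w) / sum(w)        # inverse-distance weighted mean
    if (sqrt(sum((m_new - m)^2)) < tol) break
    m <- m_new
  }
  m_new
}

set.seed(2)
X <- cbind(rnorm(100), rnorm(100))
X[1, ] <- c(100, 100)                       # a single gross outlier

med <- spatial_median(X)
med            # stays near the origin
colMeans(X)    # pulled towards the outlier
```

A location computed this way could be passed to sscor as a numeric vector via the location argument.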

There is no handling of missing values.
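Since missing values are not handled, one option is to drop incomplete rows before calling sscor. The following sketch uses complete.cases from the stats package; the small matrix is made up for the example, and whether row-wise deletion is appropriate depends on the data at hand.

```r
# Hypothetical data with missing entries
X <- cbind(c(1.2, NA, 0.3, -0.7),
           c(0.5, 1.1, NA, -0.2))

# Keep only rows without any missing value
X_complete <- X[complete.cases(X), , drop = FALSE]
nrow(X_complete)

# X_complete can then be passed on (requires the sscor package):
# sscor(X_complete)
```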

References

Dürre, A., Vogel, D., Fried, R. (2015): Spatial sign correlation, Journal of Multivariate Analysis, vol. 135, 89–105. arXiv 1403.7635

Dürre, A., Vogel, D. (2016): Asymptotics of the two-stage spatial sign correlation, Journal of Multivariate Analysis, vol. 144, 54–67. arXiv 1506.02578

Examples


set.seed(5)
X <- cbind(rnorm(25),rnorm(25))
# X is a 25x2 matrix

# cor() and sscor() behave similarly under normality
sscor(X)
cor(X)

# but behave differently in the presence of outliers.
X[1,] <- c(10,10)
sscor(X)
cor(X)
