OjaNP (version 1.0-0)

ojaSCM: Oja Sign Covariance Matrix

Description

The function computes the Oja sign covariance matrix of a data set X.

Usage

ojaSCM(X, center = "ojaMedian", p = NULL, silent = FALSE, 
       na.action = na.fail, ...)

Arguments

X

numeric data.frame or matrix containing the data points as rows.

center

one of the following three:

  • a numeric vector giving the location of the data,

  • a function that computes a multivariate location (see details below) or

  • one of the following strings:

    • "colMean" (vector of means, function colMeans is called),

    • "ojaMedian" (function ojaMedian),

    • "spatialMedian" (function spatial.median from package ICSNP),

    • "compMedian" (marginal median) or

    • "HRMedian" (Hettmansperger and Randles median, function HR.Mest from package ICSNP).

The default is "ojaMedian".
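The three ways of specifying center can be sketched as follows. This is a minimal illustration rather than a reference implementation: it assumes that a user-supplied location function receives the data as its first argument and returns a numeric vector of length ncol(X). The helper marginal_median is hypothetical and not part of OjaNP.

```r
library(OjaNP)   # provides ojaSCM() and the biochem data set

data(biochem)
X <- as.matrix(biochem[, 1:2])

# 1. A string naming one of the built-in locations.
S_string <- ojaSCM(X, center = "compMedian")

# 2. A numeric vector giving the location directly.
S_vector <- ojaSCM(X, center = colMeans(X))

# 3. A function computing a multivariate location.
#    marginal_median is a hypothetical helper (assumption: it is
#    called with the data matrix and must return one value per column).
marginal_median <- function(x) apply(x, 2, median)
S_function <- ojaSCM(X, center = marginal_median)
```

Each call should return a 2 x 2 matrix, since X has two columns.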

p

NULL or a number between 0 and 1 which specifies the fraction of hyperplanes to be used for subsampling. If p = 1, no subsampling is done. If p = NULL, the value of p is determined based on the size of the data set. See function ojaSign for details.

silent

logical. If subsampling is done or the expected computation time is too long, a warning message is printed unless silent is TRUE. The default is FALSE.

na.action

a function which indicates what should happen when the data contain 'NA's. Default is to fail.

...

arguments passed on to the location function.

Value

a symmetric matrix with ncol(X) columns and rows.

Details

The function computes the Oja sign covariance matrix of the data set X, that is (if the Oja signs are centered by the Oja median) the covariance matrix of the Oja signs of the data points in X, taken w.r.t. X.
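The relation between the sign covariance matrix and cov() can be checked in base R. The sketch below assumes, as the equality in Example 1 suggests, that the matrix is the average outer product of the signs. The matrix S is an arbitrary stand-in whose columns sum to zero, not actual Oja signs.

```r
# Stand-in for a matrix of signs: any numeric n x k matrix whose
# columns sum to zero (real Oja signs would come from ojaSign()).
set.seed(1)
n <- 20
k <- 3
S <- matrix(rnorm(n * k), nrow = n, ncol = k)
S <- sweep(S, 2, colMeans(S))       # force the column sums to zero

avg_outer  <- crossprod(S) / n      # (1/n) * sum_i s_i s_i^T
scaled_cov <- (1 - 1/n) * cov(S)    # cov() centers and divides by n - 1

all.equal(avg_outer, scaled_cov)    # TRUE

# If the rows do not sum to zero, the two differ by the outer
# product of the column means.
S2 <- S + 1
m2 <- colMeans(S2)
all.equal(crossprod(S2) / n, (1 - 1/n) * cov(S2) + tcrossprod(m2))  # TRUE
```

This is why the factor (1 - 1/nrow(X)) appears in Example 1, and why the equality holds only when the signs add up to zero.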

For a definition of the Oja sign covariance matrix and its properties see the references below. The matrix X needs to have at least two columns and at least as many rows as columns in order to give sensible results. The return value is a square, symmetric matrix having as many columns as X.

Oja signs (contrary to Oja ranks) require the computation of a centre (location) of the data cloud. The function offers various ways to specify the location. For details on location computation see function ojaSign.

The function offers a subsampling option to speed up computation for large data sets. The subsampling fraction is controlled by the parameter p. If p is not specified (the default, p = NULL), it is determined automatically based on the dimension of the problem. The function aims for a reasonable compromise between accuracy and computing time; in particular, for sufficiently small data matrices X the sampling fraction p is set to 1. Subsampling is applied to hyperplanes, not data points. A sample is drawn once, and all Oja signs are then computed from this sample. For further details on subsampling see function ojaSign. Even for very small p, usable results can be expected, see e.g. Example 2.
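To see why subsampling matters, one can count the hyperplanes involved. The sketch below rests on an assumption not stated on this page: that in R^k each hyperplane is spanned by k data points, so that n observations give choose(n, k) hyperplanes. The values of n, k and p are those of Example 2.

```r
# Assumption: each hyperplane is spanned by k of the n data points,
# so there are choose(n, k) hyperplanes in total.
n <- 300
k <- 7
n_hyper <- choose(n, k)
n_hyper                  # about 4.04e13

# Subsampling fraction reported in Example 2.
p <- 4.542038e-09
p * n_hyper              # expected subsample size, roughly 1.8e5
```

Under this assumption, even a fraction of order 1e-9 still leaves well over a hundred thousand hyperplanes in the sample.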

References

Fischer D, Mosler K, Möttönen J, Nordhausen K, Pokotylo O and Vogel D (2020). "Computing the Oja Median in R: The Package OjaNP." Journal of Statistical Software, 92(8), pp. 1-36. doi: 10.18637/jss.v092.i08 (URL: http://doi.org/10.18637/jss.v092.i08).

Visuri, S., Koivunen, V., Oja, H. (1999), Sign and rank covariance matrices, J. Stat. Plann. Inference, 91, 557-575.

Ollila, E., Oja, H., Croux, C. (2003), The affine equivariant sign covariance matrix: Asymptotic behavior and efficiencies, J. Multivariate Analysis, 87, 328-355.

See Also

ojaSign, ojaRCM, ojaMedian, spatial.median, HR.Mest

Examples

# NOT RUN {
### ----<< Example 1 >>---- : biochem data
data(biochem)
X <- biochem[,1:2]
ojaSCM(X)

# Oja signs are correctly centered
# (i.e. they add up to zero) when
# computed w.r.t. the Oja median.
# Hence the following two calls return the same matrix,
ojaSCM(X, center = "ojaMedian", alg = "exact")
(1 - 1/nrow(X))*cov(ojaSign(X, alg = "exact"))
# but the following two do not.
ojaSCM(X, center = "colMean")
(1 - 1/nrow(X))*cov(ojaSign(X, center = "colMean"))



### ----<< Example 2 >>---- : 300 points in R^7 
# The merit of subsampling.
# The following example might take a bit longer:
# }
# NOT RUN {
A <- matrix(c(1,0.5,1,4,2,0.5,-0.5,1,4), ncol = 3)
B <- A %x% A;  Sigma  <- (B %*% t(B))[1:7, 1:7]
# Sigma is some arbitrary positive definite matrix.
library(mvtnorm)  # provides rmvnorm()
set.seed(123)
X <- rmvnorm(n = 300, sigma = Sigma)

cov2cor(Sigma) # the true correlation matrix
cor(X)  # Bravais-Pearson correlation
cov2cor(solve(ojaSCM(X, center = "colMean")))
# correlation estimate based on Oja signs 
# The subsampling fraction in this example
# is p = 4.542038e-09.
# Yet it returns a sensible estimate.
# }