OjaNP (version 1.0-0)

ojaRCM: Oja Rank Covariance Matrix

Description

The function computes the Oja rank covariance matrix of a data set X.

Usage

ojaRCM(X, p = NULL, silent = FALSE, na.action = na.fail)

Arguments

X

numeric data.frame or matrix containing the data points as rows.

p

NULL or a number between 0 and 1 which specifies the fraction of hyperplanes to be used for subsampling. If p = 1, no subsampling is done. If p = NULL, the value of p is determined based on the size of the data set. See function ojaRank for details.

silent

logical. If subsampling is done or the expected computation time is too long, a warning message is printed unless silent is TRUE. The default is FALSE.

na.action

a function which indicates what should happen when the data contain 'NA's. Default is to fail.

Value

a symmetric matrix with ncol(X) columns and rows.

Details

The function computes the Oja rank covariance matrix of the data set X, that is (since Oja ranks are centered) the covariance matrix of the Oja ranks of the data points in X, taken w.r.t. the data set X.
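
The role of centering can be illustrated with a small base-R sketch. It does not call OjaNP; the matrix R below is a hypothetical stand-in for a matrix of centered ranks (random numbers with column means removed), and the names rcm_direct and rcm_via_cov are made up for this illustration. Assuming the rank covariance matrix is the average of the outer products of the centered ranks (divisor n), it differs from cov(), which uses the divisor n - 1, only by the factor 1 - 1/n:

```r
set.seed(1)
n <- 50

# Stand-in for a matrix of centered ranks (NOT real Oja ranks):
# random values with the column means subtracted.
R <- scale(matrix(rnorm(n * 2), ncol = 2), center = TRUE, scale = FALSE)

# Average of outer products r_i r_i^T, i.e. divisor n.
rcm_direct <- crossprod(R) / n

# The sample covariance uses divisor n - 1, so rescale by (n - 1) / n.
rcm_via_cov <- (1 - 1/n) * cov(R)

all.equal(rcm_direct, rcm_via_cov, check.attributes = FALSE)
```

This is the same relationship that Example 1 checks with the actual functions ojaRCM and ojaRank.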

For a definition of the Oja rank covariance matrix and its properties, see the references below. The matrix X needs to have at least as many rows as columns in order to give sensible results. The return value is a square, symmetric matrix with as many columns as X. The function also works for matrices X with only one column and for vectors, but note that the variance of univariate ranks does not yield much information about the data.
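
Why the univariate case is uninformative can be seen in a short base-R sketch. It assumes that the univariate centered rank of a point is the average of the signs of its differences to all data points; the helper centered_rank is hypothetical and not part of OjaNP, whose scaling may differ. Under that assumption, and without ties, the variance of the ranks depends only on n and not on the data:

```r
# Hypothetical helper (not an OjaNP function): centered univariate rank
# of each point, computed as the average sign of its differences.
centered_rank <- function(x) sapply(x, function(xi) mean(sign(xi - x)))

set.seed(42)
n <- 100
x_narrow <- rnorm(n, sd = 0.01)  # tightly concentrated sample
x_wide   <- rnorm(n, sd = 1000)  # widely spread sample

# Ranks are centered, so the mean square is the variance (divisor n).
v_narrow <- mean(centered_rank(x_narrow)^2)
v_wide   <- mean(centered_rank(x_wide)^2)

# Both agree with the closed form (n^2 - 1) / (3 n^2).
c(v_narrow, v_wide, (n^2 - 1) / (3 * n^2))
```

Both samples give the same value although their scales differ by a factor of 100000, so this number says nothing about the spread of the data.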

The function offers a subsampling option in order to speed up computation for large data sets. The subsampling fraction is controlled by the parameter p. If p is not specified (the default, p = NULL), it is determined automatically based on the dimension of the problem. The function tries to strike a reasonable compromise between accuracy and computing time; for sufficiently small data matrices X, the sampling fraction p is set to 1. Subsampling is applied to hyperplanes, not to data points. A sample is drawn once, and all Oja ranks are then computed based on this sample. For further details on subsampling, see function ojaRank. Even for very small p, usable results can be expected; see e.g. Example 2.
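
The need for subsampling can be made concrete with a rough count. The sketch below assumes that each hyperplane is spanned by k of the n data points in R^k, so that there are choose(n, k) candidate hyperplanes; the helper n_hyperplanes is hypothetical, and the exact count used internally by OjaNP may differ:

```r
# Assumption: a hyperplane is spanned by k of the n data points,
# which gives choose(n, k) candidate hyperplanes in total.
n_hyperplanes <- function(n, k) choose(n, k)

# A few illustrative problem sizes (n points in dimension k).
sizes <- data.frame(n = c(50, 100, 300, 300), k = c(2, 3, 3, 7))
sizes$hyperplanes <- n_hyperplanes(sizes$n, sizes$k)
print(sizes)
```

Under this assumption the count grows very quickly with the dimension, which is why only a small fraction p of the hyperplanes can be visited for data such as the 300 points in R^7 of Example 2.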

References

Fischer D, Mosler K, Möttönen J, Nordhausen K, Pokotylo O and Vogel D (2020). "Computing the Oja Median in R: The Package OjaNP." Journal of Statistical Software, 92(8), pp. 1-36. doi: 10.18637/jss.v092.i08 (URL: http://doi.org/10.18637/jss.v092.i08).

Visuri, S., Koivunen, V., Oja, H. (2000), Sign and rank covariance matrices, J. Stat. Plann. Inference, 91, 557--575.

Ollila, E., Croux, C., Oja, H. (2004), Influence function and asymptotic efficiency of the affine equivariant rank covariance matrix, Statistica Sinica, 14, 297--316.

See Also

ojaRank, ojaSCM

Examples

# NOT RUN {
### ----<< Example 1 >>---- : biochem data
library(OjaNP)  # provides ojaRCM(), ojaRank() and the biochem data
data(biochem)
X <- biochem[,1:2]
ojaRCM(X)

# Oja ranks are centered 
# (i.e. they add up to zero), and 
# the following two return the same.
ojaRCM(X)
(1 - 1/nrow(X))*cov(ojaRank(X))



### ----<< Example 2 >>---- : 300 points in R^7 
# The merit of subsampling.
# The following example might take a bit longer:
A <- matrix(c(1,0.5,1,4,2,0.5,-0.5,1,4), ncol = 3)
B <- A %x% A;  Sigma  <- (B %*% t(B))[1:7, 1:7]
# Sigma is some arbitrary positive definite matrix.
library(mvtnorm)  # provides rmvnorm()
set.seed(123)
X <- rmvnorm(n = 300, sigma = Sigma)

cov2cor(Sigma) # the true correlation matrix
cor(X)  # Bravais-Pearson correlation
cov2cor(solve(ojaRCM(X)))
# correlation estimate based on Oja ranks 
# The subsampling fraction in this example
# is p = 1.081438e-10.
# Yet it returns a sensible estimate.
# }
