ojaRank: Oja Ranks -- Affine Equivariant Multivariate Ranks

Description

The function computes the Oja rank of a point x w.r.t. a data set X or, if no point x is given, the Oja ranks of all points in X.

Usage

ojaRank(X, x = NULL, p = NULL, silent = FALSE, na.action = na.fail)

Arguments

numeric data.frame or matrix containing the data points as rows.

NULL or a numeric vector, the point for which the Oja rank should be computed.

NULL or a number between 0 and 1 which specifies the fraction of hyperplanes to be used for subsampling. If p = 1, no subsampling is done. If p = NULL, the value of p is determined based on the size of the data set. See details.

silent

logical, if subsampling is done or the expected computation time is too long, a warning message will be printed unless silent is TRUE. The default is FALSE.

na.action

a function which indicates what should happen when the data contain 'NA's. Default is to fail.

Value

either a numeric vector, the Oja rank of x, or a matrix of the same dimensions as X containing the Oja ranks of X as rows.

Details

The function computes the Oja rank of the point x w.r.t.\ the data set X or, if no x is specified, the Oja ranks of all data points in X w.r.t. X. For a definition of Oja rank see reference below.

The matrix X needs to have at least as many rows as columns in order to give sensible results. The vector x has to be of length ncol(X). If x is specified, a vector of length ncol(X) is returned. Otherwise the return value is a matrix of the same dimensions as X where the \(i\)-th row contains the Oja rank of the \(i\)-th row of X.

The function will also work for matrices X with only one column and also vectors. Then centered and normalized (univariate) ranks are returned.

For \(n = nrow(X)\) data points in \(R^k\), where \(k = ncol(X)\), the computation of the Oja rank necessitates the evaluation of \(N = choose(n,k)\) hyperplanes in \(R^k\). Thus for large data sets the function offers a subsampling option in order to deliver (approximate) results within reasonable time. The subsampling fraction is controlled by the parameter p: If \(p < 1\) is passed to the function, the computation is based on a random sample of only \(pN\) of all possible \(N\) hyperplanes. If p is not specified (which defaults to p = NULL), it is automatically determined based on \(n\) and \(k\) to yield a sensible trade-off between accuracy and computing time. If \(N k^3 < 6 \cdot 10^6\), the sample fraction p is set to 1 (no subsampling). Otherwise p is choosen such that the computation (of one rank) usually takes around 20 seconds (on a 1.66 GHz CPU and 1 GB RAM). If all Oja ranks of X are requested, a hyperplane sample is drawn once, all Oja ranks are then computed based on this sample.

Finally, subsampling is feasible. Even for very small p useable results can be expected, see e.g. the examples for the function ojaRCM.

Claudia K<U+00F6>llmann is acknowledged for bug-fixing this function.

References

Fischer D, Mosler K, M<U+00F6>tt<U+00F6>nen J, Nordhausen K, Pokotylo O and Vogel D (2020). <U+201C>Computing the Oja Median in R: The Package OjaNP.<U+201D> Journal of Statistical Software, 92(8), pp. 1-36. doi: 10.18637/jss.v092.i08 (URL: http://doi.org/10.18637/jss.v092.i08).

Oja, H. (1999), Affine invariant multivariate sign and rank tests and corresponding estimates: A review, Scand. J. Statist., 26, 319--343.

Examples

Run this code

# NOT RUN {
### ----<< Example 1 >>---- : 30 points in R^2
# }
# NOT RUN {
set.seed(123)
X <- rmvnorm(n = 30,mean = c(0,0)) # from package 'mvtnorm'
ojaRank(X)
ojaRank(X, x = c(100,100))
ojaRank(X, x = ojaMedian(X, alg="exact")) # close to zero

# The following two return the same (only in different time)
ojaRank(X)
t(apply(X, 1, function(y){ojaRank(X,y)}))
# but the following two do not (due to different subsampling).
# 1)
set.seed(123); ojaRank(X, p = 0.9, silent = TRUE)
# 2)
set.seed(123) 
t(apply(X, 1, function(y){ojaRank(X, y, p = 0.9, silent = TRUE)}))
# In 1) one subsample for all ranks is drawn, whereas in 2)
# a different sample for each rank is drawn.
# }
# NOT RUN {
### ----<< Example 2 >>---- : three points in R^3: only one hyperplane
# The following commands return the same result.
# }
# NOT RUN {
ojaRank(X = diag(rep(1, 3)), x = c(0,0,0))
ojaRank(X = diag(rep(1, 3)), x = c(-100,-110,-5550))
hyperplane(X = diag(rep(1,3)))[1:3]
# }
# NOT RUN {

### ----<< Example 3 >>---- : 300 points in R^7
# Subsampling is done.
# The following example might take a bit longer:
# }
# NOT RUN {
set.seed(123)
X <- rmvnorm(n = 300, mean = rep(0, 7))
system.time(or <- ojaRank(x = 1:7, X = X))
# PLEASE NOTE: The computation of the Oja rank is based on a 
# random sub-sample of less than 1% of all possible hyperplanes.
#
#       user      system     elapsed 
#      18.47        0.00       18.47 
print(or,d=4)
# [1]  7.733  6.613  6.839  7.383 18.237 21.851 23.700
# }
# NOT RUN {

### ----<< Example 4 >>---- : univariate ranks
# }
# NOT RUN {
ojaRank(1:10)
ojaRank(X = 1:10, x = 5.5)
# }

Run the code above in your browser using DataLab