
OjaNP (version 1.0-0)

ojaSign: Oja Signs -- Affine Equivariant Multivariate Signs

Description

The function computes the Oja sign of a point x w.r.t. a data set X or, if no point x is given, the Oja signs of all points in X.

Usage

ojaSign(X, x = NULL, center = "ojaMedian", p = NULL, silent = FALSE, 
        na.action = na.fail, ...)

Arguments

X

numeric data.frame or matrix containing the data points as rows.

x

NULL or a numeric vector, the point for which the Oja sign should be computed.

center

one of the following three:

  • a numeric vector giving the location of the data,

  • a function that computes a multivariate location (see details below) or

  • one of the following strings:

    • "colMean" (vector of means, function colMeans is called),

    • "ojaMedian" (function ojaMedian),

    • "spatialMedian" (function spatial.median from package ICSNP),

    • "compMedian" (marginal median) or

    • "HRMedian" (Hettmansperger and Randles median, function HR.Mest from package ICSNP).

The default is "ojaMedian".

p

NULL or a number between 0 and 1 which specifies the fraction of hyperplanes to be used for subsampling. If p = 1, no subsampling is done. If p = NULL, the value of p is determined based on the size of the data set. See details.

silent

logical. If subsampling is done or the expected computation time is too long, a warning message is printed unless silent is TRUE. The default is FALSE.

na.action

a function which indicates what should happen when the data contain 'NA's. Default is to fail.

...

arguments passed on to the location function.
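The effect of the na.action argument can be previewed with the two standard base R handlers. The sketch below only shows how na.fail (the documented default) and na.omit behave on a small matrix; that ojaSign applies the handler to X in exactly this way internally is an assumption, and the toy data are made up for illustration.

```r
# Toy data with one missing value (illustrative only).
X <- rbind(c(1, 2), c(NA, 3), c(4, 5), c(6, 7))

# na.fail, the documented default, stops with an error if any NA is present.
res <- try(na.fail(X), silent = TRUE)
inherits(res, "try-error")

# na.omit drops every row that contains an NA.
X_complete <- na.omit(X)
nrow(X_complete)
```

Passing na.action = na.omit would therefore be expected to compute the signs from the complete rows only.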

Value

Either a numeric vector, the Oja sign of x, or a matrix of the same dimensions as X containing the Oja signs of X as rows.

Details

The function computes the Oja sign of the point x w.r.t. the data set X or, if no x is specified, the Oja signs of all data points in X w.r.t. X. For a definition of the Oja sign see the references below.

The matrix X needs to have at least as many rows as columns in order to give sensible results. The vector x has to be of length ncol(X). If x is specified, a vector of length ncol(X) is returned. Otherwise the return value is a matrix of the same dimensions as X where the \(i\)-th row contains the Oja sign of the \(i\)-th row of X. The matrix X must have at least two columns. For univariate signs use sign.

Oja signs (contrary to Oja ranks) require the computation of a center (location) of the data cloud. The function offers various ways to do this. One can explicitly pass a location as a numeric vector (which has to be of length ncol(X)), or pass a function that computes a multivariate location.

This function must accept data sets of the same type as X (i.e. row-wise) and return a numeric vector of length ncol(X). For example, center = colMeans will do, whereas center = mean will not, since mean(X) computes the (univariate) mean of all elements of the matrix X. Note that some location functions return a list that also contains information on the data or the computation rather than only the numeric location estimate. Arguments can be passed on to the location function via ..., see example 2 below.

The mean and several nonparametric location estimates are implemented and can also be specified by passing the corresponding string. The ... option is then available, too. Being based on the same nonparametric concept, the Oja median is the natural location for Oja signs and is hence the default. But since it is the solution of an optimization problem, it may, depending on the optimization algorithm, return different values when run on the same data.
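The requirement on a user-supplied location function described above can be checked in base R before calling ojaSign. The following is a minimal sketch: loc_with_info and as_location are hypothetical helpers (not part of OjaNP), and the list element name center is an assumption made for illustration.

```r
# Toy data: 5 points in R^3, stored row-wise as ojaSign expects.
X <- matrix(c(1, 2, 3, 4, 5,
              2, 4, 6, 8, 10,
              0, 1, 0, 1, 0), ncol = 3)

length(colMeans(X))  # ncol(X) values: a valid multivariate location
length(mean(X))      # a single number: not usable as 'center'

# Hypothetical estimator that returns a list instead of a plain vector.
loc_with_info <- function(X, ...) {
  list(center = apply(X, 2, median), n = nrow(X))
}

# Hypothetical wrapper: extract the numeric location so that the result
# has the form required of a 'center' function.
as_location <- function(X, ...) {
  loc <- loc_with_info(X, ...)$center
  stopifnot(is.numeric(loc), length(loc) == ncol(X))
  loc
}

as_location(X)
```

A wrapper of this kind could then be supplied as center = as_location.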

For large data sets the function offers a subsampling option in order to deliver (approximate) results within reasonable time. For \(n\) data points in \(R^k\) the computation of the Oja sign necessitates the evaluation of \(N = \binom{n}{k-1}\) hyperplanes in \(R^k\). If \(p < 1\) is passed to the function, the computation is based on a random sample of only \(p N\) of all possible \(N\) hyperplanes. If p is not specified, it is automatically determined based on \(n\) and \(k\) to yield a sensible trade-off between accuracy and computing time. If \(N k^3 < 6 \cdot 10^6\), the sample fraction p is set to 1 (no subsampling). Otherwise p is chosen such that the computation (of one sign) usually takes around 20 seconds (on a 1.66 GHz CPU with 1 GB RAM). If all Oja signs of X are requested, a hyperplane sample is drawn once and all Oja signs are then computed based on this sample.
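The hyperplane count and the threshold rule stated above can be evaluated in base R for a given data size. This is a rough sketch of the arithmetic only, not the package's internal code; n_hyperplanes and no_subsampling are hypothetical helper names, and the fraction p = 0.001 is an arbitrary illustrative value.

```r
# Number of hyperplanes N = choose(n, k - 1) for n points in R^k.
n_hyperplanes <- function(n, k) choose(n, k - 1)

# TRUE if the documented rule N * k^3 < 6e6 would leave p at 1
# (no subsampling) when p is not specified.
no_subsampling <- function(n, k) n_hyperplanes(n, k) * k^3 < 6e6

# 30 points in R^2, as in Example 1.
n_hyperplanes(30, 2)
no_subsampling(30, 2)

# 400 points in R^6, as in Example 3.
n_hyperplanes(400, 6)
no_subsampling(400, 6)

# Approximate number of hyperplanes sampled for a given fraction p.
p <- 0.001
round(p * n_hyperplanes(400, 6))
```

For the sizes used in the examples, the rule leaves the 30-point bivariate data set unsampled, while the 400-point data set in six dimensions exceeds the threshold by several orders of magnitude.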

Finally, subsampling is feasible. Even for very small p, usable results can be expected; see e.g. the examples for the function ojaSCM.

Claudia Köllmann is acknowledged for bug-fixing this function.

References

Fischer D, Mosler K, Möttönen J, Nordhausen K, Pokotylo O and Vogel D (2020). “Computing the Oja Median in R: The Package OjaNP.” Journal of Statistical Software, 92(8), pp. 1-36. doi: 10.18637/jss.v092.i08 (URL: http://doi.org/10.18637/jss.v092.i08).

Oja, H. (1999), Affine invariant multivariate sign and rank tests and corresponding estimates: A review, Scand. J. Statist., 26, 319--343.

See Also

ojaRank, ojaSignedRank, ojaMedian, spatial.median, HR.Mest, ojaSCM

Examples

# NOT RUN {
library(OjaNP)    # provides ojaSign() and ojaMedian()
library(mvtnorm)  # provides rmvnorm()

### ----<< Example 1 >>---- : 30 points in R^2
set.seed(123)
X <- rmvnorm(n = 30, mean = c(0,0))
y <- c(100,100)
om <- ojaMedian(X, alg = "exact")

ojaSign(X)
ojaSign(X,y)
# possible ways of specifying the mean as location:
ojaSign(X, center = "colMean")
ojaSign(X, center = colMeans)
ojaSign(X, center = colMeans(X))

# The following two return the same (only in different time),
ojaSign(X, center = colMeans)
t(apply(X, 1, function(y){ojaSign(X, y, center = colMeans)}))

# but the following not (due to different subsampling).
# 1)
set.seed(123)
ojaSign(X, center = colMeans, p = 0.9, silent = TRUE)
# 2)
set.seed(123)
t(apply(X, 1, function(y){ojaSign(X, y, center = colMeans, p = 0.9, silent = TRUE)}))
# In 1) one subsample for all signs is drawn, whereas in 2)
# a different sample for each sign is drawn.

### ----<< Example 2 >>---- : Oja median
# The Oja sign of the Oja median is zero:
ojaSign(X, x = om, alg = "exact") 
# The default location function 'ojaMedian()' 
# is called with method "exact",
# which gives the same result as:
ojaSign(X, x = om, center = om) 
# But note: The following is likely to not return zero.  
ojaSign(X, x = ojaMedian(X))
# The default method of 'ojaMedian()' is "evo", 
# which is fast, but returns approximate results.



### ----<< Example 3 >>---- : 400 points in R^6
# Subsampling is done.
# The following example might take a bit longer:
set.seed(123)
X <- rmvnorm(n = 400, mean = rep(0, 6))
os1 <- ojaSign(X, x = 1:6, center = colMeans)
# Note: the following command may take several minutes
os2 <- ojaSign(X, x = 1:6, p = 0.0000001, center = colMeans)
# }
