yamm: Yet Another Multivariate Median

Description

Computes the projection median of a dataset of any dimension. The function finds the shift vector mu that minimises the objective function yamm.obj, which integrates over a unit hypersphere. The minimisation is carried out by optim from the stats package.
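The idea can be sketched in base R. This is only an illustration of a Monte Carlo objective of this kind, not the package's own code: the helpers `random_directions` and `proj_median_obj` are hypothetical names, the simulated data are made up, and the coordinate-wise median stands in for the spatial median as a starting value.

```r
# Illustrative sketch only: these helpers are hypothetical and are NOT
# the package's yamm.obj.

# Draw `nprojs` random directions on the unit hypersphere in d dimensions.
random_directions <- function(nprojs, d) {
  u <- matrix(rnorm(nprojs * d), nrow = nprojs, ncol = d)
  u / sqrt(rowSums(u^2))            # normalise each row to unit length
}

# Monte Carlo objective: average, over directions, of the squared
# (doabs = 0) or absolute (doabs = 1) univariate median of the
# projected, shifted data.
proj_median_obj <- function(mu, x, u, doabs = 0) {
  shifted <- sweep(x, 2, mu)        # subtract mu from every observation
  proj <- shifted %*% t(u)          # n x nprojs matrix of projections
  med <- apply(proj, 2, median)     # one univariate median per direction
  if (doabs == 1) mean(abs(med)) else mean(med^2)
}

set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3)   # simulated data (assumption)
x <- sweep(x, 2, c(1, -2, 0.5), "+")    # shift the cloud away from the origin
u <- random_directions(500, ncol(x))

start <- apply(x, 2, median)            # simple starting value (assumption)
fit <- optim(par = start, fn = proj_median_obj, x = x, u = u,
             method = "BFGS", control = list(reltol = 1e-3))
fit$par                                 # estimated location
```

Fixing the directions `u` before calling optim keeps the objective deterministic during the optimisation, which is one reasonable design choice for a sketch like this.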

Usage

yamm(x, nprojs = 2000, reltol = 1e-6, abstol=-Inf,
     xstart = L1median(x)$estimate,
     opt.method = "BFGS", doabs = 0, full.results=FALSE)

Arguments

x

The data as a matrix or data frame, with each row being viewed as one multivariate observation.

nprojs

The number of projections of the shifted data matrix used in the Monte Carlo approximation of the integral. The default value is 2000. Complicated data may need more projections for an accurate result, at the cost of a longer running time.

reltol

The relative convergence tolerance of the optimisation process, passed to the control argument of optim. The default value is \(1e-6\). Loosening the tolerance shortens the running time; \(1e-3\) is generally enough to obtain a good approximation quickly.

abstol

The absolute convergence tolerance of the optimisation process, passed to the control argument of optim. The default value is negative infinity.
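As a rough illustration of how these two tolerances reach the optimiser, the sketch below calls optim directly with a control list. The Rosenbrock function is only a stand-in objective chosen for this example; it has nothing to do with yamm.obj.

```r
# Stand-in objective (assumption): the Rosenbrock function, not yamm.obj.
rosenbrock <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2

# Same start and method, two relative tolerances; abstol left at -Inf.
loose <- optim(c(-1.2, 1), rosenbrock, method = "BFGS",
               control = list(reltol = 1e-3, abstol = -Inf))
tight <- optim(c(-1.2, 1), rosenbrock, method = "BFGS",
               control = list(reltol = 1e-6, abstol = -Inf))

loose$counts   # number of function and gradient evaluations, loose run
tight$counts   # the tighter run needs at least as many evaluations
```

Comparing the two `counts` vectors gives a feel for the trade-off between running time and accuracy described above.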

xstart

The starting value for the optimiser. The default value is the spatial median of the data, computed by the function L1median. Other multivariate medians or the mean can also be used. Note that the mean is sensitive to outliers, so using it as a starting point may slow down the optimisation or result in a less accurate median.
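The sensitivity of the mean to outliers is easy to see with base R. In this small sketch the data are simulated and the single outlier is planted by hand; the coordinate-wise median is used here only as one example of an alternative robust starting value.

```r
set.seed(2)
x <- matrix(rnorm(100 * 2), ncol = 2)   # simulated data (assumption)
x[1, ] <- c(1000, 1000)                 # plant one gross outlier

start_mean   <- colMeans(x)             # dragged towards the outlier
start_median <- apply(x, 2, median)     # stays near the bulk of the data

start_mean
start_median
```

Either vector has the right length to serve as a starting value, but the mean sits far from the bulk of the data.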

opt.method

The optimisation method used when computing the yamm. The default is “BFGS”; the other optim methods “Nelder-Mead”, “CG”, “L-BFGS-B”, and “SANN” can also be used.
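To get a feel for how the deterministic methods behave, the sketch below runs several of them through optim on a simple quadratic. The function `quad` is a hypothetical stand-in objective, not yamm.obj, and “SANN” is left out only because it is stochastic and slower.

```r
# Stand-in objective (assumption): a quadratic with minimum at c(1, 2).
quad <- function(p) sum((p - c(1, 2))^2)

methods <- c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B")
fits <- lapply(methods, function(m) optim(c(0, 0), quad, method = m))
names(fits) <- methods

sapply(fits, function(f) f$convergence)   # convergence code per method
t(sapply(fits, function(f) f$par))        # one row of estimates per method
```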

doabs

If 0 (default), the function yamm.obj integrates the square of the univariate median of the projections of the shifted data set over a unit hypersphere; if 1, yamm.obj integrates the absolute value of the univariate median instead.

full.results

Logical. If FALSE (default), the function yamm returns only the yamm location estimate; if TRUE, it returns the full list of results from the function optim.

Value

If full.results = FALSE, the function returns the yamm location estimate; otherwise, it returns a list comprising

par

The best set of parameters found, which is the yamm location estimator.

value

The value of objective function yamm.obj corresponding to par.

counts

A two-element integer vector giving the number of calls to the objective function and gradient of the function respectively. This excludes those calls needed to compute the Hessian, if requested, and any calls to the objective function to compute a finite-difference approximation to the gradient.

convergence

An integer code. 0 indicates successful completion (which is always the case for method “SANN” and “Brent”). Possible error codes are

1 indicates that the iteration limit had been reached.

10 indicates degeneracy of the Nelder-Mead simplex.

51 indicates a warning from the “L-BFGS-B” method; see component message for further details.

52 indicates an error from the “L-BFGS-B” method; see component message for further details.

message

A character string giving any additional information returned by the optimiser, or NULL.
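A result list of this shape can be checked programmatically. The sketch below uses optim directly on a stand-in objective; `describe_fit` is a hypothetical helper written for this example, and the iteration cap of 2 is chosen only to force a non-zero convergence code.

```r
# Stand-in objective (assumption): the Rosenbrock function, not yamm.obj.
rosenbrock <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2

# Hypothetical helper: turn the convergence code and message into text.
describe_fit <- function(fit) {
  if (fit$convergence == 0) {
    "converged"
  } else {
    msg <- if (is.null(fit$message)) "no message" else fit$message
    paste0("convergence code ", fit$convergence, " (", msg, ")")
  }
}

ok     <- optim(c(-1.2, 1), rosenbrock, method = "BFGS")
capped <- optim(c(-1.2, 1), rosenbrock, method = "BFGS",
                control = list(maxit = 2))   # deliberately too few iterations

describe_fit(ok)
describe_fit(capped)
```

The same check applies to the list returned when full.results = TRUE, since it has the components par, value, counts, convergence and message described above.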

References

Chen, F. and Nason, Guy P. (2020) A new method for computing the projection median, its influence curve and techniques for the production of projected quantile plots. PLOS One, 10.1371/journal.pone.0229845

Examples

# NOT RUN {
data(beetle)
#
# Set seed for reproduction.
set.seed(5)
#
# Yamm approximated using 1000 projections.
yamm(beetle, nprojs = 1000, reltol = 1e-3, doabs = 0, full.results = TRUE)
#
# $par
# [1] 180.30601 124.23781  50.16349 135.53947  13.45252  95.64742
#
# $value
# [1] 5.704375
#
# $counts
# function gradient 
#      69        4 
#
# $convergence
# [1] 0
#
# $message
# NULL
# }