The directional outlyingness (DO) of multivariate data was introduced in Rousseeuw et al. (2018). It extends the Stahel-Donoho outlyingness to skewed distributions.
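
For orientation, the definition can be written out as follows. This is a sketch recalled from Rousseeuw et al. (2018), not taken from this page: the symbols \(s_a\) and \(s_b\) (robust scales of the deviations above and below the median) and the notation \(F_{X^\top v}\) for the distribution of the projected data are assumptions made here for illustration.

\[
\mathrm{DO}(y; F) =
\begin{cases}
\dfrac{y - \mathrm{med}(F)}{s_a(F)} & \text{if } y \ge \mathrm{med}(F), \\[2ex]
\dfrac{\mathrm{med}(F) - y}{s_b(F)} & \text{if } y \le \mathrm{med}(F),
\end{cases}
\qquad
\mathrm{DO}(x; F) = \sup_{v} \, \mathrm{DO}\left(x^\top v; F_{X^\top v}\right).
\]

Using separate scales on either side of the median is what lets the measure adapt to skewness. The algorithms described here approximate the supremum by a maximum over a finite set of directions \(v\).
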
Depending on the dimension \(p\), different approximate algorithms are implemented. The affine invariant algorithm can only be used when \(n > p\). It draws \(p\) observations from x at random, ndir times, and each time considers the direction orthogonal to the hyperplane spanned by these \(p\) observations. At most \(\binom{n}{p}\) directions can be considered. The orthogonal invariant version can be applied to high-dimensional data. It draws 2 observations from x at random, ndir times, and each time considers the direction through these two observations. Here, at most \(\binom{n}{2}\) directions can be considered. Finally, the shift invariant version randomly draws ndir vectors from the unit sphere.
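
The two simpler sampling schemes can be sketched in a few lines. The Python below is an illustration only, not the package's implementation: the function names `directions_through_pairs` and `directions_on_sphere` are hypothetical, repeated pairs are not filtered out, and the affine invariant variant is omitted because it needs a null-space computation.

```python
import math
import random


def unit(v):
    """Scale v to unit Euclidean length; return None for a zero vector."""
    norm = math.sqrt(sum(c * c for c in v))
    return None if norm == 0.0 else [c / norm for c in v]


def directions_through_pairs(x, ndir, seed):
    """Orthogonal invariant style: directions through two random observations.

    Assumption: ndir is capped at the number of distinct pairs, and pairs of
    coincident points (which give no direction) are skipped.
    """
    rng = random.Random(seed)
    n = len(x)
    ndir = min(ndir, math.comb(n, 2))
    dirs = []
    for _ in range(ndir):
        i, j = rng.sample(range(n), 2)  # two distinct row indices
        d = unit([a - b for a, b in zip(x[i], x[j])])
        if d is not None:
            dirs.append(d)
    return dirs


def directions_on_sphere(p, ndir, seed):
    """Shift invariant style: ndir random vectors on the unit sphere in R^p.

    Normalising a vector of independent standard normal draws gives a
    direction that is uniformly distributed on the sphere.
    """
    rng = random.Random(seed)
    dirs = []
    while len(dirs) < ndir:
        d = unit([rng.gauss(0.0, 1.0) for _ in range(p)])
        if d is not None:
            dirs.append(d)
    return dirs
```

With four distinct points in the plane, `directions_through_pairs(x, 10, seed=1)` returns at most \(\binom{4}{2} = 6\) unit vectors, matching the bound stated above.
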
The resulting DO values are invariant to affine transformations, rotations and shifts, respectively, provided that the seed is kept fixed across runs of the algorithm. Note that the DO values cannot decrease when more directions are considered, provided the seed is kept fixed, as this ensures that the random directions are generated in a fixed order.
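
The effect of a fixed seed can be checked with a small experiment. The sketch below is not the package's code: the helper names are hypothetical, and the univariate outlyingness is replaced by a simple stand-in (absolute deviation from the median divided by the MAD) rather than the directional version. The point is only that, with the same seed, the first directions of a longer run coincide with those of a shorter run, so the maximum over directions cannot decrease.

```python
import math
import random
import statistics


def sphere_directions(p, ndir, seed):
    """Draw ndir unit vectors in R^p in an order fixed by the seed."""
    rng = random.Random(seed)
    out = []
    for _ in range(ndir):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(c * c for c in v))  # zero with probability 0
        out.append([c / norm for c in v])
    return out


def projection_outlyingness(x, point, v):
    """Stand-in univariate outlyingness of point along direction v.

    Assumption: |projection - median| / MAD, used here for brevity instead
    of the directional outlyingness with separate upper and lower scales.
    """
    proj = [sum(a * b for a, b in zip(row, v)) for row in x]
    med = statistics.median(proj)
    mad = statistics.median(abs(t - med) for t in proj)
    if mad == 0.0:
        raise ValueError("zero scale along this direction")
    t0 = sum(a * b for a, b in zip(point, v))
    return abs(t0 - med) / mad


def outlyingness(x, point, ndir, seed):
    """Maximum stand-in outlyingness of point over ndir random directions."""
    dirs = sphere_directions(len(point), ndir, seed)
    return max(projection_outlyingness(x, point, v) for v in dirs)
```

For a data set `x` and a point `pt`, `outlyingness(x, pt, 200, seed=1)` is therefore at least as large as `outlyingness(x, pt, 50, seed=1)`, and repeating either call with the same seed reproduces the same value.
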
An observation from x or z is flagged as an outlier if its DO exceeds a cutoff value. This cutoff value is determined using the procedure in Rousseeuw et al. (2018). First, the logarithm of the DO values is taken to render their distribution more symmetric, after which a normal approximation yields a cutoff on these values. The cutoff is then transformed back by applying the exponential function.
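
The three steps (log transform, normal approximation, back-transform) can be sketched as below. This is a hedged illustration, not the package's code: the function names are hypothetical, and the specific choices (an offset of 0.1 inside the logarithm, median and MAD as location and scale, the 0.995 normal quantile) are assumptions based on my reading of Rousseeuw et al. (2018) and are exposed as parameters so they can be changed.

```python
import math
import statistics


def do_cutoff(do_values, offset=0.1, prob=0.995):
    """Cutoff on DO values via a normal approximation on the log scale.

    Assumptions: offset, prob, and the use of median/MAD are illustrative
    defaults, not confirmed settings of the package.
    """
    # Step 1: log transform to make the distribution more symmetric.
    logs = [math.log(offset + d) for d in do_values]
    # Step 2: robust location and scale, then a normal quantile.
    med = statistics.median(logs)
    mad = 1.4826 * statistics.median(abs(t - med) for t in logs)
    z = statistics.NormalDist().inv_cdf(prob)
    # Step 3: transform the cutoff back with the exponential function.
    return math.exp(med + z * mad) - offset


def flag_outliers(do_values, **kwargs):
    """Return True for every observation whose DO exceeds the cutoff."""
    cutoff = do_cutoff(do_values, **kwargs)
    return [d > cutoff for d in do_values]
```

Because the location and scale are estimated robustly, a single very large DO value barely moves the cutoff and is flagged itself.
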
It is first checked whether the data lie in a subspace of dimension smaller than \(p\). If so, a warning is given, together with the dimension of the subspace and a direction orthogonal to it. Furthermore, the univariate directional outlyingness of the points of x projected onto a direction \(v\) is ill-defined when the scale in its denominator becomes zero. This can happen when many of the projected observations collapse onto a single value. In these cases the algorithm stops and gives a warning. The returned values then include the direction \(v\) as well as an indicator specifying which of the observations of x belong to the hyperplane orthogonal to \(v\).
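
A minimal sketch of such a zero-scale check is given below. It is an illustration under stated assumptions: the function name `check_direction`, the tolerance `tol`, the use of the MAD as the scale, and the shape of the returned dictionary are all hypothetical and do not describe the package's actual return value.

```python
import statistics


def check_direction(x, v, tol=1e-12):
    """Report a degenerate direction, or None if the scale is positive.

    Assumption: the scale is taken to be the MAD of the projections, and a
    point counts as lying in the hyperplane orthogonal to v when its
    projection coincides (up to tol) with the median projection.
    """
    proj = [sum(a * b for a, b in zip(row, v)) for row in x]
    med = statistics.median(proj)
    mad = statistics.median(abs(t - med) for t in proj)
    if mad > tol:
        return None  # outlyingness along v is well defined
    in_hyperplane = [abs(t - med) <= tol for t in proj]
    return {"direction": list(v), "in_hyperplane": in_hyperplane}
```

For example, if four of five points share the same first coordinate, the direction `[1.0, 0.0]` is reported as degenerate and the indicator marks those four points.
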