rospca (version 1.1.0)

robpca: ROBust PCA algorithm

Description

ROBPCA algorithm of Hubert et al. (2005) including reweighting (Engelen et al., 2005) and possible extension to skewed data (Hubert et al., 2009).

Usage

robpca(x, k = 0, kmax = 10, alpha = 0.75, h = NULL, mcd = FALSE,
       ndir = "all", skew = FALSE, ...)

Value

A list with components:

loadings

Loadings matrix containing the robust loadings (eigenvectors), a numeric matrix of size \(p\) by \(k\).

eigenvalues

Numeric vector of length \(k\) containing the robust eigenvalues.

scores

Scores matrix (computed as \((X - center) \cdot loadings\)), a numeric matrix of size \(n\) by \(k\).

center

Numeric vector of length \(p\) containing the centre of the data.

k

Number of (chosen) principal components.

H0

Logical vector of size \(n\) indicating if an observation is in the initial h-subset.

H1

Logical vector of size \(n\) indicating if an observation is kept in the reweighting step.

alpha

The robustness parameter \(\alpha\) used throughout the algorithm.

h

The \(h\)-parameter used throughout the algorithm.

sd

Numeric vector of size \(n\) containing the robust score distances within the robust PCA subspace.

od

Numeric vector of size \(n\) containing the orthogonal distances to the robust PCA subspace.

cutoff.sd

Cut-off value for the robust score distances.

cutoff.od

Cut-off value for the orthogonal distances.

flag.sd

Numeric vector of size \(n\) containing the SD-flags of the observations. The observations whose score distance is larger than cutoff.sd receive an SD-flag equal to zero. The other observations receive an SD-flag equal to 1.

flag.od

Numeric vector of size \(n\) containing the OD-flags of the observations. The observations whose orthogonal distance is larger than cutoff.od receive an OD-flag equal to zero. The other observations receive an OD-flag equal to 1.

flag.all

Numeric vector of size \(n\) containing the flags of the observations. The observations whose score distance is larger than cutoff.sd or whose orthogonal distance is larger than cutoff.od can be considered as outliers and receive a flag equal to zero. The regular observations receive flag 1.
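
The relationship between the components listed above can be illustrated with a minimal base R sketch. It is not the ROBPCA algorithm: classical (non-robust) PCA stands in for the robust fit, the distance formulas follow the usual definitions of score and orthogonal distances, and the cut-offs are hypothetical values chosen only for this illustration. The names sdist and odist are placeholders for the sd and od components.

```r
# Illustration only: classical PCA is used as a stand-in for the robust fit.
set.seed(1)
n <- 50; p <- 5; k <- 2
X <- matrix(rnorm(n * p), n, p)

center      <- colMeans(X)                       # length p
eig         <- eigen(cov(X), symmetric = TRUE)
loadings    <- eig$vectors[, 1:k, drop = FALSE]  # p by k
eigenvalues <- eig$values[1:k]                   # length k

Xc     <- sweep(X, 2, center)                    # subtract center from every row
scores <- Xc %*% loadings                        # (X - center) %*% loadings, n by k

# Score distance: distance within the k-dimensional subspace,
# with each score scaled by its eigenvalue.
sdist <- sqrt(rowSums(sweep(scores^2, 2, eigenvalues, "/")))

# Orthogonal distance: Euclidean distance from each observation to the subspace.
odist <- sqrt(rowSums((Xc - scores %*% t(loadings))^2))

# Hypothetical cut-offs for this sketch (assumptions, not the package's values).
cutoff.sd <- sqrt(qchisq(0.975, df = k))
cutoff.od <- as.numeric(quantile(odist, 0.975))

# Flag 1 = regular observation, flag 0 = distance exceeds the cut-off.
flag.sd  <- as.numeric(sdist <= cutoff.sd)
flag.od  <- as.numeric(odist <= cutoff.od)
flag.all <- as.numeric(flag.sd == 1 & flag.od == 1)
```

An observation is flagged in flag.all as soon as either of its two distances exceeds the corresponding cut-off, which matches the "or" in the description of flag.all.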

Arguments

x

An \(n\) by \(p\) matrix or data frame with observations in the rows and variables in the columns.

k

Number of principal components that will be used. When k=0 (default), the number of components is selected using the criterion in Hubert et al. (2005).

kmax

Maximal number of principal components that will be computed, default is 10.

alpha

Robustness parameter, default is 0.75.

h

The number of outliers the algorithm should resist is given by \(n-h\). Any value for h between \(n/2\) and \(n\) may be specified. Default is NULL which uses h=ceiling(alpha*n)+1. Do not specify alpha and h at the same time.

mcd

Logical indicating if the MCD adaptation of ROBPCA may be applied when the number of variables is sufficiently small (see Details). If mcd=FALSE (default), the full ROBPCA algorithm is always applied.

ndir

Number of directions used when computing the outlyingness (or the adjusted outlyingness when skew=TRUE), see outlyingness and adjOutl for more details.

skew

Logical indicating if the version for skewed data (Hubert et al., 2009) is applied, default is FALSE.

...

Other arguments to pass to methods.
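
As a small worked example of the default rule quoted for the h argument, the following base R sketch computes h from alpha and n and the resulting number of outliers the fit is meant to resist. The helper name default_h is hypothetical and is not part of rospca.

```r
# Hypothetical helper reproducing the documented default h = ceiling(alpha * n) + 1.
default_h <- function(n, alpha = 0.75) {
  ceiling(alpha * n) + 1
}

n <- 100
h <- default_h(n)   # 76 for n = 100 and alpha = 0.75
resist <- n - h     # 24 outliers can be resisted

# The documented admissible range for a user-supplied h is n/2 to n.
h_in_range <- h >= n / 2 && h <= n
```

Lowering alpha lowers h and therefore increases n - h, the number of outliers the fit is designed to tolerate.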

Author

Tom Reynkens, based on R code from Valentin Todorov for PcaHubert in rrcov (released under GPL-3) and Matlab code from Katrien Van Driessen (for the univariate MCD).

Details

This function is based extensively on PcaHubert from rrcov, with two main differences:

The outlyingness measure used for non-skewed data (skew=FALSE) is the Stahel-Donoho measure described in Hubert et al. (2005), which is also used in PcaHubert. However, the implementation in mrfDepth (which is used here) is much faster than the one in PcaHubert, so more directions, or even all of them, can be considered when computing the outlyingness measure.

The extension for skewed data of Hubert et al. (2009) (skew=TRUE) is implemented here but is not included in PcaHubert.

For an extensive description of the ROBPCA algorithm we refer to Hubert et al. (2005) and to PcaHubert.

When mcd=TRUE and the number of variables is sufficiently small relative to the number of observations (\(n \geq 5 \times p\)), we do not apply the full ROBPCA algorithm. The loadings and eigenvalues are then computed as the eigenvectors and eigenvalues of the MCD estimator applied to the data set after the SVD step.

References

Hubert, M., Rousseeuw, P. J., and Vanden Branden, K. (2005), "ROBPCA: A New Approach to Robust Principal Component Analysis," Technometrics, 47, 64–79.

Engelen, S., Hubert, M., and Vanden Branden, K. (2005), "A Comparison of Three Procedures for Robust PCA in High Dimensions," Austrian Journal of Statistics, 34, 117–126.

Hubert, M., Rousseeuw, P. J., and Verdonck, T. (2009), "Robust PCA for Skewed Data and Its Outlier Map," Computational Statistics & Data Analysis, 53, 2264–2274.

See Also

PcaHubert, outlyingness, adjOutl

Examples

X <- dataGen(m=1, n=100, p=10, eps=0.2, bLength=4)$data[[1]]

resR <- robpca(X, k=2)
diagPlot(resR)
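
The returned list can also be inspected directly. The sketch below repeats the call above and reads a few of the components documented under Value. It assumes the rospca package is installed and skips itself otherwise; no output is shown because the simulated data are not reproduced here.

```r
# Assumes rospca is installed; the whole block is skipped if it is not.
if (requireNamespace("rospca", quietly = TRUE)) {
  library(rospca)

  X <- dataGen(m = 1, n = 100, p = 10, eps = 0.2, bLength = 4)$data[[1]]
  resR <- robpca(X, k = 2)

  # Dimensions documented under Value: loadings is p by k, scores is n by k.
  print(dim(resR$loadings))
  print(dim(resR$scores))

  # Observations with flag.all equal to zero are the ones flagged as outliers.
  outliers <- which(resR$flag.all == 0)
  print(outliers)
}
```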
