This function performs the MacroPCA algorithm, which can deal with Missing values and Cellwise
and Rowwise Outliers. Note that this function first calls checkDataSet
and analyzes the remaining cleaned data.
MacroPCA(X, k = 0, MacroPCApars = NULL)
A list with components:
the options used in the call.
Cleaned data after checkDataSet.
results of the first step of MacroPCA. These are needed to run MacroPCApredict on new data.
the scales of the columns of X. When scale = FALSE these are all \(1\).
the number of principal components.
the columns are the k loading vectors.
the k eigenvalues.
vector with the center.
alpha from the input.
h (computed from alpha).
number of iteration steps.
convergence criterion.
data with all NA's imputed by MacroPCA.
scores of X.NAimp.
orthogonal distances of the rows of X.NAimp.
cutoff value for the OD.
score distances of the rows of X.NAimp.
cutoff value for the SD.
row numbers of cases whose OD is above cutoffOD.
row numbers of cases whose SD is above cutoffSD.
scale of the residuals.
standardized residuals. Note that these are NA for all missing values of X.
indices of cellwise outliers.
various results for the NA-imputed data.
various results for the cell-imputed data.
various results for the fully imputed data.
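The relation between the distance cutoffs and the flagged row numbers described above can be illustrated with a minimal sketch. The distances and the cutoff below are made-up values (the names od, cutoff_od and high_od are hypothetical), not output of MacroPCA; the point is only that the flagged rows are those whose distance exceeds the cutoff.

```r
# Hypothetical orthogonal distances for 6 rows (illustrative values only)
od <- c(0.8, 1.1, 3.9, 0.6, 4.2, 1.0)

# Hypothetical cutoff; in practice this would be taken from the fitted object
cutoff_od <- 2.5

# Row numbers of the cases whose distance is above the cutoff
high_od <- which(od > cutoff_od)
print(high_od)
```

The same comparison applies to the score distances and their cutoff.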
X
is the input data, and must be an \(n\) by \(d\) matrix or a data frame. It must always be provided.
k
is the desired number of principal components.
If k = 0 or k = NULL, the algorithm will compute the percentage of explained variability for k up to kmax, show a scree plot, and suggest choosing a value of k such that the cumulative percentage of explained variability is at least 80%.
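The 80% rule can be sketched as follows. This is only an illustration of the selection rule: it uses classical PCA from base R (prcomp) on simulated complete data as a stand-in, not the robust computation MacroPCA performs internally, and the names cum_pct and k_suggested are hypothetical.

```r
set.seed(1)
n <- 100; d <- 6

# Simulated data: two latent factors plus a little noise
Z <- matrix(rnorm(n * 2), n, 2)
B <- matrix(rnorm(2 * d), 2, d)
X <- Z %*% B + 0.1 * matrix(rnorm(n * d), n, d)

# Classical PCA (stand-in for the robust fit)
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Cumulative percentage of explained variability for k = 1, ..., d
cum_pct <- 100 * cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest k whose cumulative percentage reaches at least 80%
k_suggested <- which(cum_pct >= 80)[1]
print(round(cum_pct, 1))
print(k_suggested)
```

A scree plot of the same quantities could be drawn with plot(pca$sdev^2, type = "b").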
MacroPCApars
A list of available options, detailed below. If MacroPCApars = NULL, the defaults below are used.
DDCpars
A list with parameters for the first step of the MacroPCA algorithm (for the complete list see the function DDC). Default is NULL.
kmax
The maximal number of principal components to compute. Default is kmax = 10. If k is provided, kmax does not need to be specified, unless k is larger than 10, in which case you need to set kmax high enough.
alpha
This is the coverage, i.e. the fraction of rows the algorithm should give full weight. alpha should be between 0.50 and 1; the default is 0.50.
scale
A value indicating whether and how the original variables should be scaled. If scale = FALSE or scale = NULL, no scaling is performed (and a vector of 1s is returned in the $scaleX slot). If scale = TRUE (default), the data are scaled by a 1-step M-estimator of scale with the Tukey biweight weight function to have a robust scale of 1. Alternatively, scale can be a vector of length equal to the number of columns of X. The resulting scale estimates are returned in the $scaleX slot of the MacroPCA output.
maxdir
The maximal number of random directions to use for computing the outlyingness of the data points. Default is maxdir = 250. If the number \(n\) of observations is small, all \(n(n - 1)/2\) pairs of observations are used.
distprob
The quantile determining the cutoff values
for orthogonal and score distances. Default is 0.99.
silent
If TRUE, statements tracking the algorithm's progress will not be printed. Defaults to FALSE.
maxiter
Maximum number of iterations. Default is 20.
tol
Tolerance for iterations. Default is 0.005.
center
If NULL, MacroPCA will compute the center. If a vector with \(d\) components, this center will be used.
bigOutput
Whether to compute and return NAimp, Cellimp and Fullimp. Defaults to TRUE.
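As a minimal sketch, the options above can be collected in an ordinary named list. The values below are the documented defaults, except alpha and silent, which are changed for illustration. The variable name pars is hypothetical, and DDCpars and center are deliberately left out on the assumption that omitted entries fall back to their NULL defaults.

```r
# Option list using the documented option names
pars <- list(
  kmax      = 10,     # maximal number of principal components
  alpha     = 0.75,   # coverage (documented default is 0.50)
  scale     = TRUE,   # robust scaling of the columns
  maxdir    = 250,    # maximal number of random directions
  distprob  = 0.99,   # quantile for the OD and SD cutoffs
  silent    = TRUE,   # suppress progress messages (default is FALSE)
  maxiter   = 20,     # maximum number of iterations
  tol       = 0.005,  # tolerance for iterations
  bigOutput = TRUE    # also return NAimp, Cellimp and Fullimp
)
str(pars)
```

Such a list would then be passed as the third argument, e.g. MacroPCA(X, k = 2, MacroPCApars = pars).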
Rousseeuw P.J., Van den Bossche W.
Hubert, M., Rousseeuw, P.J., Van den Bossche, W. (2019). MacroPCA: An all-in-one PCA method allowing for missing values as well as cellwise and rowwise outliers. Technometrics, 61(4), 459-473.
checkDataSet, cellMap, DDC
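library(MASS)      # for mvrnorm
library(cellWise)  # for MacroPCA and cellMap
set.seed(12345)
n <- 50; d <- 10
# Covariance matrix with unit variances and all correlations equal to 0.9
A <- matrix(0.9, d, d); diag(A) <- 1
x <- mvrnorm(n, rep(0, d), A)
# Set 50 randomly chosen cells to NA, then 50 randomly chosen cells to the outlying value 10
x[sample(1:(n * d), 50, FALSE)] <- NA
x[sample(1:(n * d), 50, FALSE)] <- 10
# Fit MacroPCA with k = 2 components and draw a cell map of the standardized residuals
MacroPCA.out <- MacroPCA(x, 2)
cellMap(MacroPCA.out$stdResid)
# For more examples, we refer to the vignette:
if (FALSE) {
  vignette("MacroPCA_examples")
}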