This function carries out classical PCA on data that may contain missing values, using an iterative algorithm. It is based on a MATLAB function from the Missing Data Imputation Toolbox v1.0 by A. Folch-Fortuny, F. Arteaga and A. Ferrer.
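To make the idea of an iterative algorithm concrete, here is a minimal base R sketch of PCA-based imputation. It is an illustration under stated assumptions, not the code used by ICPCA: the function name `iterative_pca_impute`, the column-mean starting values, the stopping rule, and the names of the returned components are all hypothetical choices made for this example.

```r
# Illustrative sketch only; not the ICPCA implementation.
# `iterative_pca_impute` and its return names are hypothetical.
iterative_pca_impute <- function(X, k, maxiter = 20, tol = 0.005) {
  X <- as.matrix(X)
  miss <- is.na(X)
  # Start by filling each missing cell with its column mean.
  col_means <- colMeans(X, na.rm = TRUE)
  X[miss] <- col_means[col(X)[miss]]
  it <- 0
  change <- Inf
  while (it < maxiter && change > tol) {
    it <- it + 1
    center <- colMeans(X)
    Xc <- sweep(X, 2, center)
    # Rank-k fit from the SVD of the centered, currently imputed data.
    P <- svd(Xc, nu = 0, nv = k)$v
    fit <- sweep(Xc %*% P %*% t(P), 2, center, "+")
    # Only missing cells are updated; observed cells are never changed.
    change <- if (any(miss)) max(abs(fit[miss] - X[miss])) else 0
    X[miss] <- fit[miss]
  }
  list(X.imputed = X, loadings = P, center = center,
       iterations = it, change = change)
}
```

Each pass refits a k-component PCA to the completed data and overwrites only the missing cells with their fitted values, stopping once the largest update is at most `tol` or `maxiter` passes have been made.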
ICPCA(X, k, scale = FALSE, maxiter = 20, tol = 0.005,
tolProb = 0.99, distprob = 0.99)
A list with components:
the scales of the columns of X.
the number of principal components.
the columns are the k loading vectors.
the k eigenvalues.
vector with the fitted center.
estimated covariance matrix.
number of iteration steps.
convergence criterion.
data with all NA's imputed.
scores of X.NAimp.
orthogonal distances of the rows of X.NAimp.
cutoff value for the OD.
score distances of the rows of X.NAimp.
cutoff value for the SD.
row numbers of cases whose OD is above cutoffOD.
row numbers of cases whose SD is above cutoffSD.
scale of the residuals.
standardized residuals. Note that these are NA for all missing values of X.
indices of cellwise outliers.
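The orthogonal and score distances listed above can be illustrated with their usual definitions from the PCA outlier-detection literature: the score distance of a row is the norm of its scores after dividing each squared score by the corresponding eigenvalue, and the orthogonal distance is the norm of its residual from the k-dimensional PCA subspace. The sketch below assumes these standard definitions; whether ICPCA computes them in exactly this way is an assumption, and `pca_distances` is a hypothetical helper name.

```r
# Standard definitions of score and orthogonal distances (assumed here;
# not taken from the ICPCA source). `pca_distances` is a hypothetical name.
pca_distances <- function(X, center, loadings, eigenvalues) {
  Xc <- sweep(as.matrix(X), 2, center)      # center the data
  scores <- Xc %*% loadings                 # n x k scores
  resid <- Xc - scores %*% t(loadings)      # part outside the PCA subspace
  list(
    SD = sqrt(rowSums(sweep(scores^2, 2, eigenvalues, "/"))),
    OD = sqrt(rowSums(resid^2))
  )
}
```

A common choice of cutoff for the score distance is `sqrt(qchisq(0.99, k))`, the square root of a chi-squared quantile with k degrees of freedom; this is mentioned as a typical convention, not as a statement of what ICPCA does.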
the input data, which must be a matrix or a data frame. It may contain NA's. It must always be provided.
the desired number of principal components
a value indicating whether and how the original variables should be scaled. If scale=FALSE (default) or scale=NULL, no scaling is performed (and a vector of 1s is returned in the $scaleX slot). If scale=TRUE, the variables are scaled to have a standard deviation of 1. Alternatively, scale can be a function like mad, or a vector of length equal to the number of columns of X. The resulting scale estimates are returned in the $scaleX slot of the output.
maximum number of iterations. Default is 20.
tolerance for iterations. Default is 0.005.
tolerance probability for residuals. Default is 0.99.
probability determining the cutoff values for orthogonal and score distances. Default is 0.99.
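The different forms accepted by the scale argument can be summarized in a small sketch that turns each form into a vector of column scales. This is only an illustration of the behavior described above, assuming that missing values are skipped when scales are estimated; `resolve_scale` is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: map the `scale` argument to one scale per column.
# Skipping NAs via na.rm = TRUE is an assumption made for this sketch.
resolve_scale <- function(X, scale = FALSE) {
  X <- as.matrix(X)
  d <- ncol(X)
  if (is.null(scale) || identical(scale, FALSE)) {
    rep(1, d)                              # no scaling: vector of 1s
  } else if (identical(scale, TRUE)) {
    apply(X, 2, sd, na.rm = TRUE)          # standard deviation per column
  } else if (is.function(scale)) {
    apply(X, 2, scale, na.rm = TRUE)       # e.g. scale = mad
  } else if (is.numeric(scale) && length(scale) == d) {
    scale                                  # user-supplied scales
  } else {
    stop("`scale` must be FALSE, NULL, TRUE, a function, ",
         "or a numeric vector of length ncol(X)")
  }
}
```

For example, `resolve_scale(X, mad)` would return one robust scale per column, which corresponds to what the description says is stored in the $scaleX slot.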
Wannes Van Den Bossche
Folch-Fortuny, A., Arteaga, F., Ferrer, A. (2016). Missing Data Imputation Toolbox for MATLAB. Chemometrics and Intelligent Laboratory Systems, 154, 93-100.
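library(MASS)      # for mvrnorm
library(cellWise)  # provides ICPCA
set.seed(12345)
n <- 100; d <- 10
# Covariance matrix with unit variances and all correlations equal to 0.9
A <- diag(d) * 0.1 + 0.9
x <- mvrnorm(n, rep(0, d), A)
# Set 100 randomly chosen cells to missing
x[sample(1:(n * d), 100, FALSE)] <- NA
ICPCA.out <- ICPCA(x, k = 2)
plot(ICPCA.out$scores)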