ICPCA: Iterative Classical PCA

Description

This function carries out classical PCA on data that may contain missing values, using an iterative algorithm. It is based on a Matlab function from the Missing Data Imputation Toolbox v1.0 by A. Folch-Fortuny, F. Arteaga and A. Ferrer.
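
The general idea of iterative PCA with missing values can be illustrated in base R. The sketch below is not the code used by ICPCA: the helper name iterative_pca_impute, the column-mean starting values and the stopping rule (largest change in any imputed cell) are all assumptions made for illustration. It alternates between fitting a rank-k PCA to the completed data and replacing the missing cells by their fitted values.

```r
# Hypothetical helper, NOT the ICPCA implementation: a minimal sketch of
# iterative PCA imputation under the assumptions stated above.
iterative_pca_impute <- function(X, k, maxiter = 20, tol = 0.005) {
  X <- as.matrix(X)
  miss <- is.na(X)
  # Start by filling each missing cell with its column mean (assumption).
  col_means <- colMeans(X, na.rm = TRUE)
  Ximp <- X
  Ximp[miss] <- col_means[col(X)[miss]]
  diff <- Inf
  it <- 0
  while (it < maxiter && diff > tol) {
    it <- it + 1
    center <- colMeans(Ximp)
    Xc <- sweep(Ximp, 2, center)
    # Loadings: the first k right singular vectors of the centered data.
    P <- svd(Xc, nu = 0, nv = k)$v
    # Rank-k reconstruction, moved back to the original location.
    Xhat <- sweep(Xc %*% P %*% t(P), 2, center, "+")
    # Assumed convergence criterion: largest change in an imputed cell.
    diff <- if (any(miss)) max(abs(Xhat[miss] - Ximp[miss])) else 0
    # Only the missing cells are updated; observed cells stay untouched.
    Ximp[miss] <- Xhat[miss]
  }
  list(X.NAimp = Ximp, loadings = P, center = center, It = it, diff = diff)
}

# Small usage example on simulated data with 15 missing cells.
set.seed(1)
X <- matrix(rnorm(200), nrow = 20, ncol = 10)
X[sample(200, 15)] <- NA
fit <- iterative_pca_impute(X, k = 2)
```

In this sketch the observed cells are never modified, so the imputed matrix agrees with the input wherever the input was not NA.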

Usage

ICPCA(X, k, scale = FALSE, maxiter = 20, tol = 0.005,
      tolProb = 0.99, distprob = 0.99)

Value

A list with components:

scaleX: the scales of the columns of X.
k: the number of principal components.
loadings: the columns are the k loading vectors.
eigenvalues: the k eigenvalues.
center: vector with the fitted center.
covmatrix: estimated covariance matrix.
It: number of iteration steps.
diff: convergence criterion.
X.NAimp: data with all NA's imputed.
scores: scores of X.NAimp.
OD: orthogonal distances of the rows of X.NAimp.
cutoffOD: cutoff value for the OD.
SD: score distances of the rows of X.NAimp.
cutoffSD: cutoff value for the SD.
highOD: row numbers of cases whose OD is above cutoffOD.
highSD: row numbers of cases whose SD is above cutoffSD.
residScale: scale of the residuals.
stdResid: standardized residuals. Note that these are NA for all missing values of X.
indcells: indices of cellwise outliers.
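
To make the score and orthogonal distances concrete, here is a rough base R sketch for complete data. It is not taken from the ICPCA source: the helper name pca_distances is hypothetical, and the chi-square cutoff for the score distances is a common convention assumed here, not a statement of how cutoffSD is computed by the function. The cutoff for the orthogonal distances is left out because its formula is not given on this page.

```r
# Hypothetical helper, NOT part of the package: score distances (SD) and
# orthogonal distances (OD) of a classical k-component PCA on complete data.
pca_distances <- function(X, k, distprob = 0.99) {
  X <- as.matrix(X)
  center <- colMeans(X)
  Xc <- sweep(X, 2, center)
  eig <- eigen(cov(X), symmetric = TRUE)
  P <- eig$vectors[, seq_len(k), drop = FALSE]  # loadings
  lambda <- eig$values[seq_len(k)]              # eigenvalues
  scores <- Xc %*% P
  # Score distance: distance within the PCA subspace, each score
  # standardized by its eigenvalue.
  SD <- sqrt(rowSums(sweep(scores^2, 2, lambda, "/")))
  # Orthogonal distance: Euclidean distance from a row to its projection
  # on the PCA subspace.
  OD <- sqrt(rowSums((Xc - scores %*% t(P))^2))
  # Assumed cutoff: square root of a chi-square quantile with k degrees
  # of freedom.
  cutoffSD <- sqrt(qchisq(distprob, df = k))
  list(scores = scores, SD = SD, OD = OD, cutoffSD = cutoffSD,
       highSD = which(SD > cutoffSD))
}

# Usage on simulated complete data.
set.seed(2)
Z <- matrix(rnorm(300), nrow = 50, ncol = 6)
dists <- pca_distances(Z, k = 2)
```

Rows listed in highSD lie far from the center within the fitted subspace, whereas a large OD indicates a row that lies far from the subspace itself.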

Arguments

X: the input data, which must be a matrix or a data frame. It may contain NA's. It must always be provided.
k: the desired number of principal components.
scale: a value indicating whether and how the original variables should be scaled. If scale=FALSE (default) or scale=NULL no scaling is performed (and a vector of 1s is returned in the $scaleX slot). If scale=TRUE the variables are scaled to have a standard deviation of 1. Alternatively scale can be a function like mad, or a vector of length equal to the number of columns of X. The resulting scale estimates are returned in the $scaleX slot of the output.
maxiter: maximum number of iterations. Default is 20.
tol: tolerance for iterations. Default is 0.005.
tolProb: tolerance probability for residuals. Default is 0.99.
distprob: probability determining the cutoff values for orthogonal and score distances. Default is 0.99.
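
The different forms accepted by the scale argument can be summarized in a short sketch. This is only an illustration of the behaviour described above, written in base R with a hypothetical helper name (resolve_scale); it is not the package's internal code, and the handling of NA's via na.rm = TRUE is an assumption.

```r
# Hypothetical helper, NOT the package's code: turn the `scale` argument
# into one scale value per column, following the description above.
resolve_scale <- function(X, scale = FALSE) {
  X <- as.matrix(X)
  d <- ncol(X)
  if (is.null(scale) || identical(scale, FALSE)) {
    rep(1, d)                          # no scaling: a vector of 1s
  } else if (identical(scale, TRUE)) {
    apply(X, 2, sd, na.rm = TRUE)      # standard deviation per column
  } else if (is.function(scale)) {
    apply(X, 2, scale, na.rm = TRUE)   # e.g. mad, applied per column
  } else if (is.numeric(scale) && length(scale) == d) {
    scale                              # user-supplied scales
  } else {
    stop("scale must be FALSE, NULL, TRUE, a function, or a vector of length ncol(X)")
  }
}

# Usage on a small matrix with one missing cell.
M <- cbind(c(1, 2, 3, 4, NA), c(2, 4, 6, 8, 10))
s_none <- resolve_scale(M)
s_sd   <- resolve_scale(M, scale = TRUE)
s_mad  <- resolve_scale(M, scale = mad)
s_user <- resolve_scale(M, scale = c(2, 5))
```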

Author

Wannes Van Den Bossche

References

Folch-Fortuny, A., Arteaga, F., Ferrer, A. (2016). Missing Data Imputation Toolbox for MATLAB. Chemometrics and Intelligent Laboratory Systems, 154, 93-100.

Examples

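library(MASS)      # provides mvrnorm()
library(cellWise)  # provides ICPCA()
set.seed(12345)
n <- 100; d <- 10
# Covariance matrix with unit variances and all correlations equal to 0.9
A <- diag(d) * 0.1 + 0.9
x <- mvrnorm(n, rep(0, d), A)
# Set 100 randomly chosen cells to NA
x[sample(1:(n * d), 100, FALSE)] <- NA
# Fit a PCA with 2 components; the missing cells are imputed iteratively
ICPCA.out <- ICPCA(x, k = 2)
plot(ICPCA.out$scores)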
