do.bpca: Bayesian Principal Component Analysis

Description

Bayesian PCA (BPCA) is a variant of PCA that places a prior on the loadings and thereby encodes a mechanism for selecting basis vectors. Although the model is fully Bayesian, do.bpca follows the original paper by Bishop: it returns only the posterior mode as a point estimate, obtained under an ARD-motivated (automatic relevance determination) prior while the noise variance is estimated as well. Unlike PPCA, it fits a full basis and returns a relative weight for each basis vector: the smaller the \(\alpha\) value, the more likely the corresponding column of mp.W is to be selected as a basis vector.
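
For orientation, the model behind this description can be written out as follows. The notation is taken from Bishop's paper rather than from the do.bpca source, so the symbols (\(z_n\), \(\mu\), \(q\), \(w_i\)) are assumptions here and the internals of do.bpca may differ in detail.

```latex
% Latent variable model with q = p - 1 latent dimensions
x_n = W z_n + \mu + \epsilon_n, \qquad
z_n \sim \mathcal{N}(0, I_q), \qquad
\epsilon_n \sim \mathcal{N}(0, \sigma^2 I_p)

% ARD prior: one precision \alpha_i per column w_i of W
p(W \mid \alpha) = \prod_{i=1}^{q}
  \left( \frac{\alpha_i}{2\pi} \right)^{p/2}
  \exp\!\left( -\tfrac{1}{2}\, \alpha_i \, \lVert w_i \rVert^2 \right)

% Re-estimation of each precision at the posterior mode
\alpha_i = \frac{p}{\lVert w_i \rVert^2}
```

A large \(\alpha_i\) corresponds to a column \(w_i\) that has been shrunk toward zero, which is why small \(\alpha\) values mark the columns most likely to act as basis vectors.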

Usage

do.bpca(
  X,
  ndim = 2,
  preprocess = c("center", "scale", "cscale", "decorrelate", "whiten"),
  reltol = 1e-04,
  maxiter = 123
)

Arguments

X

an \((n\times p)\) matrix or data frame whose rows are observations and columns represent independent variables.

ndim

an integer-valued target dimension.

preprocess

an option for preprocessing the data. Default is "center". See also aux.preprocess for more details.

reltol

stopping criterion (tolerance) for the iterative updates of the EM algorithm.

maxiter

maximum number of iterations allowed for EM algorithm.

Value

a named list containing

Y: an \((n\times ndim)\) matrix whose rows are embedded observations.
trfinfo: a list containing information for out-of-sample prediction.
projection: a \((p\times ndim)\) matrix whose columns are principal components.
mp.itercount: the number of iterations taken for EM algorithm to converge.
mp.sigma2: estimated \(\sigma^2\) value via EM algorithm.
mp.alpha: a vector of length \(ndim-1\) holding the relative weight of each basis vector in mp.W.
mp.W: an \((ndim\times (ndim-1))\) matrix from EM update.
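
To make the roles of reltol, maxiter, mp.alpha, mp.W and mp.sigma2 more concrete, here is a minimal base-R sketch of the EM updates described in Bishop's paper. It is not the Rdimtools implementation: the function name bpca_em_sketch, the eigenvector initialisation, and the relative-change stopping rule on W are all assumptions made for illustration. In this sketch W has one column fewer than the number of variables in X.

```r
# Hypothetical helper for illustration only; not part of Rdimtools.
bpca_em_sketch <- function(X, reltol = 1e-4, maxiter = 123) {
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)  # centre the columns
  n <- nrow(X); d <- ncol(X); q <- d - 1                  # requires d >= 2

  # Assumed initialisation: leading eigenvectors of the sample covariance
  eig    <- eigen(crossprod(X) / n, symmetric = TRUE)
  W      <- eig$vectors[, 1:q, drop = FALSE] %*% diag(sqrt(eig$values[1:q]), q)
  sigma2 <- max(eig$values[d], 1e-8)
  alpha  <- d / colSums(W^2)

  for (iter in seq_len(maxiter)) {
    # E-step: posterior moments of the latent variables
    Minv <- solve(crossprod(W) + sigma2 * diag(q))
    Ez   <- X %*% W %*% Minv                     # n x q, row i is E[z_i]
    Szz  <- n * sigma2 * Minv + crossprod(Ez)    # sum over i of E[z_i z_i^T]

    # M-step: the ARD precisions enter as a ridge-like penalty on W
    W_new  <- crossprod(X, Ez) %*% solve(Szz + sigma2 * diag(alpha, q))
    s2_new <- (sum(X^2) - 2 * sum(Ez * (X %*% W_new)) +
               sum(Szz * crossprod(W_new))) / (n * d)
    # Guard against division by zero if a column collapses completely
    alpha  <- d / pmax(colSums(W_new^2), .Machine$double.eps)

    # Assumed stopping rule: relative change in W below reltol
    delta  <- norm(W_new - W, "F") / norm(W, "F")
    W      <- W_new
    sigma2 <- s2_new
    if (delta < reltol) break
  }
  list(W = W, alpha = alpha, sigma2 = sigma2, itercount = iter)
}

fit <- bpca_em_sketch(iris[, 1:4])
order(fit$alpha)   # columns of W ranked from most to least relevant
```

Columns of W with a small alpha carry most of the variance, while columns with a large alpha have been shrunk toward zero, which mirrors how mp.alpha is meant to be read.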

References

Bishop, C. M. (1999). Bayesian PCA. In Advances in Neural Information Processing Systems 11. MIT Press.

Examples

# NOT RUN {
## load the package and use the iris dataset
library(Rdimtools)
data(iris)
set.seed(100)
subid = sample(1:150,50)
X     = as.matrix(iris[subid,1:4])
lab   = as.factor(iris[subid,5])

## compare PCA and BPCA
out1  <- do.pca(X,  ndim=2)
out2  <- do.bpca(X, ndim=2)

## visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2))
plot(out1$Y, col=lab, pch=19, cex=0.8, main="PCA")
plot(out2$Y, col=lab, pch=19, cex=0.8, main="BPCA")
par(opar)
# }