Horn's parallel analysis involves shuffling observations within each row of x
to create a permuted matrix. PCA is performed on the permuted matrix
to obtain the percentage of variance explained under a random null hypothesis.
This is repeated over several iterations to obtain a distribution of curves on
the scree plot.
For each PC, the “p-value” (for want of a better word) is defined as the
proportion of iterations where the variance explained at that PC is greater
than that observed with the original matrix. The number of PCs to retain is
defined as the last PC where the p-value is below threshold. This aims
to retain all PCs that explain “significantly” more variance than
expected by chance.
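
The procedure above can be sketched in base R. This is only an illustration under stated assumptions, not the function's actual implementation: the toy matrix, the helper var_explained(), and the names niters, pseudo_p and n_retain are all made up for the example, and "last PC below threshold" is interpreted here as the end of the initial run of PCs whose p-value is below threshold.

```r
# Toy data: 50 features (rows) x 30 samples (columns), with one shared
# signal added to the first 20 rows so that the first PC stands out.
set.seed(1)
x <- matrix(rnorm(50 * 30), nrow = 50)
signal <- rnorm(30)
x[1:20, ] <- x[1:20, ] + 3 * matrix(signal, nrow = 20, ncol = 30, byrow = TRUE)

# Hypothetical helper: percentage of variance explained by each PC,
# treating columns as samples.
var_explained <- function(m) {
  v <- prcomp(t(m))$sdev^2
  100 * v / sum(v)
}

observed <- var_explained(x)

# Null distribution: shuffle observations independently within each row,
# redo the PCA, and repeat. The result has one row per PC and one column
# per iteration.
niters <- 50
permuted <- replicate(niters, {
  shuffled <- t(apply(x, 1, sample))
  var_explained(shuffled)
})

# Per-PC "p-value": proportion of iterations in which the permuted matrix
# explains more variance at that PC than the original matrix does.
pseudo_p <- rowMeans(permuted > observed)

# Retain the leading PCs up to the first one that fails the threshold
# (assumed interpretation of "last PC below threshold").
threshold <- 0.1
fails <- which(pseudo_p >= threshold)
n_retain <- if (length(fails) > 0) min(fails) - 1L else length(pseudo_p)
n_retain
```

Stopping at the first failing PC, rather than taking the largest index below threshold, avoids retaining trailing PCs whose variance is close to zero in both the original and permuted matrices, where the comparison is dominated by numerical noise.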
This function can be sped up by specifying BSPARAM=IrlbaParam() or
similar, to use approximate strategies for performing the PCA. Another option
is to set BPPARAM to perform the iterations in parallel.
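
A possible way to combine both options is sketched below. It assumes that the function documented here is parallelPCA() from the PCAtools package, taking a matrix with features in rows along with max.rank and niters arguments, and that the BiocSingular and BiocParallel packages are installed. The input matrix and the choice of two workers are purely illustrative.

```r
# Hypothetical usage; skipped unless the assumed packages are available.
if (requireNamespace("PCAtools", quietly = TRUE) &&
    requireNamespace("BiocSingular", quietly = TRUE) &&
    requireNamespace("BiocParallel", quietly = TRUE)) {

  set.seed(1)
  mat <- matrix(rnorm(200 * 40), nrow = 200)  # illustrative input only

  res <- PCAtools::parallelPCA(
    mat,
    max.rank = 10,   # keep the requested rank small for the approximate SVD
    niters = 20,
    BSPARAM = BiocSingular::IrlbaParam(),           # approximate PCA
    BPPARAM = BiocParallel::SnowParam(workers = 2)  # iterations in parallel
  )

  # Inspect the top-level structure of the result.
  str(res, max.level = 1)
}
```

The approximate algorithms only compute the leading PCs, so the gain is largest when max.rank is much smaller than the dimensions of the matrix. Parallelization helps most when niters is large relative to the number of workers.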