Various methods for performing parallel analysis. This function uses future_lapply, for which a parallel processing plan can be selected. To do so, call library(future) and, for example, plan(multisession); see examples.
PARALLEL(
x = NULL,
N = NA,
n_vars = NA,
n_datasets = 1000,
percent = 95,
eigen_type = c("PCA", "SMC", "EFA"),
use = c("pairwise.complete.obs", "all.obs", "complete.obs", "everything",
"na.or.complete"),
cor_method = c("pearson", "spearman", "kendall"),
decision_rule = c("means", "percentile", "crawford"),
n_factors = 1,
...
)
A list of class PARALLEL containing the following objects:
A matrix containing the eigenvalues of the real and the simulated data found with eigen_type = "PCA".
A matrix containing the eigenvalues of the real and the simulated data found with eigen_type = "SMC".
A matrix containing the eigenvalues of the real and the simulated data found with eigen_type = "EFA".
The number of factors to retain according to the parallel procedure with eigen_type = "PCA".
The number of factors to retain according to the parallel procedure with eigen_type = "SMC".
The number of factors to retain according to the parallel procedure with eigen_type = "EFA".
A list of control settings used in the print function.
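The element names of the returned list are best checked on the object itself; the following is a hedged sketch, and the element names n_fac_PCA and eigenvalues_PCA used below are assumptions for illustration only.
pa <- PARALLEL(test_models$case_11b$cormat, N = 500)
names(pa)                  # actual names of the returned objects
pa$n_fac_PCA               # suggested number of factors from PCA eigenvalues (if present)
head(pa$eigenvalues_PCA)   # real and simulated PCA eigenvalues (if present)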
x: matrix or data.frame. The real data to compare the simulated eigenvalues against. Must not contain variables of classes other than numeric. Can be a correlation matrix or raw data.
N: numeric. The number of cases / observations to simulate. Only has to be specified if x is either a correlation matrix or NULL. If x contains raw data, N is found from the dimensions of x.
n_vars: numeric. The number of variables / indicators to simulate. Only has to be specified if x is left as NULL, as otherwise the dimensions are taken from x.
n_datasets: numeric. The number of datasets to simulate. Default is 1000.
percent: numeric. The percentile to take from the simulated eigenvalues. Default is 95.
eigen_type: character. On what the eigenvalues should be found. Can be either "SMC", "PCA", or "EFA". If using "SMC", the diagonal of the correlation matrix is replaced by the squared multiple correlations (SMCs) of the indicators. If using "PCA", the diagonal values of the correlation matrices are left to be 1. If using "EFA", eigenvalues are found on the correlation matrices with the final communalities of an EFA solution as diagonal.
use: character. Passed to stats::cor if raw data is given as input. Default is "pairwise.complete.obs".
cor_method: character. Passed to stats::cor. Default is "pearson".
decision_rule: character. Which rule to use to determine the number of factors to retain. Default is "means", which will use the average simulated eigenvalues. "percentile" uses the percentiles specified in percent. "crawford" uses the 95th percentile for the first factor and the mean afterwards (based on Crawford et al., 2010).
n_factors: numeric. Number of factors to extract if "EFA" is included in eigen_type. Default is 1.
...: Additional arguments passed to EFA. For example, the extraction method can be changed here (default is "PAF"). PAF is more robust, but it will take longer compared to the other estimation methods available ("ML" and "ULS").
Parallel analysis (Horn, 1965) compares the eigenvalues obtained from the sample correlation matrix against those of null model correlation matrices (i.e., with uncorrelated variables) of the same sample size. This way, it accounts for the variation in eigenvalues introduced by sampling error and thus eliminates the main problem inherent in the Kaiser-Guttman criterion (KGC).
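As an illustration, the core comparison can be sketched in a few lines of base R (PCA eigenvalues and the "means" decision rule; a simplified sketch on toy data, not the implementation used by PARALLEL):
set.seed(1)
N <- 500; n_vars <- 10; n_datasets <- 1000
# toy "real" data with one common factor
f <- rnorm(N)
real_dat <- sapply(seq_len(n_vars), function(i) 0.6 * f + rnorm(N))
eig_real <- eigen(cor(real_dat), only.values = TRUE)$values
# eigenvalues of null-model (uncorrelated) data of the same dimensions
eig_sim <- replicate(n_datasets, {
  eigen(cor(matrix(rnorm(N * n_vars), N, n_vars)), only.values = TRUE)$values
})
ref <- rowMeans(eig_sim)                   # "means" rule; use row-wise quantiles for "percentile"
n_retain <- which(eig_real <= ref)[1] - 1  # retain factors until an eigenvalue falls below the reference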
Three different ways of finding the eigenvalues under the factor model are implemented, namely "SMC", "PCA", and "EFA". PCA leaves the diagonal elements of the correlation matrix as they are and is thus equivalent to what is done in PCA. SMC uses squared multiple correlations as communality estimates with which the diagonal of the correlation matrix is replaced. Finally, EFA performs an EFA with one factor (can be adapted to more factors) to estimate the communalities, and the eigenvalues are then found on the correlation matrix with these communalities as diagonal elements.
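For example, the "SMC" variant amounts to roughly the following (a sketch assuming an invertible correlation matrix R_mat; not the package's internal code):
R_mat <- cor(matrix(rnorm(500 * 10), 500, 10))  # any correlation matrix
smc <- 1 - 1 / diag(solve(R_mat))               # squared multiple correlations
R_smc <- R_mat
diag(R_smc) <- smc                              # replace diagonal with SMCs
eig_smc <- eigen(R_smc, only.values = TRUE)$values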
Parallel analysis is often argued to be one of the most accurate factor retention criteria. However, for highly correlated factor structures it has been shown to underestimate the correct number of factors. The reason for this is that a null model (uncorrelated variables) is used as reference. However, when factors are highly correlated, the first eigenvalue will be much larger compared to the following ones, as later eigenvalues are conditional on the earlier ones in the sequence and thus the shared variance is already accounted for in the first eigenvalue (e.g., Braeken & van Assen, 2017).
The PARALLEL function can also be called together with other factor retention criteria in the N_FACTORS function.
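A hedged example of such a combined call (the criteria argument and its values are assumed from the N_FACTORS interface; see ?N_FACTORS for the exact arguments):
nf <- N_FACTORS(test_models$case_11b$cormat, N = 500,
                criteria = c("PARALLEL", "EKC", "SMT"))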
Other factor retention criteria: CD, EKC, HULL, KGC, SMT.
N_FACTORS as a wrapper function for this and all the above-mentioned factor retention criteria.
# \donttest{
# example without real data
pa_unreal <- PARALLEL(N = 500, n_vars = 10)
# example with correlation matrix with all eigen_types and PAF estimation
pa_paf <- PARALLEL(test_models$case_11b$cormat, N = 500)
# example with correlation matrix with all eigen_types and ML estimation
# (this will be faster than the above with PAF)
pa_ml <- PARALLEL(test_models$case_11b$cormat, N = 500, method = "ML")
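# additional illustrative call: only PCA eigenvalues with the percentile
# decision rule and a 99th percentile cutoff
pa_perc <- PARALLEL(test_models$case_11b$cormat, N = 500,
                    eigen_type = "PCA", decision_rule = "percentile",
                    percent = 99)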
# }
if (FALSE) {
# for parallel computation
future::plan(future::multisession)
pa_faster <- PARALLEL(test_models$case_11b$cormat, N = 500)
}