Implementation of the Hull method suggested by Lorenzo-Seva, Timmerman, and Kiers (2011), with an extension to principal axis factoring. See details for parallelization.
HULL(
x,
N = NA,
n_fac_theor = NA,
method = c("PAF", "ULS", "ML"),
gof = c("CAF", "CFI", "RMSEA"),
eigen_type = c("SMC", "PCA", "EFA"),
use = c("pairwise.complete.obs", "all.obs", "complete.obs", "everything",
"na.or.complete"),
cor_method = c("pearson", "spearman", "kendall"),
n_datasets = 1000,
percent = 95,
decision_rule = c("means", "percentile", "crawford"),
n_factors = 1,
...
)
A list of class HULL containing the following objects:
The number of factors to retain according to the Hull method with the CAF.
The number of factors to retain according to the Hull method with the CFI.
The number of factors to retain according to the Hull method with the RMSEA.
A matrix containing the CAFs, degrees of freedom, and for the factors lying on the hull, the st values of the hull solution (see Lorenzo-Seva, Timmerman, and Kiers 2011 for details).
A matrix containing the CFIs, degrees of freedom, and for the factors lying on the hull, the st values of the hull solution (see Lorenzo-Seva, Timmerman, and Kiers 2011 for details).
A matrix containing the RMSEAs, degrees of freedom, and for the factors lying on the hull, the st values of the hull solution (see Lorenzo-Seva, Timmerman, and Kiers 2011 for details).
The upper bound J of the number of factors to extract (see details).
A list of the settings used.
x: matrix or data.frame. Data frame or matrix of raw data, or a correlation matrix.
N: numeric. Number of cases in the data. This is passed to PARALLEL. It only has to be specified if x is a correlation matrix; otherwise, it is determined from the dimensions of x.
n_fac_theor: numeric. Theoretical number of factors to retain. The larger of this number and the number of factors suggested by PARALLEL plus one is used as the upper bound J in the Hull method.
method: character. The estimation method to use. One of "PAF", "ULS", or "ML", for principal axis factoring, unweighted least squares, and maximum likelihood, respectively.
gof: character. The goodness-of-fit index to use. Either "CAF", "CFI", or "RMSEA", or any combination of them. If method = "PAF" is used, only the CAF can be used as goodness-of-fit index. For details on the CAF, see Lorenzo-Seva, Timmerman, and Kiers (2011).
eigen_type: character. The matrices on which the eigenvalues are computed in the parallel analysis. One of "SMC", "PCA", or "EFA". If using "SMC" (default), the diagonal of the correlation matrices is replaced by the squared multiple correlations (SMCs) of the indicators. If using "PCA", the diagonal values of the correlation matrices are left at 1. If using "EFA", eigenvalues are found on the correlation matrices with the final communalities of an EFA solution as diagonal. This is passed to PARALLEL.
use: character. Passed to stats::cor if raw data is given as input. Default is "pairwise.complete.obs".
cor_method: character. Passed to stats::cor. Default is "pearson".
n_datasets: numeric. The number of datasets to simulate. Default is 1000. This is passed to PARALLEL.
percent: numeric. A vector of percentiles to take the simulated eigenvalues from. Default is 95. This is passed to PARALLEL.
decision_rule: character. Which rule to use to determine the number of factors to retain. Default is "means", which uses the average simulated eigenvalues. "percentile" uses the percentiles specified in percent. "crawford" uses the 95th percentile for the first factor and the mean afterwards (based on Crawford et al., 2010). This is passed to PARALLEL.
n_factors: numeric. Number of factors to extract if "EFA" is included in eigen_type. Default is 1. This is passed to PARALLEL.
...: Further arguments passed to EFA, also within PARALLEL.
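The eigen_type and decision_rule arguments that are passed to PARALLEL can be illustrated with a small base-R sketch. The helpers sketch_eigenvalues() and sketch_reference() are hypothetical and are not EFAtools functions; the "EFA" eigen type is omitted, and the final step (retain factors while the observed eigenvalue exceeds the reference eigenvalue) is an assumed, commonly used rule rather than a description of PARALLEL's exact code.

```r
# Hypothetical helpers (not part of EFAtools): a base-R sketch of the
# eigen_type and decision_rule choices made in parallel analysis.

# Eigenvalues of a correlation matrix R with the diagonal set by eigen_type:
# "PCA" leaves the diagonal at 1; "SMC" replaces it with the squared multiple
# correlations 1 - 1 / diag(solve(R)). The "EFA" option is not sketched here.
sketch_eigenvalues <- function(R, eigen_type = c("SMC", "PCA")) {
  eigen_type <- match.arg(eigen_type)
  if (eigen_type == "SMC") {
    diag(R) <- 1 - 1 / diag(solve(R))
  }
  eigen(R, symmetric = TRUE, only.values = TRUE)$values
}

# Reference eigenvalues from a matrix of simulated eigenvalues
# (one row per simulated dataset, one column per factor position).
sketch_reference <- function(sim, decision_rule = c("means", "percentile", "crawford"),
                             percent = 95) {
  decision_rule <- match.arg(decision_rule)
  means <- colMeans(sim)
  switch(decision_rule,
         means = means,
         percentile = apply(sim, 2, quantile, probs = percent / 100, names = FALSE),
         # 95th percentile for the first factor, means afterwards
         crawford = c(quantile(sim[, 1], probs = 0.95, names = FALSE), means[-1]))
}

# Toy simulation: 200 datasets of 100 cases on 4 uncorrelated normal variables
set.seed(1)
sim <- t(replicate(200, sketch_eigenvalues(cor(matrix(rnorm(100 * 4), ncol = 4)),
                                           "PCA")))
ref <- sketch_reference(sim, "crawford")

# Made-up observed correlation matrix with one common factor (illustration only)
R_obs <- matrix(c(1,  .5,  .4,  .3,
                  .5, 1,   .45, .35,
                  .4, .45, 1,   .3,
                  .3, .35, .3,  1), nrow = 4)
obs <- sketch_eigenvalues(R_obs, "PCA")

# Assumed rule: count leading observed eigenvalues that exceed the reference
n_retain <- sum(cumprod(obs > ref))
n_retain
```

With "PCA" the eigenvalues sum to the number of variables, whereas the SMC diagonal shrinks them, which is why the same eigen_type should be used for the observed and the simulated matrices.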
The Hull method aims to find a model with an optimal balance between model fit and number of parameters. That is, it aims to retrieve only major factors (Lorenzo-Seva, Timmerman, & Kiers, 2011). To this end, it performs the following steps (Lorenzo-Seva, Timmerman, & Kiers, 2011, p. 351):
1. It performs parallel analysis and adds one to the identified number of factors (this number is denoted J). J is taken as an upper bound of the number of factors to retain in the Hull method. Alternatively, a theoretical number of factors can be entered. In this case, J is set to whichever of these two numbers (from parallel analysis or based on theory) is higher.
2. For 0 to J factors, the goodness of fit (one of CAF, RMSEA, or CFI) and the degrees of freedom (df) are computed.
3. The solutions are ordered according to their df.
4. Solutions that are not on the boundary of the convex hull are eliminated (see Lorenzo-Seva, Timmerman, & Kiers, 2011, for details).
5. All triplets of adjacent solutions are considered consecutively. The middle solution is excluded if its point lies below or on the line connecting its neighbors in a plot of the goodness of fit versus the degrees of freedom.
6. Step 5 is repeated until no solution can be excluded.
7. The st values of the “hull” solutions are determined.
8. The solution with the highest st value is selected.
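Steps 3 to 8 can be sketched in base R for a vector of goodness-of-fit values f and their df. Following Lorenzo-Seva, Timmerman, and Kiers (2011), the st value of an interior hull solution i is st_i = [(f_i - f_(i-1)) / (df_i - df_(i-1))] / [(f_(i+1) - f_i) / (df_(i+1) - df_i)]. The function name hull_select() is hypothetical and not part of EFAtools. The sketch assumes that fit increases with df (as with the CAF, the CFI, or 1 - RMSEA) and that the df values are distinct, and it simplifies step 4 to dropping any solution whose fit does not exceed that of every less complex solution; steps 5 and 6 then trim the remaining points to the upper boundary.

```r
# Hypothetical sketch of steps 3-8 of the Hull method (not EFAtools code).
# fit: goodness of fit per solution (higher = better); df: degrees of freedom;
# n_fac: number of factors of each solution (default: 0, 1, 2, ...).
hull_select <- function(fit, df, n_fac = seq_along(fit) - 1) {
  force(n_fac)
  # Step 3: order the solutions by df
  ord <- order(df)
  fit <- fit[ord]; df <- df[ord]; n_fac <- n_fac[ord]
  # Step 4 (simplified assumption): keep only solutions that fit better
  # than every less complex solution
  keep <- fit > cummax(c(-Inf, head(fit, -1)))
  fit <- fit[keep]; df <- df[keep]; n_fac <- n_fac[keep]
  # Steps 5 and 6: drop the middle point of an adjacent triplet if it lies on
  # or below the line connecting its neighbours; repeat until nothing changes
  repeat {
    n <- length(fit)
    if (n < 3) break
    below <- vapply(2:(n - 1), function(i) {
      on_line <- fit[i - 1] + (fit[i + 1] - fit[i - 1]) *
        (df[i] - df[i - 1]) / (df[i + 1] - df[i - 1])
      fit[i] <= on_line
    }, logical(1))
    if (!any(below)) break
    drop <- which(below)[1] + 1  # remove one point, then re-check all triplets
    fit <- fit[-drop]; df <- df[-drop]; n_fac <- n_fac[-drop]
  }
  n <- length(fit)
  if (n < 3) return(NA_integer_)  # st needs a left and a right neighbour
  # Step 7: st value of each interior hull solution
  i <- 2:(n - 1)
  st <- ((fit[i] - fit[i - 1]) / (df[i] - df[i - 1])) /
        ((fit[i + 1] - fit[i]) / (df[i + 1] - df[i]))
  # Step 8: number of factors of the solution with the highest st value
  n_fac[i][which.max(st)]
}

# Made-up fit values and df for solutions with 0 to 4 factors (illustration only)
df  <- c(0, 5, 9, 12, 14)
fit <- c(0.10, 0.60, 0.90, 0.94, 0.95)
hull_select(fit, df)
```

For these made-up values every point lies on the hull, and the largest st value belongs to the two-factor solution, so the sketch selects 2 factors.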
The PARALLEL function and the principal axis factoring of the different numbers of factors can be parallelized using the future framework by calling the future::plan function. The examples show how to enable parallel processing.
Note that if gof = "RMSEA" is used, 1 - RMSEA is actually used to compare the different solutions. Thus, the threshold of .05 becomes .95. This is necessary because of how the heuristic that locates the elbow of the hull works.
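The transformation can be illustrated with a few lines of base R. The RMSEA values below are made up for illustration and are not output of HULL.

```r
# Made-up RMSEA values for solutions with 0 to 3 factors (illustration only)
rmsea <- c(0.20, 0.09, 0.04, 0.03)

# The hull heuristic expects higher values to mean better fit, so 1 - RMSEA is used
fit <- 1 - rmsea

# The conventional RMSEA cut-off of .05 correspondingly becomes .95
threshold <- 1 - 0.05
fit >= threshold
```

Solutions with RMSEA at or below .05 are exactly those with 1 - RMSEA at or above .95, so the ordering of the solutions is simply reversed and the fit now increases with model complexity.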
The ML estimation method uses the stats::factanal starting values. See also the EFA documentation.
The HULL function can also be called together with other factor retention criteria in the N_FACTORS function.
Other factor retention criteria: CD, EKC, KGC, PARALLEL, SMT
N_FACTORS as a wrapper function for this and all the above-mentioned factor retention criteria.
library(EFAtools)

# using PAF (this will throw a warning if gof is not specified manually
# and CAF will be used automatically)
HULL(test_models$baseline$cormat, N = 500, gof = "CAF")
# using ML with all available fit indices (CAF, CFI, and RMSEA)
HULL(test_models$baseline$cormat, N = 500, method = "ML")
# using ULS with only RMSEA
HULL(test_models$baseline$cormat, N = 500, method = "ULS", gof = "RMSEA")
if (FALSE) {
# using parallel processing (Note: plans can be adapted, see the future
# package for details)
future::plan(future::multisession)
HULL(test_models$baseline$cormat, N = 500, gof = "CAF")
# return to sequential processing afterwards
future::plan(future::sequential)
}