autoFMradio is a wrapper function that automates the three main steps of the FMradio workflow.
autoFMradio(X, t = .95, fold = 5, GB = 1, type = "thomson",
verbose = TRUE, printInfo = TRUE, seed = NULL)
X: A data matrix or an ExpressionSet object.
t: A scalar numeric indicating the absolute value for thresholding (the threshold used in redundancy filtering).
fold: A numeric integer or integer indicating the number of folds to use in cross-validation.
GB: A numeric integer or integer indicating which Guttman bound to use for determining the number of latent features to retain. Must be either 1, 2, or 3.
type: A character indicating the type of factor score to calculate. Must be one of: "thomson", "bartlett", "anderson".
verbose: A logical indicating if the function should run silently. Runs silently when verbose = FALSE.
printInfo: A logical indicating if additional information should be printed on-screen. Suppresses printing when printInfo = FALSE.
seed: A numeric integer or integer indicating the seed for the random number generator.
The function returns an object of class list:
An object of class data.frame containing the factor scores. Observations are represented in the rows. Each column represents a latent factor.
A subsetted data matrix containing only those features retained after redundancy filtering.
A correlation matrix based on the data in the $FilteredData slot.
A numeric scalar representing the optimal value for the penalty parameter.
A matrix representing the regularized correlation matrix under the optimal penalty value.
An integer corresponding to the number of latent factors retained under the chosen Guttman bound.
A matrix of class loadings representing the loadings matrix in which each element \(\lambda_{jk}\) is the loading of the \(j\)th feature on the \(k\)th latent factor.
A matrix representing the diagonal matrix carrying the unique variances.
A numeric vector representing the cumulative variance for each respective latent feature.
A numeric vector indicating, for each factor, the squared multiple correlation between the observed features and the common latent factor.
A numeric or integer used as the starting seed in random number generation.
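To see how these elements are laid out in practice, the returned list can be inspected directly. The sketch below reuses the simulated data from the example on this page. Only slot names that this documentation mentions explicitly ($Scores and $Loadings) are accessed by name, and the whole block is guarded so that it only runs when the FMradio package is installed.

```r
# Inspect the list returned by autoFMradio (runs only if FMradio is available)
if (requireNamespace("FMradio", quietly = TRUE)) {
  library(FMradio)

  # Simulated data, as in the example on this page
  simDAT <- FAsim(p = 24, m = 3, n = 40, loadingvalue = .9)
  fit <- autoFMradio(simDAT$data, GB = 1, seed = 303, printInfo = FALSE)

  print(names(fit))          # names of all slots in the returned list
  str(fit, max.level = 1)    # class and size of each slot
  print(head(fit$Scores))    # factor scores: observations in rows
  print(dim(fit$Loadings))   # features (rows) by latent factors (columns)
}
```

Listing the names first is a convenient way to match each description above to its slot.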
The autoFMradio function automates the three main steps of the workflow by providing a wrapper around all core functions.
Step 1 (regularized correlation matrix estimation) is performed using the X, t, and fold arguments. The raw correlation matrix based on data X is redundancy-filtered using the threshold provided in t. Subsequently, a regularized estimate of the correlation matrix (on the possibly filtered feature set) is computed with the optimal penalty value determined by cross-validation. The number of folds is set by the fold argument. For more information on Step 1, see RF, subSet, and regcor.
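The idea behind Step 1 can be illustrated in base R. The sketch below is not the implementation used by RF, subSet, or regcor: the greedy filtering rule, the ridge-type shrinkage toward the identity matrix, the grid search, and all object names (keep, shrink, cv_nll, optPen) are assumptions chosen for illustration only.

```r
set.seed(1)
n <- 50
z <- rnorm(n)
# Toy data: features a and b are nearly collinear, c and d are independent noise
X <- cbind(a = z, b = z + rnorm(n, sd = 0.01), c = rnorm(n), d = rnorm(n))
thr <- 0.95   # plays the role of the t argument

# Redundancy filtering (greedy illustration): drop features until no pair
# has an absolute correlation of at least thr
keep <- colnames(X)
repeat {
  A <- abs(cor(X[, keep, drop = FALSE]))
  diag(A) <- 0
  if (max(A) < thr) break
  n_high <- colSums(A >= thr)              # above-threshold partners per feature
  keep <- setdiff(keep, names(which.max(n_high)))
}

# Regularized correlation matrix: shrink toward the identity, with the penalty
# chosen by K-fold cross-validation of a Gaussian negative log-likelihood
Xf <- scale(X[, keep, drop = FALSE])
K <- 5                                     # plays the role of the fold argument
fold_id <- sample(rep(seq_len(K), length.out = n))
shrink <- function(R, pen) (1 - pen) * R + pen * diag(nrow(R))
cv_nll <- function(pen) {
  mean(sapply(seq_len(K), function(k) {
    Rtrain <- shrink(cor(Xf[fold_id != k, , drop = FALSE]), pen)
    Rtest <- cor(Xf[fold_id == k, , drop = FALSE])
    # log-determinant plus trace term (constants dropped)
    as.numeric(determinant(Rtrain)$modulus) + sum(diag(solve(Rtrain, Rtest)))
  }))
}
grid <- seq(0.01, 0.99, by = 0.01)
optPen <- grid[which.min(sapply(grid, cv_nll))]
optCor <- shrink(cor(Xf), optPen)
```

In this toy setting exactly one of the two collinear features is removed, and the remaining three features enter the penalized estimate.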
Step 2 (factor analytic data compression) is performed using the GB argument. With this argument one can use either the first, second, or third Guttman bound to select the intrinsic dimensionality of the latent vector. This bound, together with the regularized correlation matrix, is used in a maximum likelihood factor analysis with simple-structure rotation. For more information on Step 2, see dimGB and mlFA.
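As a rough illustration of Step 2, the first Guttman bound counts the eigenvalues of the correlation matrix that exceed 1. The sketch below computes that bound on simulated data and then fits a maximum likelihood factor model with factanal from base R's stats package. It stands in for, and is not, what dimGB and mlFA do internally; the Varimax rotation and the simulated loading pattern are assumptions made here.

```r
set.seed(2)
n <- 200; p <- 9; m_true <- 3

# Simulate data from a factor model with three blocks of three features
Lambda <- matrix(0, p, m_true)
Lambda[1:3, 1] <- Lambda[4:6, 2] <- Lambda[7:9, 3] <- 0.8
Fac <- matrix(rnorm(n * m_true), n, m_true)                # latent factors
Err <- matrix(rnorm(n * p, sd = sqrt(1 - 0.8^2)), n, p)    # unique parts
X <- Fac %*% t(Lambda) + Err
R <- cor(X)

# First Guttman bound: number of eigenvalues of R larger than 1
m <- sum(eigen(R, symmetric = TRUE, only.values = TRUE)$values > 1)

# Maximum likelihood factor analysis on the correlation matrix
fit <- factanal(covmat = R, factors = m, n.obs = n, rotation = "varimax")
Lambda_hat <- unclass(fit$loadings)      # p x m loadings matrix
Psi_hat <- diag(fit$uniquenesses)        # diagonal matrix of unique variances
```

The second and third bounds apply the same counting idea to adjusted versions of the correlation matrix; they are not shown here.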
Step 3 (obtaining factor scores) is performed using the type argument. It determines factor scores: the score each object/individual would obtain on each of the latent factors. The type argument determines the type of factor score that is calculated. For more information on Step 3, see facScore.
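To make the difference between score types concrete, the sketch below computes Thomson (regression) and Bartlett scores from the standard textbook formulas, using a hypothetical loadings matrix Lambda and unique-variance matrix Psi. It is a base R illustration under those assumptions and does not reproduce the facScore code.

```r
# Hypothetical parameters: 6 features, 2 orthogonal factors
Lambda <- cbind(c(.8, .7, .9, 0, 0, 0), c(0, 0, 0, .8, .7, .9))
Psi <- diag(1 - rowSums(Lambda^2))       # unique variances on the diagonal
Sigma <- tcrossprod(Lambda) + Psi        # model-implied correlation matrix

set.seed(3)
X <- scale(matrix(rnorm(20 * 6), 20, 6)) # 20 standardized observations

PsiInvL <- solve(Psi, Lambda)            # Psi^{-1} Lambda

# Thomson scores: X Sigma^{-1} Lambda
thomson <- X %*% solve(Sigma, Lambda)

# Bartlett scores: X Psi^{-1} Lambda (Lambda' Psi^{-1} Lambda)^{-1}
bartlett <- X %*% PsiInvL %*% solve(crossprod(Lambda, PsiInvL))
```

Both are linear projections of the data. With this simple-structure Lambda, each column of Thomson scores is a shrunken version of the corresponding Bartlett column.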
When printInfo = TRUE, additional information is printed on-screen after the full procedure has run its course. This additional information pertains to each of the steps mentioned above. For Step 1 it reiterates the thresholding value for redundancy filtering and gives the number of features retained after this filtering. It also reiterates the number of folds used in determining the optimal penalty value as well as this value itself. Moreover, it provides the value of the Kaiser-Meyer-Olkin index on the optimal regularized correlation matrix estimate (see SA). For Step 2 it reiterates which Guttman bound was used in determining the number of latent factors as well as the number of latent factors retained. It also gives the proportion of explained variance under the factor solution of the chosen latent dimension (see dimVAR). For Step 3 it reiterates the type of factor score that was calculated. Also, it prints the lowest 'determinacy score' amongst the latent factors (see facSMC).
The factor scores in the $Scores slot of the output (see the description of the returned list) can be directly used as input features in any prediction or classification procedure. In case of external (rather than internal) validation one can use the parameter matrices in the $Loadings and $Uniqueness slots in combination with fresh data to provide a validation factor projection based on the training solution. See Peeters et al. (2019).
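One way such a validation projection could look is sketched below. The helper project_scores is hypothetical and not part of FMradio. It assumes a Thomson-type projection, that the loadings matrix carries feature names as row names, and that fresh data are standardized with the training means and standard deviations. These are illustrative choices, not the procedure prescribed by the reference.

```r
# Hypothetical helper: project fresh data onto a training factor solution
project_scores <- function(Xnew, Lambda, Psi, center, scale) {
  feats <- rownames(Lambda)                       # features kept in training
  Z <- scale(Xnew[, feats, drop = FALSE],
             center = center[feats], scale = scale[feats])
  Sigma <- tcrossprod(Lambda) + Psi               # model-implied correlation
  Z %*% solve(Sigma, Lambda)                      # Thomson-type projection
}

# Toy training solution and data (all values made up for illustration)
set.seed(4)
feats <- paste0("f", 1:6)
Lambda <- cbind(c(.8, .7, .9, 0, 0, 0), c(0, 0, 0, .8, .7, .9))
rownames(Lambda) <- feats
Psi <- diag(1 - rowSums(Lambda^2))
Xtrain <- matrix(rnorm(30 * 8), 30, 8,
                 dimnames = list(NULL, c(feats, "g1", "g2")))
# Fresh data: same features in a different column order, plus extras
Xnew <- matrix(rnorm(10 * 8), 10, 8,
               dimnames = list(NULL, c("g1", feats, "g2")))

val <- project_scores(Xnew, Lambda, Psi,
                      center = colMeans(Xtrain),
                      scale = apply(Xtrain, 2, sd))
```

Selecting columns by name keeps the projection aligned with the training solution even when the fresh data contain extra or reordered features.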
Peeters, C.F.W. et al. (2019). Stable prediction with radiomics data. arXiv:1903.11696 [stat.ML].
# NOT RUN {
## Simulate some data according to a factor model with 3 latent factors
simDAT <- FAsim(p = 24, m = 3, n = 40, loadingvalue = .9)
X <- simDAT$data
## Perform the lot
FullMonty <- autoFMradio(X, GB = 1, seed = 303)
# }