autoFMradio: Wrapper for automated workflow

Description

autoFMradio is a wrapper function that automates the three main steps of the FMradio workflow.

Usage

autoFMradio(X, t = .95, fold = 5, GB = 1, type = "thomson",
            verbose = TRUE, printInfo = TRUE, seed = NULL)

Arguments

X

A data matrix or an ExpressionSet object.

t

A scalar numeric indicating the absolute correlation value used as the threshold for redundancy filtering.

fold

A whole number (of type numeric or integer) indicating the number of folds to use in cross-validation.

GB

A whole number (of type numeric or integer) indicating which Guttman bound to use for determining the number of latent factors to retain. Must be 1, 2, or 3.

type

A character indicating the type of factor score to calculate. Must be one of: "thomson", "bartlett", "anderson".

verbose

A logical indicating whether the function should print progress messages. The function runs silently when verbose = FALSE.

printInfo

A logical indicating whether additional information should be printed on-screen after the procedure has finished (see Details). Printing is suppressed when verbose = FALSE.

seed

A whole number (of type numeric or integer) indicating the seed for the random number generator.

Value

The function returns an object of class list:

$Scores

An object of class data.frame containing the factor scores. Observations are represented in the rows. Each column represents a latent factor.

$FilteredData

Subsetted data matrix containing only those features retained after redundancy filtering.

$FilteredCor

A correlation matrix based on the data in the $FilteredData slot.

$optPen

A numeric scalar representing the optimal value for the penalty parameter.

$optCor

The regularized correlation matrix under the optimal penalty value.

An integer corresponding to the number of latent factors retained under the chosen Guttman bound.

$Loadings

A matrix of class loadings in which each element $\lambda_{jk}$ is the loading of the $j$th feature on the $k$th latent factor.

$Uniqueness

A diagonal matrix containing the unique variances.

$Exvariance

A numeric vector giving, for each successive latent factor, the cumulative variance explained.

$determinacy

A numeric vector indicating, for each factor, the squared multiple correlation between the observed features and the common latent factor.

$used.seed

The numeric or integer value used as the starting seed for random number generation.
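The $Exvariance and $determinacy slots can be related to $Loadings and $Uniqueness as follows. This is a minimal sketch under stated assumptions, not FMradio's dimVAR or facSMC code: it assumes orthogonal factors, reads "cumulative variance" as the cumulative proportion of total standardized variance (column sums of squared loadings divided by the number of features), and computes determinacy as diag(Lambda' Sigma^-1 Lambda) with the model-implied Sigma = Lambda Lambda' + Psi. The helper factorDiagnostics is hypothetical.

```r
## Hypothetical helper: explained variance and determinacy from a factor solution.
## Assumes orthogonal factors and standardized features (unit total variance each).
factorDiagnostics <- function(L, Psi) {
  Sigma <- L %*% t(L) + Psi                       # model-implied correlation matrix
  determinacy <- diag(t(L) %*% solve(Sigma, L))   # SMC of each factor on the features
  propVar <- colSums(L^2) / nrow(L)               # proportion of variance per factor
  list(Exvariance = cumsum(propVar), determinacy = determinacy)
}
```

For a one-factor model the determinacy reduces to s / (1 + s) with s = Lambda' Psi^-1 Lambda, which is a convenient sanity check.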

Details

The autoFMradio function automates the three main steps of the workflow by providing a wrapper around all core functions.

Step 1 (regularized correlation matrix estimation) is performed using the X, t, and fold arguments. The raw correlation matrix based on data X is redundancy-filtered using the threshold provided in t. Subsequently, a regularized estimate of the correlation matrix (on the possibly filtered feature set) is computed with the optimal penalty value determined by cross-validation. The number of folds is set by the fold argument. For more information on Step 1, see RF, subSet, and regcor.
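The three parts of Step 1 can be illustrated in base R. This is a simplified sketch, not the algorithm used by RF, subSet, or regcor: filterRedundant, shrinkCor, and cvPenalty are hypothetical helpers; the greedy filter, the ridge-type form (1 - lambda) R + lambda I, and the held-out Gaussian negative log-likelihood used as the cross-validation loss are all assumptions made for illustration.

```r
## Greedy redundancy filter (hypothetical): keep feature j only if its absolute
## correlation with every previously kept feature is at most t.
filterRedundant <- function(R, t = 0.95) {
  keep <- integer(0)
  for (j in seq_len(ncol(R))) {
    if (length(keep) == 0 || all(abs(R[j, keep]) <= t)) keep <- c(keep, j)
  }
  keep
}

## Ridge-type shrinkage of a correlation matrix toward the identity (assumed form)
shrinkCor <- function(R, lambda) (1 - lambda) * R + lambda * diag(nrow(R))

## K-fold cross-validation of lambda: for each fold, shrink the training
## correlation and score it by log det(P) + tr(P^-1 S_test), i.e. the Gaussian
## negative log-likelihood of the held-out correlation up to constants.
cvPenalty <- function(X, lambdas = seq(0.01, 0.99, by = 0.01), fold = 5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  folds <- sample(rep(seq_len(fold), length.out = nrow(X)))
  loss <- sapply(lambdas, function(lambda) {
    mean(sapply(seq_len(fold), function(k) {
      Rtrain <- cor(X[folds != k, , drop = FALSE])
      Stest  <- cor(X[folds == k, , drop = FALSE])
      P <- shrinkCor(Rtrain, lambda)
      as.numeric(determinant(P, logarithm = TRUE)$modulus) + sum(diag(solve(P, Stest)))
    }))
  })
  lambdas[which.min(loss)]
}
```

Chained together, keep <- filterRedundant(cor(X), 0.95) followed by cvPenalty(X[, keep], fold = 5) mirrors the order described above; each fold must contain enough rows for cor() to be defined.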

Step 2 (factor analytic data compression) is performed using the GB argument. With this argument one can use either the first, second, or third Guttman bound to select the intrinsic dimensionality of the latent vector. This bound, together with the regularized correlation matrix, is used in a maximum likelihood factor analysis with simple-structure rotation. For more information on Step 2, see dimGB and mlFA.
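The three Guttman bounds are usually defined by counting eigenvalues of the correlation matrix R with its diagonal modified: the first counts eigenvalues of R greater than 1; the second counts nonnegative eigenvalues after replacing each diagonal element by the squared largest absolute off-diagonal correlation in its row; the third counts nonnegative eigenvalues after replacing the diagonal by squared multiple correlations. The sketch below assumes these textbook definitions; guttmanBounds is a hypothetical helper, and dimGB may treat ties or numerical tolerance differently.

```r
## Hypothetical helper returning the first, second, and third Guttman bounds.
guttmanBounds <- function(R) {
  ## First bound: eigenvalues of R greater than 1
  ev1 <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
  ## Second bound: diagonal replaced by squared largest |off-diagonal| per row
  off <- abs(R); diag(off) <- 0
  R2 <- R; diag(R2) <- apply(off, 1, max)^2
  ev2 <- eigen(R2, symmetric = TRUE, only.values = TRUE)$values
  ## Third bound: diagonal replaced by squared multiple correlations 1 - 1/diag(R^-1)
  R3 <- R; diag(R3) <- 1 - 1 / diag(solve(R))
  ev3 <- eigen(R3, symmetric = TRUE, only.values = TRUE)$values
  c(first = sum(ev1 > 1), second = sum(ev2 >= 0), third = sum(ev3 >= 0))
}
```

For two uncorrelated blocks of three equicorrelated features (r = 0.6), all three bounds equal 2, matching the two underlying factors.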

Step 3 (obtaining factor scores) is performed using the type argument. Factor scores are the scores each object/individual would obtain on each of the latent factors; the type argument determines which type of factor score is calculated. For more information on Step 3, see facScore.
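The three score types correspond to standard textbook weight matrices W, with scores computed as Z W for standardized data Z: Thomson (regression) W = Sigma^-1 Lambda; Bartlett W = Psi^-1 Lambda (Lambda' Psi^-1 Lambda)^-1; Anderson-Rubin W = Psi^-1 Lambda (Lambda' Psi^-1 Sigma Psi^-1 Lambda)^-1/2. A minimal sketch follows, assuming the model-implied Sigma = Lambda Lambda' + Psi; factorScores is hypothetical, and facScore may differ, for instance in which correlation matrix it uses.

```r
## Hypothetical helper: factor scores for standardized data Z (n x p),
## loadings L (p x m) and diagonal uniqueness matrix Psi (p x p).
factorScores <- function(Z, L, Psi, type = c("thomson", "bartlett", "anderson")) {
  type <- match.arg(type)
  Sigma <- L %*% t(L) + Psi          # model-implied correlation matrix
  PsiInvL <- solve(Psi, L)           # Psi^-1 Lambda
  W <- switch(type,
    thomson  = solve(Sigma, L),
    bartlett = PsiInvL %*% solve(t(L) %*% PsiInvL),
    anderson = {
      ## symmetric inverse square root of Lambda' Psi^-1 Sigma Psi^-1 Lambda
      e <- eigen(t(PsiInvL) %*% Sigma %*% PsiInvL, symmetric = TRUE)
      PsiInvL %*% e$vectors %*% diag(1 / sqrt(e$values), ncol(L)) %*% t(e$vectors)
    })
  Z %*% W
}
```

Bartlett scores recover the factors exactly from noise-free data, while Anderson-Rubin weights satisfy W' Sigma W = I (uncorrelated, unit-variance scores under the model).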

When printInfo = TRUE, additional information is printed on-screen after the full procedure has run its course. This additional information pertains to each of the steps mentioned above. For Step 1 it reiterates the thresholding value for redundancy filtering and gives the number of features retained after this filtering. It also reiterates the number of folds used in determining the optimal penalty value as well as this value itself. Moreover, it provides the value of the Kaiser-Meyer-Olkin index on the optimal regularized correlation matrix estimate (see SA). For Step 2 it reiterates which Guttman bound was used in determining the number of latent factors as well as the number of latent factors retained. It also gives the proportion of explained variance under the factor solution of the chosen latent dimension (see dimVAR). For Step 3 it reiterates the type of factor score that was calculated. Also, it prints the lowest 'determinacy score' amongst the latent factors (see facSMC).
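The Kaiser-Meyer-Olkin index compares squared correlations with squared partial correlations over all off-diagonal pairs: KMO = sum r_ij^2 / (sum r_ij^2 + sum p_ij^2), where the partial correlations come from the inverse correlation matrix Omega as p_ij = -omega_ij / sqrt(omega_ii omega_jj). The sketch below implements this overall index only; kmoIndex is a hypothetical helper, and SA may additionally report per-feature measures.

```r
## Hypothetical helper: overall Kaiser-Meyer-Olkin index of a correlation matrix.
kmoIndex <- function(R) {
  Omega <- solve(R)
  P <- -Omega / sqrt(outer(diag(Omega), diag(Omega)))  # partial correlations
  off <- row(R) != col(R)                              # off-diagonal positions
  r2 <- sum(R[off]^2)
  p2 <- sum(P[off]^2)
  r2 / (r2 + p2)
}
```

For two features the partial correlation equals the ordinary correlation, so the index is exactly 0.5; values closer to 1 indicate that correlations are largely shared rather than pairwise.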

The factor scores in the $Scores slot of the output (see Value) can be used directly as input features in any prediction or classification procedure. In case of external (rather than internal) validation, one can combine the parameter matrices in the $Loadings and $Uniqueness slots with fresh data to obtain a validation factor projection based on the training solution. See Peeters et al. (2019).
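For external validation, fresh data must first be standardized with the training moments before the training parameters are applied. A minimal sketch, assuming Thomson-type (regression) weights Sigma^-1 Lambda with Sigma = Lambda Lambda' + Psi built from the training $Loadings and $Uniqueness; projectScores is a hypothetical helper, and the projection used by Peeters et al. (2019) may differ.

```r
## Hypothetical helper: project new observations onto a training factor solution.
## center and sds are the training column means and standard deviations.
projectScores <- function(newX, center, sds, L, Psi) {
  Z <- scale(newX, center = center, scale = sds)  # standardize with training moments
  Sigma <- L %*% t(L) + Psi                       # training model-implied correlation
  W <- solve(Sigma, L)                            # regression (Thomson-type) weights
  Z %*% W
}
```

With a fitted object, one would plausibly pass unclass(FullMonty$Loadings) and FullMonty$Uniqueness, using the means and standard deviations of the columns in $FilteredData; newX must contain the same filtered features in the same order.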

References

Peeters, C.F.W. et al. (2019). Stable prediction with radiomics data. arXiv:1903.11696 [stat.ML].

Examples


# NOT RUN {
library(FMradio)
## Simulate some data according to a factor model with 3 latent factors
simDAT <- FAsim(p = 24, m = 3, n = 40, loadingvalue = .9)
X <- simDAT$data

## Perform the lot
FullMonty <- autoFMradio(X, GB = 1, seed = 303)
# }
