mint.pca: P-integration with Principal Component Analysis

Description

Function to integrate and combine multiple independent studies measured on the same variables or predictors (P-integration) using a multigroup Principal Component Analysis.

Usage

mint.pca(X,
         ncomp = 2,
         study,
         scale = TRUE,
         tol = 1e-06,
         max.iter = 100
)

Arguments

X

numeric matrix of predictors combining multiple independent studies on the same set of predictors. NAs are allowed.

ncomp

Number of components to include in the model (see Details). Defaults to 2.

study

factor indicating which of the combined studies each sample belongs to.

scale

Logical. If scale = TRUE, each block is standardized to zero mean and unit variance. Default = TRUE.

tol

Convergence threshold used as the stopping criterion. Defaults to 1e-06.

max.iter

integer, the maximum number of iterations. Defaults to 100.
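As an illustration of how the X and study arguments fit together, the following sketch stacks two simulated studies and builds the matching factor. The data, object names and sanity checks are hypothetical and only serve the example; the call to mint.pca is attempted only if mixOmics is installed.

```r
# Two simulated studies measured on the same 5 variables (hypothetical data).
set.seed(1)
X1 <- matrix(rnorm(6 * 5), nrow = 6)            # study "A": 6 samples
X2 <- matrix(rnorm(4 * 5, mean = 2), nrow = 4)  # study "B": 4 samples

# Stack the studies row-wise; columns must be the same variables in the same order.
X <- rbind(X1, X2)
colnames(X) <- paste0("gene", 1:5)
rownames(X) <- paste0("sample", seq_len(nrow(X)))

# One factor level per study, one entry per row of X.
study <- factor(rep(c("A", "B"), times = c(nrow(X1), nrow(X2))))

# Basic sanity checks before fitting.
stopifnot(length(study) == nrow(X))
samples_per_study <- table(study)
small_studies <- names(samples_per_study)[samples_per_study <= 3]

# Fit only if mixOmics is installed; arguments mirror the Usage section.
if (requireNamespace("mixOmics", quietly = TRUE)) {
  res <- mixOmics::mint.pca(X = X, ncomp = 2, study = study, scale = TRUE,
                            tol = 1e-06, max.iter = 100)
}
```

Here small_studies lists any study with 3 samples or fewer, which Details advises against combining.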

Value

mint.pca returns an object of class "mint.pca", "pca", a list that contains the following components:

X

the centered and standardized original predictor matrix.

ncomp

the number of components included in the model.

study

the study grouping factor.

sdev

the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix or by using NIPALS.

center, scale

the centering and scaling used, or FALSE.

rotation

the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors).

loadings

same as 'rotation', kept for consistency with mixOmics naming.

x

the value of the rotated data (the centred (and scaled if requested) data multiplied by the rotation/loadings matrix), also called the principal components.

variates

same as 'x', kept for consistency with mixOmics naming.

explained_variance

explained variance from the multivariate model, used for plotIndiv

names

list containing the names to be used for individuals and variables.

Details

mint.pca fits a vertical PCA model with ncomp components in which several independent studies measured on the same variables are integrated. The study factor indicates which study each sample belongs to. We advise combining only studies with more than 3 samples, because the function scales the data internally within each study.
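The idea of scaling within each study before a joint PCA can be sketched in base R as follows. This is a simplified illustration on simulated, complete data, not the package's internal code; all object names are hypothetical. It also shows why very small studies are problematic: the within-study standard deviation is estimated from that study's samples only.

```r
set.seed(42)
# Two simulated studies with different location and spread (hypothetical data).
X <- rbind(matrix(rnorm(8 * 4, mean = 0, sd = 1), nrow = 8),
           matrix(rnorm(6 * 4, mean = 5, sd = 3), nrow = 6))
study <- factor(rep(c("study1", "study2"), times = c(8, 6)))

# Centre and scale each variable within each study separately.
X_scaled <- X
for (s in levels(study)) {
  rows <- study == s
  X_scaled[rows, ] <- scale(X[rows, ], center = TRUE, scale = TRUE)
}

# Ordinary PCA on the study-wise scaled data via the SVD.
ncomp <- 2
sv <- svd(X_scaled, nu = ncomp, nv = ncomp)
loadings <- sv$v                   # variables x ncomp
variates <- X_scaled %*% loadings  # samples x ncomp

# After study-wise scaling, each study has mean 0 and sd 1 on every variable.
study_means <- apply(X_scaled, 2, function(col) tapply(col, study, mean))
study_sds   <- apply(X_scaled, 2, function(col) tapply(col, study, sd))
```

Because each study is standardized on its own, systematic offsets between studies are removed before the components are computed.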

Missing values are handled by disregarding them during the cross-product computations in the algorithm, so rows with missing data do not have to be deleted. Alternatively, missing data can be imputed beforehand using the nipals function.
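How missing values can be skipped in the cross products is shown below with a textbook NIPALS iteration for the first component. This is a minimal sketch under stated assumptions, not the mixOmics implementation: the function name nipals_pc1 is hypothetical, its tol and max.iter arguments only mimic the ones above, and every row and column is assumed to have at least one observed value.

```r
# First principal component by NIPALS, skipping NA entries in the cross products.
nipals_pc1 <- function(X, tol = 1e-06, max.iter = 100) {
  W <- 1 * !is.na(X)   # 1 where a value is observed, 0 where it is missing
  X0 <- X
  X0[is.na(X)] <- 0    # zeros contribute nothing to the sums
  t_vec <- X0[, which.max(colSums(X0^2))]  # start from the column with largest sum of squares
  p_old <- rep(0, ncol(X))
  for (iter in seq_len(max.iter)) {
    # Loadings: regress each column on the scores using observed rows only.
    p <- drop(crossprod(X0, t_vec)) / drop(crossprod(W, t_vec^2))
    p <- p / sqrt(sum(p^2))
    # Scores: regress each row on the loadings using observed columns only.
    t_vec <- drop(X0 %*% p) / drop(W %*% p^2)
    # Stop once the loadings no longer change.
    if (sum((p - p_old)^2) < tol) break
    p_old <- p
  }
  list(scores = t_vec, loadings = p, iterations = iter)
}

# Simulated data with one dominant component (hypothetical), column-centred.
set.seed(7)
scores_true <- rnorm(20)
X <- outer(scores_true, c(2, -1, 1.5, 0.5)) + matrix(rnorm(80, sd = 0.1), 20, 4)
X <- scale(X, center = TRUE, scale = FALSE)

X_na <- X
X_na[c(3, 27, 58)] <- NA  # remove three entries without deleting any row

fit_full <- nipals_pc1(X)
fit_na   <- nipals_pc1(X_na)
```

Comparing fit_na$loadings with fit_full$loadings shows how much the missing entries affect the first component in this toy setting.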

Useful graphical outputs are available, e.g. plotIndiv, plotLoadings, plotVar.

References

Rohart F, Eslami A, Matigian N, Bougeard S, Le Cao K-A (2017). MINT: A multivariate integrative approach to identify a reproducible biomarker signature across multiple experiments and platforms. BMC Bioinformatics 18:128.

Eslami A, Qannari EM, Kohler A, Bougeard S (2014). Algorithms for multi-group PLS. J. Chemometrics 28(3):192-201.

Examples


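library(mixOmics)

# Load the example stem cell data shipped with mixOmics.
data(stemcells)

# Fit a 3-component MINT PCA across the studies.
res <- mint.pca(X = stemcells$gene, ncomp = 3,
                study = stemcells$study)

# Plot the samples, coloured by cell type.
plotIndiv(res, group = stemcells$celltype, legend = TRUE)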
