Performs mosaic integrative non-negative matrix factorization (UINMF) (A.R. Kriebel, 2022) using block coordinate descent (alternating non-negative least squares, ANLS) to return factorized \(H\), \(W\), \(V\) and \(U\) matrices. The objective function is stated as
$$\arg\min_{H\ge0,W\ge0,V\ge0,U\ge0}\sum_{i=1}^{d} \left\|\begin{bmatrix}E_i \\ P_i \end{bmatrix} - \left(\begin{bmatrix}W \\ 0 \end{bmatrix}+ \begin{bmatrix}V_i \\ U_i \end{bmatrix}\right)H_i\right\|^2_F+ \lambda\sum_{i=1}^{d}\left\|\begin{bmatrix}V_i \\ U_i \end{bmatrix}H_i\right\|_F^2$$
where \(E_i\) is the input non-negative matrix of shared features for the \(i\)'th dataset, \(P_i\) is the input non-negative matrix of unshared features for the same dataset, and \(d\) is the total number of datasets. \(E_i\) is of size \(m \times n_i\) for \(m\) shared features and \(n_i\) cells, \(P_i\) is of size \(u_i \times n_i\) for \(u_i\) unshared features, \(H_i\) is of size \(k \times n_i\), \(V_i\) is of size \(m \times k\), \(W\) is of size \(m \times k\) and \(U_i\) is of size \(u_i \times k\).
The factorization produces a shared \(W\) matrix (genes by k) and, for each dataset, an \(H\) matrix (k by cells), a \(V\) matrix (genes by k) and a \(U\) matrix (unshared genes by k). The \(H\) matrices represent the cell factor loadings. \(W\) is held consistent among all datasets, as it represents the shared components of the metagenes across datasets. The \(V\) matrices represent the dataset-specific components of the metagenes. The \(U\) matrices are similar to the \(V\) matrices but represent the loadings contributed by unshared features.
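To make the matrix shapes and the two terms of the objective concrete, the following base R sketch evaluates the objective on small random non-negative matrices. It does not call rliger; the dimensions, the helper `rnn()` and the function name `uinmfObjective()` are hypothetical and chosen only for illustration, and a single scalar `lambda` is assumed.

```r
# Toy dimensions (all hypothetical): 2 datasets, m shared features,
# u[i] unshared features, n[i] cells, k factors.
set.seed(1)
m <- 6; k <- 3; lambda <- 5
n <- c(4, 5); u <- c(2, 3)

# Random non-negative matrix of a given size.
rnn <- function(nr, nc) matrix(runif(nr * nc), nr, nc)

E <- lapply(1:2, function(i) rnn(m, n[i]))     # shared-feature data, m x n_i
P <- lapply(1:2, function(i) rnn(u[i], n[i]))  # unshared-feature data, u_i x n_i
W <- rnn(m, k)                                 # shared metagenes, m x k
V <- lapply(1:2, function(i) rnn(m, k))        # dataset-specific part, m x k
U <- lapply(1:2, function(i) rnn(u[i], k))     # unshared loadings, u_i x k
H <- lapply(1:2, function(i) rnn(k, n[i]))     # cell factor loadings, k x n_i

# Reconstruction error plus lambda-weighted penalty on the
# dataset-specific part, summed over datasets.
uinmfObjective <- function(E, P, W, V, U, H, lambda) {
  total <- 0
  for (i in seq_along(E)) {
    X  <- rbind(E[[i]], P[[i]])                       # stacked data
    W0 <- rbind(W, matrix(0, nrow(U[[i]]), ncol(W)))  # W padded with zero rows
    VU <- rbind(V[[i]], U[[i]])                       # stacked V_i and U_i
    recon   <- sum((X - (W0 + VU) %*% H[[i]])^2)      # squared Frobenius norm
    penalty <- lambda * sum((VU %*% H[[i]])^2)
    total <- total + recon + penalty
  }
  total
}

obj <- uinmfObjective(E, P, W, V, U, H, lambda)
```

Because the penalty is linear in `lambda`, evaluating the same matrices with `lambda = 0` returns only the reconstruction error, which is one way to see how much of the objective comes from the dataset-specific term.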
This function adopts a highly optimized, fast and memory-efficient implementation extended from Planc (Kannan, 2016). Pre-installation of the extension package RcppPlanc is required. The underlying algorithm adopts the same ANLS strategy as optimizeALS(unshared = TRUE) in the old version of LIGER.
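As an illustration of what one ANLS step looks like, the update of a single \(H_i\) with \(W\), \(V_i\) and \(U_i\) held fixed can be written as an ordinary non-negative least squares problem by stacking the penalty term under the data term. This is the standard rewriting of a penalized update and follows directly from the objective with a scalar \(\lambda\); whether RcppPlanc forms the problem in exactly this way is an assumption.

$$H_i \leftarrow \arg\min_{H\ge0}\left\|\begin{bmatrix}E_i \\ P_i \\ 0 \\ 0\end{bmatrix} - \begin{bmatrix}W + V_i \\ U_i \\ \sqrt{\lambda}\,V_i \\ \sqrt{\lambda}\,U_i\end{bmatrix}H\right\|_F^2$$

Expanding the squared norm block by block gives \(\|E_i - (W+V_i)H\|_F^2 + \|P_i - U_iH\|_F^2 + \lambda\|V_iH\|_F^2 + \lambda\|U_iH\|_F^2\), which is the contribution of dataset \(i\) to the objective. The updates for \(W\), \(V_i\) and \(U_i\) are obtained in the same spirit, each with the other blocks held fixed.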
runUINMF(object, k = 20, lambda = 5, ...)

# S3 method for liger
runUINMF(
  object,
  k = 20,
  lambda = 5,
  nIteration = 30,
  nRandomStarts = 1,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)
liger method - Returns updated input liger object.
A list of all \(H\) matrices can be accessed with
getMatrix(object, "H")
A list of all \(V\) matrices can be accessed with
getMatrix(object, "V")
The \(W\) matrix can be accessed with
getMatrix(object, "W")
A list of all \(U\) matrices can be accessed with
getMatrix(object, "U")
object
A liger object. Run selectGenes with unshared features enabled (for example, via useUnsharedDatasets as in the example below) and then scaleNotCenter in advance.
k
Inner dimension of factorization (number of factors). Generally, a higher k will be needed for datasets with more sub-structure. Default 20.
lambda
Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Default 5.
...
Arguments passed to other methods and wrapped functions.
nIteration
Total number of block coordinate descent iterations to perform. Default 30.
nRandomStarts
Number of restarts to perform (the iNMF objective function is non-convex, so taking the best objective from multiple successive initializations is recommended). For easier reproducibility, this increments the random seed by 1 for each consecutive restart, so a future factorization of the same dataset can be run with a single restart if necessary. Default 1.
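The restart-and-seed behaviour described for nRandomStarts can be sketched in base R as follows. This is not the rliger implementation: `toyRun()` is a hypothetical stand-in for one factorization run, and the assumption that the first restart uses `seed` itself (with later restarts using `seed + 1`, `seed + 2`, ...) is made only for illustration.

```r
# Hypothetical stand-in for one factorization run: a seed-dependent random
# initialization that yields an "objective" value. No real NMF is performed.
toyRun <- function(seed) {
  set.seed(seed)
  init <- runif(10)
  list(seed = seed, objective = sum((init - 0.5)^2))
}

seed <- 1
nRandomStarts <- 5

# Restart r uses seed + r - 1, so every run is individually reproducible.
runs <- lapply(seq_len(nRandomStarts), function(r) toyRun(seed + r - 1))
objectives <- vapply(runs, function(x) x$objective, numeric(1))

# Keep the run with the lowest objective.
best <- runs[[which.min(objectives)]]

# A single run started from the winning seed reproduces the best result.
rerun <- toyRun(best$seed)
reproduced <- identical(rerun$objective, best$objective)
```

The point of the design is that the winning restart is fully determined by its own seed, so it can later be repeated without paying for the other restarts.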
seed
Random seed to allow reproducible results. Default 1.
nCores
The number of parallel tasks used to speed up the computation. Default 2L. Only supported on platforms with OpenMP support.
verbose
Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if the option has not been set.
April R. Kriebel and Joshua D. Welch, UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization, Nat. Comm., 2022
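# Load rliger; pbmc is assumed to be the example liger object shipped with it
library(rliger)
pbmc <- normalize(pbmc)
# Select variable features, including unshared ones for both datasets
pbmc <- selectGenes(pbmc, useUnsharedDatasets = c("ctrl", "stim"))
pbmc <- scaleNotCenter(pbmc)
# Only run UINMF when scaled unshared data exists for both datasets
if (!is.null(getMatrix(pbmc, "scaleUnsharedData", "ctrl")) &&
    !is.null(getMatrix(pbmc, "scaleUnsharedData", "stim"))) {
    # TODO: unshared variable features cannot be detected from this example
    pbmc <- runUINMF(pbmc)
}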