runUINMF: Perform Mosaic iNMF (UINMF) on scaled datasets with unshared features

Description

Performs mosaic integrative non-negative matrix factorization (UINMF) (A.R. Kriebel, 2022) using block coordinate descent (alternating non-negative least squares, ANLS) to return factorized $H$, $W$, $V$ and $U$ matrices. The objective function is stated as

$$\arg\min_{H\ge0,W\ge0,V\ge0,U\ge0}\sum_{i=1}^{d} \left\|\begin{bmatrix}E_i \\ P_i \end{bmatrix} - \left(\begin{bmatrix}W \\ 0 \end{bmatrix}+ \begin{bmatrix}V_i \\ U_i \end{bmatrix}\right)H_i\right\|^2_F+ \lambda\sum_{i=1}^{d}\left\|\begin{bmatrix}V_i \\ U_i \end{bmatrix}H_i\right\|_F^2$$

where $E_i$ is the input non-negative matrix of the $i$'th dataset, $P_i$ is the input non-negative matrix for the unshared features of that dataset, and $d$ is the total number of datasets. $E_i$ is of size $m \times n_i$ for $m$ shared features and $n_i$ cells, $P_i$ is of size $u_i \times n_i$ for $u_i$ unshared features, $H_i$ is of size $k \times n_i$, $V_i$ is of size $m \times k$, $W$ is of size $m \times k$ and $U_i$ is of size $u_i \times k$.
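To make the dimensions above concrete, the following is a minimal base R sketch that evaluates the objective on small random matrices. It is an illustration only, not the RcppPlanc implementation; the helper names `randMat` and `uinmfObjective` and all toy sizes are assumptions made for this example.

```r
# Toy evaluation of the UINMF objective using base R only.
# All names and sizes here are illustrative, not part of rliger.
set.seed(1)
m <- 6                     # number of shared features
k <- 3                     # number of factors
nCells <- c(5, 4)          # n_i: cells per dataset
nUnshared <- c(2, 3)       # u_i: unshared features per dataset
lambda <- 5

# Non-negative random matrix with nr rows and nc columns
randMat <- function(nr, nc) matrix(runif(nr * nc), nr, nc)

W <- randMat(m, k)                                          # m x k, shared
E <- lapply(nCells, function(n) randMat(m, n))              # m x n_i
P <- Map(function(u, n) randMat(u, n), nUnshared, nCells)   # u_i x n_i
H <- lapply(nCells, function(n) randMat(k, n))              # k x n_i
V <- lapply(nCells, function(n) randMat(m, k))              # m x k
U <- lapply(nUnshared, function(u) randMat(u, k))           # u_i x k

uinmfObjective <- function(E, P, W, V, U, H, lambda) {
  total <- 0
  for (i in seq_along(E)) {
    X  <- rbind(E[[i]], P[[i]])                             # (m + u_i) x n_i
    W0 <- rbind(W, matrix(0, nrow(U[[i]]), ncol(W)))        # W padded with zero rows
    VU <- rbind(V[[i]], U[[i]])                             # (m + u_i) x k
    fit <- sum((X - (W0 + VU) %*% H[[i]])^2)                # squared Frobenius norm
    penalty <- lambda * sum((VU %*% H[[i]])^2)              # dataset-specific penalty
    total <- total + fit + penalty
  }
  total
}

obj <- uinmfObjective(E, P, W, V, U, H, lambda)
```

Stacking $W$ on top of a zero block is what restricts the shared metagenes to the $m$ shared features, while $U_i$ alone accounts for the unshared rows.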

The factorization produces a shared $W$ matrix (genes by k) and, for each dataset, an $H$ matrix (k by cells), a $V$ matrix (genes by k) and a $U$ matrix (unshared genes by k). The $H$ matrices represent the cell factor loadings. $W$ is held consistent among all datasets, as it represents the shared components of the metagenes across datasets. The $V$ matrices represent the dataset-specific components of the metagenes. The $U$ matrices are similar to the $V$ matrices but represent the loadings contributed by unshared features.

This function uses a highly optimized, fast and memory-efficient implementation extended from PLANC (Kannan, 2016). The extension package RcppPlanc must be installed beforehand. The underlying algorithm uses the same ANLS strategy as optimizeALS(unshared = TRUE) in the old version of LIGER.

Usage

runUINMF(object, k = 20, lambda = 5, ...)
# S3 method for liger
runUINMF(
  object,
  k = 20,
  lambda = 5,
  nIteration = 30,
  nRandomStarts = 1,
  seed = 1,
  nCores = 2L,
  verbose = getOption("ligerVerbose", TRUE),
  ...
)

Value

liger method - Returns updated input liger object.
- A list of all $H$ matrices can be accessed with getMatrix(object, "H")
- A list of all $V$ matrices can be accessed with getMatrix(object, "V")
- The $W$ matrix can be accessed with getMatrix(object, "W")
- A list of all $U$ matrices can be accessed with getMatrix(object, "U")

Arguments

object: liger object. Should run selectGenes with unshared = TRUE and then run scaleNotCenter in advance.
k: Inner dimension of factorization (number of factors). Generally, a higher k will be needed for datasets with more sub-structure. Default 20.
lambda: Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Default 5.
...: Arguments passed to other methods and wrapped functions.
nIteration: Total number of block coordinate descent iterations to perform. Default 30.
nRandomStarts: Number of restarts to perform (the iNMF objective function is non-convex, so taking the best objective from multiple successive initializations is recommended). For easier reproducibility, the random seed is incremented by 1 for each consecutive restart, so a future factorization of the same dataset can be run with a single start if necessary. Default 1.
seed: Random seed to allow reproducible results. Default 1.
nCores: The number of parallel tasks used to speed up the computation. Default 2L. Only supported on platforms with OpenMP support.
verbose: Logical. Whether to show progress information. Default getOption("ligerVerbose"), or TRUE if users have not set it.
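
The interaction between seed and nRandomStarts described above can be sketched as follows. This is a hedged illustration in base R, assuming that "best" means the lowest objective value; `toyFactorize` and `runWithRestarts` are hypothetical stand-ins and are not functions in rliger or RcppPlanc.

```r
# Hypothetical stand-in for a single factorization run. It only draws a
# random "objective" so that the restart logic can be demonstrated.
toyFactorize <- function(seed) {
  set.seed(seed)
  list(seed = seed, objective = runif(1))
}

# Run nRandomStarts initializations and keep the one with the lowest objective.
runWithRestarts <- function(seed = 1, nRandomStarts = 1) {
  best <- NULL
  for (i in seq_len(nRandomStarts)) {
    res <- toyFactorize(seed + i - 1)   # seed is incremented by 1 per restart
    if (is.null(best) || res$objective < best$objective) {
      best <- res
    }
  }
  best
}

best <- runWithRestarts(seed = 1, nRandomStarts = 5)

# Because each restart uses a known seed, the winning run can be
# reproduced later with a single start from that seed.
again <- runWithRestarts(seed = best$seed, nRandomStarts = 1)
```

Under this scheme, recording the seed of the best restart is enough to rerun that factorization without repeating the other starts.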

References

April R. Kriebel and Joshua D. Welch, UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization, Nat. Comm., 2022

Examples

pbmc <- normalize(pbmc)
pbmc <- selectGenes(pbmc, useUnsharedDatasets = c("ctrl", "stim"))
pbmc <- scaleNotCenter(pbmc)
if (!is.null(getMatrix(pbmc, "scaleUnsharedData", "ctrl")) &&
    !is.null(getMatrix(pbmc, "scaleUnsharedData", "stim"))) {
    # TODO: unshared variable features cannot be detected from this example
    pbmc <- runUINMF(pbmc)
}
