calcAlignment: Calculate alignment metric after integration

Description

This metric quantifies how well two or more datasets are aligned. All datasets are first randomly downsampled to the size of the smallest one. A nearest-neighbor graph is then constructed, and for each cell the number of neighbors that come from the same dataset is counted. This count is averaged across all cells, compared to the value expected for perfectly mixed datasets, and scaled to range from 0 to 1. Note that in practice the alignment can occasionally exceed 1, which happens when cells have fewer same-dataset neighbors on average than expected under perfect mixing.
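The procedure described above can be sketched in base R. This is a toy re-implementation for illustration only, not rliger's actual code: the function name `toyAlignment`, its arguments, and the brute-force neighbor search through `dist()` are all assumptions made for this sketch. The scaling step uses the formula given in Details.

```r
# Toy sketch of the alignment metric; NOT the rliger implementation.
# `emb` is a cells x dimensions matrix, `dataset` labels each row (hypothetical inputs).
toyAlignment <- function(emb, dataset, k = 10, seed = 1) {
  set.seed(seed)
  dataset <- as.factor(dataset)
  nMin <- min(table(dataset))
  # Downsample every dataset to the size of the smallest one
  keep <- unlist(lapply(split(seq_along(dataset), dataset), function(idx) {
    idx[sample.int(length(idx), nMin)]
  }))
  emb <- emb[keep, , drop = FALSE]
  dataset <- droplevels(dataset[keep])
  N <- nlevels(dataset)
  # Brute-force k nearest neighbors from the full distance matrix
  d <- as.matrix(dist(emb))
  diag(d) <- Inf  # a cell is not counted as its own neighbor
  sameCount <- vapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    sum(dataset[nn] == dataset[i])
  }, numeric(1))
  # Average same-dataset neighbor count, scaled against the perfectly mixed expectation k / N
  xBar <- mean(sameCount)
  1 - (xBar - k / N) / (k - k / N)
}

set.seed(42)
dataset <- rep(c("A", "B"), each = 100)
# Two datasets drawn from the same distribution (well mixed)
mixed <- matrix(rnorm(200 * 2), ncol = 2)
# Two datasets separated by a large shift (not mixed at all)
separated <- rbind(matrix(rnorm(100 * 2), ncol = 2),
                   matrix(rnorm(100 * 2, mean = 50), ncol = 2))
mixedScore <- toyAlignment(mixed, dataset)          # close to 1
separatedScore <- toyAlignment(separated, dataset)  # 0: all neighbors share a dataset
```

The brute-force distance matrix keeps the sketch dependency-free but scales quadratically with the number of cells, so it is only suitable for small toy inputs.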

Usage

calcAlignment(
  object,
  clustersUse = NULL,
  clusterVar = NULL,
  nNeighbors = NULL,
  cellIdx = NULL,
  cellComp = NULL,
  resultBy = c("all", "dataset", "cell"),
  seed = 1,
  k = nNeighbors,
  rand.seed = seed,
  cells.use = cellIdx,
  cells.comp = cellComp,
  clusters.use = clustersUse,
  by.cell = NULL,
  by.dataset = NULL
)

Value

The alignment metric.

Arguments

object: A liger object, with quantileNorm already run.
clustersUse: The clusters to consider for calculating the alignment. Should be a vector of existing levels in clusterVar. Default NULL. See Details.
clusterVar: The name of one variable in cellMeta(object). Default NULL uses default clusters.
nNeighbors: Number of neighbors to use in calculating alignment. Default NULL uses floor(0.01*ncol(object)), with a lower bound of 10 in all cases except where the total number of sampled cells is less than 10.
cellIdx, cellComp: Character, logical or numeric index that can subscript cells. Default NULL. See Details.
resultBy: Select from "all", "dataset" or "cell". The level at which the mean alignment is calculated. Default "all".
seed: Random seed to allow reproducible results. Default 1.
k, rand.seed, cells.use, cells.comp, clusters.use: [Deprecated] Please see Usage for replacement.
by.cell, by.dataset: [Defunct] Use resultBy instead.
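
The default rule for nNeighbors described above can be illustrated with a small helper. This is a sketch, not code taken from rliger: `defaultNeighbors` and its `nSampled` argument are hypothetical names, and the handling of fewer than 10 sampled cells (capping at one less than the number of sampled cells) is an assumption, since the exact fallback is not stated here.

```r
# Hypothetical helper, not part of rliger: mirrors the documented default for nNeighbors.
defaultNeighbors <- function(nCells, nSampled = nCells) {
  # 1% of all cells, with a lower bound of 10
  k <- max(floor(0.01 * nCells), 10)
  # ASSUMPTION: with fewer than 10 sampled cells, cap k at nSampled - 1
  if (nSampled < 10) k <- min(k, nSampled - 1)
  k
}

kLarge <- defaultNeighbors(5000)              # 50
kSmall <- defaultNeighbors(300)               # 10 (lower bound applies)
kTiny  <- defaultNeighbors(300, nSampled = 8) # 7 under the stated assumption
```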

Details

The alignment is calculated as $$1 - \frac{\bar{x} - \frac{k}{N}}{k - \frac{k}{N}}$$ where $\bar{x}$ is the average, across all cells, of the number of neighbors that belong to the same dataset as the cell itself, $N$ is the number of datasets, and $k$ is the number of neighbors in the KNN graph.
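
As a quick numeric check of the formula, the following sketch evaluates it for a few values of $\bar{x}$. The helper name `alignmentScore` is hypothetical and not part of rliger.

```r
# Hypothetical helper: evaluates the alignment formula from Details.
alignmentScore <- function(xBar, k, N) {
  expected <- k / N  # same-dataset neighbors expected under perfect mixing
  1 - (xBar - expected) / (k - expected)
}

perfectMix <- alignmentScore(xBar = 10, k = 20, N = 2)  # 1: xBar equals k / N
noMix      <- alignmentScore(xBar = 20, k = 20, N = 2)  # 0: all neighbors from the same dataset
overMix    <- alignmentScore(xBar = 8,  k = 20, N = 2)  # 1.2: fewer same-dataset neighbors than expected
```

The last case shows how the value can exceed 1: whenever $\bar{x}$ falls below $k/N$, the subtracted fraction becomes negative.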

The cells to be measured can be selected in various ways, each representing a different scenario:

By default, all cells are considered and the alignment across all datasets will be calculated.
Select clustersUse from clusterVar to use cells from the clusters of interest. This measures the alignment across all covered datasets within the specified clusters.
Specify only cellIdx for flexible selection. This measures the alignment across all covered datasets within the specified cells. A non-NULL cellIdx takes precedence over clustersUse.
Specify cellIdx and cellComp at the same time, so that the original dataset source is ignored and the cells specified by each argument are treated as if they came from two separate datasets. This measures the alignment between the cells specified by the two arguments. cellComp can contain cells already specified in cellIdx.

Examples

if (requireNamespace("RcppPlanc", quietly = TRUE)) {
    pbmc <- pbmc %>%
        normalize %>%
        selectGenes %>%
        scaleNotCenter %>%
        runINMF %>%
        quantileNorm
    calcAlignment(pbmc)
}
