rule: Add a cell type rule.

Description

This is the heart of cellpypes and is best used by piping from one rule into the next with magrittr::%>%. Check out the examples on GitHub!

Usage

rule(
  obj,
  class,
  feature,
  operator = ">",
  threshold,
  parent = NULL,
  use_CP10K = TRUE
)

Value

obj is returned, but with the rule and class stored in obj$rules and obj$classes, to be used by classify.

Arguments

obj: A cellpypes object, see section cellpypes Objects below.
class: Character scalar with the class name. Typically, cellpypes classes are literature cell types ("T cell") or any subpopulation of interest ("CD3E+TNF+LAG3-").
feature: Character scalar naming the gene you'd like to threshold. Must be a row name in obj$raw.
operator: One of c(">","<"). Use ">" for positive (CD3E+) and "<" for negative markers (MS4A1-).
threshold: Numeric scalar with the expression threshold separating positive from negative cells. Experiment with this value until expression and selected cells agree well in UMAP (see the examples on GitHub).
parent: Character scalar with the parent class (e.g. "T cell" for "Cytotoxic T cells"). Needs to be specified only once per class (if given more than once, the most recent one is used), and defaults to "..root.." if NULL is passed in all rules.
use_CP10K: If TRUE, threshold is taken to be counts per 10 thousand UMI counts, a measure for RNA molecule fractions. We recommend CP10K for human intuition (1 CP10K is roughly 1 UMI per cell), but the results are exactly the same whether you use threshold=1, use_CP10K=TRUE or threshold=1e-4, use_CP10K=FALSE.
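
The equivalence of the two threshold units can be checked with a little arithmetic. The sketch below uses hypothetical toy counts (gene_counts and total_umi are made up for illustration) and only shows the unit conversion; it is not how classify evaluates a rule internally.

```r
# Hypothetical toy data: UMI counts of one gene in four cells,
# and the total UMI count of each of those cells
gene_counts <- c(0, 1, 3, 12)
total_umi   <- c(8000, 20000, 15000, 10000)

# Fraction of a cell's UMIs that come from this gene
fraction <- gene_counts / total_umi

# The same quantity expressed as counts per 10 thousand UMIs (CP10K)
cp10k <- fraction * 1e4

# A threshold of 1 in CP10K units selects the same cells
# as a threshold of 1e-4 in fraction units
above_cp10k    <- cp10k > 1
above_fraction <- fraction > 1e-4
identical(above_cp10k, above_fraction)
```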

cellpypes Objects

A cellpypes object is a list with four slots:

raw

(sparse) matrix with genes in rows, cells in columns

totalUMI

the colSums of obj$raw

embed

two-dimensional embedding of the cells, provided as data.frame or tibble with two columns and one row per cell.

neighbors

index matrix with one row per cell and k columns, where k is the number of nearest neighbors (we recommend 15<k<100, e.g. k=50). Here are two ways to get the neighbors index matrix:

Use find_knn(featureMatrix)$idx, where featureMatrix could be principal components, latent variables or normalized genes (features in rows, cells in columns).
Use as(seurat@graphs[["RNA_nn"]], "dgCMatrix") > .1 to extract the kNN graph computed on RNA. The > .1 ensures this also works with RNA_snn, wknn/wsnn or any other available graph; check with names(seurat@graphs).
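
As a rough sketch of how the four slots fit together, an object could be assembled by hand as shown below. The toy counts, the log1p feature matrix and the two-column embedding are hypothetical stand-ins chosen only to keep the example small; real analyses would use actual UMI counts, principal components and a UMAP or similar embedding. The only cellpypes call is find_knn(featureMatrix)$idx, used as described above.

```r
library(cellpypes)  # provides find_knn()

set.seed(1)
n_genes <- 20
n_cells <- 200

# Hypothetical toy UMI counts: genes in rows, cells in columns
raw <- matrix(
  rpois(n_genes * n_cells, lambda = 2),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Stand-in feature matrix (features in rows, cells in columns);
# principal components would be a more realistic choice
featureMatrix <- log1p(raw)

obj <- list(
  raw       = raw,
  totalUMI  = colSums(raw),
  # Stand-in two-dimensional embedding with one row per cell
  embed     = data.frame(dim1 = featureMatrix[1, ],
                         dim2 = featureMatrix[2, ]),
  # Index matrix of nearest neighbors, one row per cell
  neighbors = find_knn(featureMatrix)$idx
)
```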

Details

Calling rule is computationally cheap because it only stores the cell type rule while all computations happen in classify. If you have classes with multiple rules, the most recent parent and feature-threshold combination counts. It is ok to mix rules with and without use_CP10K=TRUE.

Examples


# T cells are CD3E+:
obj <- rule(simulated_umis, "T", "CD3E", ">", .1)
# T cells are MS4A1-:
obj <- rule(obj, "T", "MS4A1", "<", 1)
# Tregs are a subset of T cells:
obj <- rule(obj, "Treg", "FOXP3", ">", .1, parent="T")
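
Since rule returns the modified object, the same three rules can also be chained with the pipe, as the description recommends. This is a minimal sketch that assumes magrittr is installed and loads it explicitly for %>%:

```r
library(cellpypes)
library(magrittr)  # provides %>%

# Same rules as above, piped from one rule into the next
obj <- simulated_umis %>%
  rule("T",    "CD3E",  ">", .1) %>%
  rule("T",    "MS4A1", "<", 1) %>%
  rule("Treg", "FOXP3", ">", .1, parent = "T")
```

The resulting obj carries the rules in obj$rules and the classes in obj$classes, ready to be passed on to classify.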
