This is the heart of cellpypes and is best used by piping one rule into the next with magrittr::%>%. Check out the examples on GitHub!
rule(
obj,
class,
feature,
operator = ">",
threshold,
parent = NULL,
use_CP10K = TRUE
)
obj is returned, but with the rule and class stored in obj$rules and obj$classes, to be used by classify.
obj
A cellpypes object, see section cellpypes Objects below.
class
Character scalar with the class name. Typically, cellpypes classes are cell types from the literature ("T cell") or any subpopulation of interest ("CD3E+TNF+LAG3-").
feature
Character scalar naming the gene you'd like to threshold. Must be a row name in obj$raw.
operator
One of c(">","<"). Use ">" for positive markers (CD3E+) and "<" for negative markers (MS4A1-).
threshold
Numeric scalar with the expression threshold separating positive from negative cells. Experiment with this value until expression and selected cells agree well in the UMAP (see the examples on GitHub).
parent
Character scalar with the parent class (e.g. "T cell" for "Cytotoxic T cells"). It only has to be specified once per class (otherwise the most recent one is used) and defaults to "..root.." if NULL is passed in all rules.
use_CP10K
If TRUE, threshold is taken to be in counts per 10 thousand UMI counts (CP10K), a measure of RNA molecule fractions. We recommend CP10K for human intuition (1 CP10K is roughly 1 UMI per cell), but the results are identical whether you use threshold=1,use_CP10K=TRUE or threshold=1e-4,use_CP10K=FALSE.
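The equivalence of the two threshold scales can be checked by hand. The following is a minimal base-R sketch on made-up counts; it only illustrates the unit conversion (fraction of a cell's UMIs versus CP10K) and is not meant to reproduce how classify evaluates a rule internally. The matrix values and variable names are assumptions for illustration.

```r
# Toy UMI counts (made up): genes in rows, cells in columns.
raw <- matrix(
  c(   0,     6,     1,
       5,    20,     0,
    9995, 29974, 19999),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CD3E", "MS4A1", "other"), paste0("cell", 1:3))
)
total_umi <- colSums(raw)               # UMIs per cell

# Expression of CD3E as a fraction of each cell's UMIs, and in CP10K.
fraction <- raw["CD3E", ] / total_umi
cp10k    <- fraction * 1e4

# A threshold of 1 CP10K selects the same cells as a fraction of 1e-4.
positive_cp10k    <- cp10k > 1
positive_fraction <- fraction > 1e-4
print(rbind(cp10k, positive_cp10k, positive_fraction))
```

Only the second cell passes either threshold here, because 6 of its 30000 UMIs map to CD3E (2 CP10K), while the third cell has 1 of 20000 (0.5 CP10K).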
A cellpypes object is a list with four slots:
raw
(sparse) matrix with genes in rows, cells in columns
totalUMI
the colSums of obj$raw
embed
two-dimensional embedding of the cells, provided as data.frame or tibble with two columns and one row per cell.
neighbors
index matrix with one row per cell and k columns, where k is the number of nearest neighbors (we recommend 15<k<100, e.g. k=50). Here are two ways to get the neighbors index matrix:
Use find_knn(featureMatrix)$idx, where featureMatrix could be principal components, latent variables or normalized genes (features in rows, cells in columns).
Use as(seurat@graphs[["RNA_nn"]], "dgCMatrix") > .1 to extract the kNN graph computed on RNA. The > .1 ensures this also works with RNA_snn, wknn/wsnn or any other available graph; check with names(seurat@graphs).
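To make the four slots concrete, here is a minimal base-R sketch that assembles a list of this shape from simulated counts. It is an illustration under assumptions: the data are random, the matrix is dense rather than sparse, k is far smaller than recommended, and the brute-force neighbor search merely stands in for find_knn. Whether rule and classify accept such a hand-built list unchanged is not confirmed by the text above.

```r
set.seed(1)
n_genes <- 5
n_cells <- 20
k <- 3  # toy value only; the text recommends 15 < k < 100

# raw: genes in rows, cells in columns (dense here for simplicity).
raw <- matrix(
  rpois(n_genes * n_cells, lambda = 5), nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)
total_umi <- colSums(raw)

# embed: two columns, one row per cell (random stand-in for a UMAP).
embed <- data.frame(dim1 = rnorm(n_cells), dim2 = rnorm(n_cells))

# neighbors: brute-force kNN on log-normalized genes (cells x k indices).
features  <- log1p(t(t(raw) / total_umi) * 1e4)
distances <- as.matrix(dist(t(features)))
neighbors <- t(apply(distances, 1, function(d) order(d)[seq_len(k)]))

obj <- list(raw = raw, totalUMI = total_umi,
            embed = embed, neighbors = neighbors)
str(obj, max.level = 1)
```

Because each cell has distance 0 to itself, a cell will normally appear in its own neighbor list in this sketch. For real data, the two routes listed above are the ones the documentation describes.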
Calling rule is computationally cheap because it only stores the cell type rule, while all computations happen in classify.
If you have classes with multiple rules, the most recent parent and feature-threshold combination counts.
It is ok to mix rules with and without use_CP10K=TRUE.
To have nicely formatted code in the end, copy the output of pype_code_template() to your script and start editing.
# T cells are CD3E+:
obj <- rule(simulated_umis, "T", "CD3E", ">", .1)
# T cells are MS4A1-:
obj <- rule(obj, "T", "MS4A1", "<", 1)
# Tregs are a subset of T cells:
obj <- rule(obj, "Treg", "FOXP3", ">", .1, parent="T")