Call and visualize 'classify' function
plot_classes(
obj,
classes = NULL,
knn_refine = 0,
replace_overlap_with = "Unassigned",
return_logical_matrix = FALSE,
fast = NULL,
point_size = 0.4,
point_size_legend = 2,
base_size = 15,
overdispersion = 0.01
)
A ggplot2 object.
obj
A cellpypes object, see section cellpypes Objects below.
classes
Character vector with one or more class names. If NULL (the default), plots the finest available cell types (all classes that are not the parent of any other class).
knn_refine
Numeric between 0 and 1. If 0, labels obtained from UMI count pooling are not refined. If larger than 0 (recommended: 0.1), cellpypes will try to label unassigned cells by majority vote, see section knn_refine below.
replace_overlap_with
Character string, by default "Unassigned". See section Handling overlap.
return_logical_matrix
Logical. If TRUE, a logical matrix with classes in columns and cells in rows is returned instead of resolving overlaps with replace_overlap_with. If a single class is supplied, the matrix has exactly one column and the user can pipe it into "drop" to convert it to a vector.
fast
Set this to TRUE if you want fast plotting in spite of many cells (using the scattermore package). If NULL (default), cellpypes decides automatically and uses fast plotting for more than 10k cells. If FALSE, it always uses geom_point.
point_size
Dot size used by geom_point.
point_size_legend
Dot size displayed in legend. Legend colors are easier to read with larger points.
base_size
The base_size of theme_bw, i.e. how large text is displayed. Default: 15.
overdispersion
Defaults to 0.01; only change it if you know what you are doing. If set to 0, the negative binomial (NB) distribution simplifies to the Poisson distribution, and larger values give more variance. The 0.01 default follows the recommendation by Lause, Berens and Kobak (Genome Biology 2021) to use size=100 in pnbinom for typical data sets.
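The relationship between an overdispersion of 0.01 and size=100 in pnbinom can be checked in base R. This is a minimal sketch, not cellpypes code: it assumes that overdispersion is the reciprocal of pnbinom's size argument (as the 0.01 / size=100 correspondence suggests), and the mean of 10 and the count of 15 are arbitrary illustration values.

```r
# Assumption: overdispersion = 1 / size in R's negative binomial functions.
overdispersion <- 0.01
size <- 1 / overdispersion            # 100

mu <- 10                              # illustrative mean count
# NB variance in the mu/size parametrization: mu + mu^2 / size
nb_var <- mu + overdispersion * mu^2

# Probability of observing at most 15 counts under each model
p_nb       <- pnbinom(15, mu = mu, size = size)
p_pois     <- ppois(15, lambda = mu)
# With a very large size (overdispersion near 0), the NB approaches the Poisson
p_nb_limit <- pnbinom(15, mu = mu, size = 1e8)

print(c(nb_var = nb_var, p_nb = p_nb, p_pois = p_pois, p_nb_limit = p_nb_limit))
```

With these values the NB variance exceeds the Poisson variance (which equals the mean), illustrating why larger overdispersion values give more variance.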
A cellpypes object is a list with four slots:
raw
(sparse) matrix with genes in rows, cells in columns
totalUMI
the colSums of obj$raw
embed
two-dimensional embedding of the cells, provided as data.frame or tibble with two columns and one row per cell.
neighbors
index matrix with one row per cell and k columns, where k is the number of nearest neighbors (we recommend 15<k<100, e.g. k=50). Here are two ways to get the neighbors index matrix:
Use find_knn(featureMatrix)$idx, where featureMatrix could be principal components, latent variables or normalized genes (features in rows, cells in columns).
Use as(seurat@graphs[["RNA_nn"]], "dgCMatrix") > .1 to extract the kNN graph computed on RNA. The > .1 ensures this also works with RNA_snn, wknn/wsnn or any other available graph; check with names(seurat@graphs).
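The four slots described above can be assembled by hand, which may help when checking that the shapes are right. The following is a toy sketch in base R under several assumptions: the data are random, a dense matrix stands in for the sparse one, the neighbor search is a brute-force replacement for find_knn computed on the 2D embedding purely for illustration, and whether a cell should appear among its own neighbors is not specified here (in this sketch it does).

```r
set.seed(1)
n_genes <- 20
n_cells <- 30
k <- 5   # toy value; the text recommends 15 < k < 100 for real data

# raw: genes in rows, cells in columns (dense here, sparse in practice)
raw <- matrix(
  rpois(n_genes * n_cells, lambda = 1),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# embed: two columns, one row per cell
embed <- data.frame(dim1 = rnorm(n_cells), dim2 = rnorm(n_cells))

# neighbors: one row per cell, k columns of cell indices.
# Brute-force stand-in for find_knn(); each cell is its own first neighbor.
d <- as.matrix(dist(embed))
neighbors <- t(apply(d, 1, function(x) order(x)[seq_len(k)]))

obj <- list(
  raw       = raw,
  totalUMI  = colSums(raw),
  embed     = embed,
  neighbors = neighbors
)
str(obj, max.level = 1)
```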
Overlap denotes all cells for which rules from multiple classes apply, and these cells will be labeled as Unassigned by default. If you are in fact interested in where the overlap is, set return_logical_matrix=TRUE and inspect the result. Note that it matters whether you call classify("Tcell") or classify(c("Tcell","Bcell")): any existing overlap between T and B cells is labeled as Unassigned in the second call, but not in the first.
Replacing overlap happens only between mutually exclusive labels (such as Tcell and Bcell), but not within a lineage. For example, overlap is NOT replaced between child (PD1+Ttox) and parent (Ttox) or any other ancestor (Tcell); instead, the most detailed cell type (PD1+Ttox) is returned.
All of the above is also true for plot_classes, as it wraps classify.
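The difference between asking for one class and asking for two can be illustrated on a hand-made logical matrix of the kind return_logical_matrix=TRUE is described as returning. This is a base R sketch of the idea, not the cellpypes implementation; the four toy cells are invented, and labeling cells that match no class as "Unassigned" is an assumption of the sketch.

```r
# Toy logical matrix: cells in rows, classes in columns (invented values).
is_class <- cbind(
  Tcell = c(TRUE,  TRUE, FALSE, FALSE),
  Bcell = c(FALSE, TRUE, TRUE,  FALSE)
)

# Asking for both classes: cells matching more than one class are overlap.
n_hits <- rowSums(is_class)
labels_both <- rep("Unassigned", nrow(is_class))
single <- n_hits == 1
labels_both[single] <- colnames(is_class)[
  apply(is_class[single, , drop = FALSE], 1, which.max)
]

# Asking for Tcell only: the Bcell column is never consulted,
# so the second cell keeps its Tcell label.
labels_T <- ifelse(is_class[, "Tcell"], "Tcell", "Unassigned")

print(data.frame(labels_both, labels_T))
```

The second cell satisfies both rule sets, so it is "Unassigned" when both classes are requested but "Tcell" when only Tcell is requested.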
With knn_refine > 0, cellpypes refines cell type labels with a kNN classifier.
By default, cellpypes only assigns cells to a class if all relevant rules apply. In other words, all marker gene UMI counts in the cell's neighborhood have to be clearly above/below their threshold. Since UMI counts are sparse (even after the neighbor pooling done by cellpypes), this can leave many cells unassigned.
It is reasonable to assume an unassigned cell is of the same cell type as the majority of its nearest neighbors. Therefore, cellpypes implements a kNN classifier to further refine labels obtained by manually thresholding UMI counts. knn_refine = 0.3 means a cell is assigned the class label held by most of its neighbors unless no class gets more than 30 %. If most neighbors are unassigned, the cell will also be set to "Unassigned". Choosing knn_refine = 0.3 gives results reminiscent of clustering (which assigns all cells), while knn_refine = 0.5 leaves cells 'in between' two similar cell types unassigned.
We recommend looking at knn_refine = 0 first as it's faster and more directly tied to marker gene expression. If assigning all cells is desired, we recommend knn_refine = 0.3 or lower, while knn_refine = 0.5 makes cell types more 'crisp' by setting cells 'in between' related subtypes to "Unassigned".
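The majority vote described above can be sketched in a few lines of base R. This is an illustration of the idea only, not the code cellpypes runs: refine_labels is a hypothetical helper, the toy labels and neighbor indices are invented, and the sketch assumes that only unassigned cells are re-voted and that the vote fraction is taken over all k neighbors.

```r
# Hypothetical helper illustrating threshold-based majority voting.
refine_labels <- function(labels, neighbors, knn_refine) {
  refined <- labels
  for (i in which(labels == "Unassigned")) {
    votes <- table(labels[neighbors[i, ]])   # counts per label among neighbors
    top   <- names(votes)[which.max(votes)]  # most frequent neighbor label
    frac  <- max(votes) / ncol(neighbors)    # its share of the k neighbors
    # Relabel only if a real class wins with more than the threshold share
    if (top != "Unassigned" && frac > knn_refine) refined[i] <- top
  }
  refined
}

# Six toy cells, k = 3 neighbors each (row i lists the neighbors of cell i)
labels <- c("T", "T", "T", "Unassigned", "B", "B")
neighbors <- rbind(
  c(2, 3, 4), c(1, 3, 4), c(1, 2, 4),
  c(1, 2, 5), c(4, 6, 3), c(5, 4, 3)
)

lenient <- refine_labels(labels, neighbors, knn_refine = 0.3)
strict  <- refine_labels(labels, neighbors, knn_refine = 0.7)
print(rbind(lenient, strict))
```

Cell 4 has two T neighbors and one B neighbor, so T holds two thirds of the votes: enough to pass a threshold of 0.3 but not one of 0.7, in which case the cell stays "Unassigned".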
plot_classes(rule(simulated_umis, "T", "CD3E",">", 1))