quickMarkers: Gets top N markers for each cluster

Description

Uses tf-idf ordering to get the top N markers of each cluster. For each cluster, returns either the top N genes or all genes passing the hypergeometric test at the specified FDR, whichever list is shorter.

Usage

quickMarkers(toc, clusters, N = 10, FDR = 0.01, expressCut = 0.9)

Value

data.frame with top N markers (or all that pass the hypergeometric test) and their statistics for each cluster.

Arguments

toc: Table of counts. Must be a sparse matrix.
clusters: Vector of length ncol(toc) giving cluster membership.
N: Number of marker genes to return per cluster.
FDR: False discovery rate to use.
expressCut: Value above which a gene is considered expressed.
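
As a minimal sketch of preparing inputs that satisfy the requirements above, the following builds a small sparse count matrix and a matching cluster vector. The data, gene names and cluster labels are hypothetical, and using the Matrix package for the conversion is an assumption about how the counts are stored.

```r
library(Matrix)

# Hypothetical dense counts: 3 genes (rows) x 4 cells (columns)
dense <- matrix(c(0, 2, 0, 1,
                  3, 0, 0, 0,
                  0, 0, 5, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Convert to a sparse matrix, as required for the toc argument
toc <- Matrix(dense, sparse = TRUE)

# One cluster label per cell, in column order
clusters <- c("A", "A", "B", "B")

# Check the inputs before passing them to quickMarkers
stopifnot(is(toc, "sparseMatrix"), length(clusters) == ncol(toc))
```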

Details

Term Frequency - Inverse Document Frequency is used in natural language processing to identify terms specific to documents. This function uses the same idea to order genes within a group by how predictive of that group they are. The main advantage of this is that it is extremely fast and gives reasonable results.

To do this, gene expression is binarised so that each cell is considered either to express a given gene or not. That is, we replace the counts with toc > expressCut. The frequency with which a gene is expressed within the target group is compared to the global frequency to calculate the tf-idf score. We also calculate a multiple hypothesis corrected p-value based on a hypergeometric test, but this is extremely permissive.
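
The calculation described above can be sketched in base R as follows. This is an illustration on hypothetical toy data, not the package's implementation: it uses a dense matrix for simplicity, takes idf to be the log of the inverse global expression frequency, and assumes a Benjamini-Hochberg correction for the hypergeometric p-values.

```r
# Hypothetical toy counts: 4 genes x 6 cells, two clusters of 3 cells each
toc <- matrix(c(5, 3, 4, 0, 0, 0,
                0, 0, 1, 6, 2, 7,
                2, 1, 3, 2, 4, 1,
                0, 0, 0, 0, 0, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6)))
clusters <- c("A", "A", "A", "B", "B", "B")
expressCut <- 0.9

# Binarise: a gene is "expressed" in a cell if its count exceeds expressCut
expressed <- toc > expressCut

nCells <- ncol(toc)
clSizes <- table(clusters)

# Number of expressing cells per gene within each cluster, and overall
nObs <- sapply(names(clSizes), function(cl) {
  rowSums(expressed[, clusters == cl, drop = FALSE])
})
nTot <- rowSums(expressed)

# Term frequency: fraction of cells in the cluster expressing the gene
tf <- sweep(nObs, 2, as.integer(clSizes), "/")
# Inverse document frequency: log of inverse global expression frequency
idf <- log(nCells / nTot)
tfidf <- tf * idf

# Hypergeometric test for enrichment in each cluster, BH-corrected (assumed)
qvals <- sapply(names(clSizes), function(cl) {
  p <- phyper(nObs[, cl] - 1, nTot, nCells - nTot, clSizes[[cl]],
              lower.tail = FALSE)
  p.adjust(p, method = "BH")
})

# Genes ordered by tf-idf within cluster A
sort(tfidf[, "A"], decreasing = TRUE)
```

A gene expressed in every cell (gene3 here) gets an idf of zero and so never ranks as a marker, however highly it is expressed.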

Examples

#Calculate markers of clusters in toy data
mrks = quickMarkers(scToy$toc,scToy$metaData$clusters)
if (FALSE) {
#Calculate markers from Seurat (v3) object
mrks = quickMarkers(srat@assays$RNA@counts,srat@active.ident)
}
