top_markers: Identify the genes most specifically expressed in groups of cells

Description

Ranks genes by how specifically they are expressed in each group of cells and, optionally, tests each top-ranked gene as a marker using logistic regression.

Usage

top_markers(
  cds,
  group_cells_by = "cluster",
  genes_to_test_per_group = 25,
  reduction_method = "UMAP",
  marker_sig_test = TRUE,
  reference_cells = NULL,
  speedglm.maxiter = 25,
  cores = 1,
  verbose = FALSE
)
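
A typical call might look like the sketch below. It assumes the monocle3 package is installed and that `cds` has already been preprocessed, reduced with UMAP, and clustered. The wrapper name `find_cluster_markers` and its `n_reference` argument are hypothetical and exist only for illustration; the arguments passed to top_markers() are taken from the signature above.

```r
# Hypothetical convenience wrapper around top_markers(); not part of monocle3.
# Assumes `cds` is a cell_data_set that has already been clustered.
find_cluster_markers <- function(cds, n_reference = 1000, cores = 1) {
  if (!requireNamespace("monocle3", quietly = TRUE)) {
    stop("The monocle3 package is required for this example.")
  }
  monocle3::top_markers(
    cds,
    group_cells_by = "cluster",     # the documented default grouping
    genes_to_test_per_group = 25,   # genes per group passed to the regression test
    reference_cells = n_reference,  # random reference subset to speed up the test
    cores = cores
  )
}
```

Passing a smaller `n_reference` should shorten the significance test, at some cost in sensitivity as noted for reference_cells.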

Arguments

cds

A cell_data_set object to calculate top markers for.

group_cells_by

String indicating what to group cells by for comparison. Default is "cluster".

genes_to_test_per_group

Numeric, the number of top-ranked genes per group, ranked by Jensen-Shannon specificity, on which to run the more expensive regression test.

reduction_method

String indicating the method used for dimensionality reduction. Currently only "UMAP" is supported.

marker_sig_test

A flag indicating whether to assess the discriminative power of each marker through logistic regression. This test can be slow; consider disabling it to speed up top_markers().

reference_cells

If provided, top_markers() will perform the marker significance test against a "reference set" of cells. Must be either a list of cell ids from colnames(cds) or a positive integer. If the latter, top_markers() will randomly select the specified number of reference cells. Using a reference set accelerates the marker significance test at some cost in sensitivity.

speedglm.maxiter

Maximum number of iterations allowed when fitting the GLM models used to test markers for a cell group.

cores

Number of cores to use.

verbose

Whether to print verbose progress output.
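
The marker significance test described under marker_sig_test can be pictured with a small base R sketch. This is not monocle3's implementation: it uses stats::glm in place of speedglm, toy data simulated for illustration, and McFadden's pseudo R-squared as an assumed definition. The `maxit` setting stands in for speedglm.maxiter.

```r
set.seed(1)

# Toy data (simulated for illustration): 200 cells, the first 60 in the group.
in_group   <- rep(c(1, 0), times = c(60, 140))
expression <- rpois(200, lambda = ifelse(in_group == 1, 4, 1))

# Cap the number of fitting iterations, analogous to speedglm.maxiter.
ctrl <- glm.control(maxit = 25)

# Full model: does expression of the gene predict group membership?
full <- glm(in_group ~ expression, family = binomial(), control = ctrl)
# Null model: intercept only.
null <- glm(in_group ~ 1, family = binomial(), control = ctrl)

ll_full <- as.numeric(logLik(full))
ll_null <- as.numeric(logLik(null))

# Likelihood ratio test with one degree of freedom (one extra coefficient).
lr_stat <- 2 * (ll_full - ll_null)
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

# McFadden's pseudo R-squared (assumed definition; monocle3 may use another).
pseudo_r2 <- 1 - ll_full / ll_null
```

In this sketch a gene that separates the group well yields a small `p_value` and a `pseudo_r2` closer to 1; a gene with no discriminative power yields a `pseudo_r2` near 0.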

Value

A data.frame where the rows are genes and the columns are:

gene_id vector of gene names
gene_short_name vector of gene short names
cell_group character vector of the cell group in which the gene was scored
marker_score numeric vector of marker scores, computed as the fraction expressing scaled by the specificity. The value ranges from 0 to 1.
mean_expression numeric vector of mean normalized expression of the gene in the cell group
fraction_expressing numeric vector of fraction of cells expressing the gene within the cell group
specificity numeric vector of a measure of how specific the gene's expression is to the cell group based on the Jensen-Shannon divergence. The value ranges from 0 to 1.
pseudo_R2 numeric vector of pseudo R-squared values, a measure of how well the gene expression model fits the categorical data relative to the null model. The value ranges from 0 to 1.
marker_test_p_value numeric vector of likelihood ratio p-values
marker_test_q_value numeric vector of likelihood ratio q-values
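
The relationship between specificity, fraction_expressing, and marker_score can be sketched in base R. The Jensen-Shannon formulation below (one minus the square root of the divergence from an "ideal" profile expressed in a single group) is an assumption about how such a specificity can be computed, not a statement of monocle3's exact code. The helper names and the toy values are hypothetical.

```r
# Shannon entropy in bits; zero-probability terms contribute nothing.
shannon_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Jensen-Shannon divergence between two probability vectors (0 to 1 in bits).
js_divergence <- function(p, q) {
  m <- (p + q) / 2
  max(0, shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2)
}

# Specificity of one gene for each group: 1 - sqrt(JSD) against an ideal
# profile in which all expression sits in that group (assumed formulation).
js_specificity <- function(mean_expr_by_group) {
  p <- mean_expr_by_group / sum(mean_expr_by_group)
  spec <- vapply(seq_along(p), function(i) {
    ideal <- replace(numeric(length(p)), i, 1)
    1 - sqrt(js_divergence(p, ideal))
  }, numeric(1))
  setNames(spec, names(mean_expr_by_group))
}

# Toy values, made up for illustration: one gene across three cell groups.
mean_expression     <- c(A = 9.0, B = 0.5, C = 0.5)
fraction_expressing <- c(A = 0.8, B = 0.1, C = 0.05)

specificity  <- js_specificity(mean_expression)
# Fraction expressing scaled by specificity, as described for marker_score.
marker_score <- fraction_expressing * specificity
```

Because both factors lie between 0 and 1, marker_score does too, and a gene scores highly only when it is both widely expressed within the group and largely absent elsewhere.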