Numbat

Numbat is a haplotype-aware copy number variation (CNV) caller for single-cell and spatial transcriptomics data. It integrates signals from gene expression, allelic ratio, and population-derived haplotype information to accurately infer allele-specific CNVs in single cells and reconstruct their lineage relationships.

Numbat can be used to:

  1. Detect allele-specific copy number variations from scRNA-seq and spatial transcriptomics
  2. Differentiate tumor versus normal cells in the tumor microenvironment
  3. Infer the clonal architecture and evolutionary history of profiled tumors

Numbat does not require paired DNA or genotype data and operates solely on the donor scRNA-seq data (for example, 10x Cell Ranger output). For details of the method, please see our paper:

Teng Gao, Ruslan Soldatov, Hirak Sarkar, Adam Kurkiewicz, Evan Biederstedt, Po-Ru Loh, Peter Kharchenko. Haplotype-aware analysis of somatic copy number variations from single-cell transcriptomes. Nature Biotechnology (2022).
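
To give a feel for the inputs described above, here is a minimal sketch of a run on the small example objects that ship with the package (count_mat_example, ref_hca, and df_allele_example, all listed in the function index). The output directory name is hypothetical, and the argument names are written as I understand them from the package documentation, so check ?run_numbat before relying on them. The block skips itself if numbat is not installed.

```r
# Hypothetical output location for this sketch; any writable directory works.
out_dir <- file.path(tempdir(), "numbat_demo")

# Only attempt the run when the package is available.
if (requireNamespace("numbat", quietly = TRUE)) {
    library(numbat)

    run_numbat(
        count_mat_example,   # example gene expression count matrix
        ref_hca,             # reference expression magnitudes from HCA
        df_allele_example,   # example allele count dataframe
        t = 1e-5,            # transition probability (assumed typical value)
        ncores = 1,          # keep the sketch single-threaded
        plot = FALSE,        # skip plotting for a quick run
        out_dir = out_dir    # results are written under this directory
    )
}
```

For real data, the count matrix and allele dataframe would come from your own preprocessing rather than the bundled examples; the User Guide covers that part.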

User Guide

For a complete guide, please see Numbat User Guide.

Questions?

We appreciate your feedback! Please raise a GitHub issue for bugs, questions, and new feature requests. For bug reports, please attach the full log, error message, input parameters, and, if possible, a reproducible example.

Install

install.packages('numbat')

Monthly Downloads

353

Version

1.4.2

License

MIT + file LICENSE

Maintainer

Teng Gao

Last Published

September 20th, 2024

Functions in numbat (1.4.2)

approx_theta_post

Laplace approximation of the posterior of allelic imbalance theta
contract_nodes

Merge adjacent set of nodes
annotate_genes

Annotate genes on allele dataframe
detect_clonal_loh

Call clonal LOH using SNP density. Recommended for cell lines or tumor samples with no normal cells.
get_bulk

Aggregate single-cell data into combined bulk expression and allele profile
count_mat_ref

example reference count matrix
check_exp_ref

check the format of lambdas_ref
check_exp_noise

check noise level
fill_neu_segs

Fill neutral regions into consensus segments
expand_states

expand multi-allelic CNVs into separate entries in the single-cell posterior dataframe
approx_phi_post

Laplace approximation of the posterior of expression fold change phi
get_joint_post

get joint posteriors
get_lambdas_bar

Get average reference expression profile based on single-cell ref choices
calc_exp_LLR

Calculate LLR for an expression HMM
gaps_hg19

genome gap regions (hg19)
fit_gamma

fit gamma maximum likelihood
binary_entropy

calculate entropy for a binary variable
check_contam

check inter-individual contamination
fit_bbinom

fit a Beta-Binomial model by maximum likelihood
count_mat_example

example gene expression count matrix
fit_snp_rate

negative binomial model
get_snps

process VCFs into SNP dataframe
get_tree_post

Find maximum likelihood assignment of mutations on a tree
get_clone_post

Map cells to the phylogeny (or genotypes) based on CNV posteriors
combine_bulk

Combine allele and expression pseudobulks
chrom_sizes_hg19

chromosome sizes (hg19)
filter_genes

filter for mutually expressed genes
calc_phi_mle_lnpois

Calculate the MLE of expression fold change phi
find_common_diploid

Find the common diploid region in a group of pseudobulks
get_exp_bulk

Aggregate into bulk expression profile
get_exp_likelihoods

get the single cell expression likelihoods
get_gtree

Get a tidygraph tree with simplified mutational history.
gtf_mm10

gene model (mm10)
fit_lnpois

fit a PLN model by maximum likelihood
make_group_bulks

Make a group of pseudobulks
genotype

Genotyping main function
chrom_sizes_hg38

chromosome sizes (hg38)
gtf_hg38

gene model (hg38)
check_segs_loh

Check the format of a given clonal LOH segment dataframe
gtf_hg19

gene model (hg19)
mark_tumor_lineage

Mark the tumor lineage of a phylogeny
get_haplotype_post

Get phased haplotypes
fit_ref_sse

Fit a reference profile from multiple references using constrained least square
get_allele_bulk

Aggregate into pseudobulk allele profile
plot_psbulk

Plot a pseudobulk HMM profile
plot_phylo_heatmap

Plot single-cell CNV calls along with the clonal phylogeny
gexp_roll_example

example smoothed gene expression dataframe
plot_bulks

Plot a group of pseudobulk HMM profiles
df_allele_example

example allele count dataframe
exp_hclust

Run smoothed expression-based hclust
compute_posterior

Do Bayesian averaging to get posteriors
get_inter_cm

Helper function to get inter-SNP distance
get_segs_consensus

Extract consensus CNV segments
get_internal_nodes

Helper function to get the internal nodes of a dendrogram and the leaves in each subtree
label_genotype

Label the genotypes on a mutation graph
label_edges

Annotate the direct upstream or downstream mutations on the edges
get_segs_neu

get neutral segments from multiple pseudobulks
simes_p

Calculate Simes' p
plot_consensus

Plot consensus CNVs
mut_graph_example

example mutation graph
phi_hat_roll

Rolling estimate of expression fold change phi
relevel_chrom

Relevel chromosome column
choose_ref_cor

choose best reference for each cell based on correlation
classify_alleles

classify alleles using Viterbi and forward-backward
get_move_cost

Get the cost of a mutation reassignment
simplify_history

Simplify the mutational history based on likelihood evidence
get_allele_hmm

Get an allele HMM
plot_sc_tree

Plot single-cell smoothed expression magnitude heatmap
phi_hat_seg

Estimate of expression fold change phi in a segment
viterbi_loh

Viterbi for clonal LOH detection
get_allele_post

get CNV allele posteriors
smooth_expression

filtering, normalization and capping
get_nodes_celltree

Get the internal nodes of a dendrogram and the leaves in each subtree
get_ordered_tips

Get ordered tips from a tree
smooth_segs

Smooth the segments after HMM decoding
pnorm.range.log

Get the total probability from a region of a normal pdf
preprocess_allele

Preprocess allele data
phylogeny_example

example single-cell phylogeny
pre_likelihood_hmm

HMM object for unit tests
get_move_opt

Get the least costly mutation reassignment
resolve_cnvs

Get unique CNVs from set of segments
switch_prob_cm

predict phase switch probability as a function of genetic distance
cnv_heatmap

Plot CNV heatmap
run_group_hmms

Run multiple HMMs
theta_hat_seg

Estimate of imbalance level theta in a segment
return_missing_columns

Check the format of a given file
transfer_links

Annotate the direct upstream or downstream node on the edges
plot_exp_roll

Plot single-cell smoothed expression magnitude heatmap
plot_mut_history

Plot mutational history
log_mem

Log memory usage
gaps_hg38

genome gap regions (hg38)
log_message

Log a message
generate_postfix

Generate alphabetical postfixes
upgma

UPGMA and WPGMA clustering
hc_example

example hclust tree
t_test_pval

T-test wrapper, handles error for insufficient observations
get_exp_sc

get the single cell expression dataframe
vcf_meta

example VCF header
joint_post_example

example joint single-cell cnv posterior dataframe
get_exp_post

compute single-cell expression posteriors
retest_bulks

retest consensus segments on pseudobulks
retest_cnv

retest CNVs in a pseudobulk
run_numbat

Run workflow to decompose tumor subclones
segs_example

example CNV segments dataframe
ref_hca

reference expression magnitudes from HCA
test_multi_allelic

test for multi-allelic CNVs
theta_hat_roll

Rolling estimate of imbalance level theta
ref_hca_counts

reference expression counts from HCA
annot_ref

example reference cell annotation
aggregate_counts

Utility function to make reference gene expression profiles
analyze_bulk

Call CNVs in a pseudobulk profile using the Numbat joint HMM
annot_consensus

Annotate consensus segments on a pseudobulk dataframe
acen_hg19

centromere regions (hg19)
annot_segs

Annotate copy number segments after HMM decoding
bulk_example

example pseudobulk dataframe
annot_haplo_segs

Annotate haplotype segments after HMM decoding
Modes

Get the modes of a vector
acen_hg38

centromere regions (hg38)
calc_allele_LLR

Calculate LLR for an allele HMM
calc_allele_lik

Calculate allele likelihoods
Numbat

Numbat R6 class
annot_theta_mle

Annotate the theta parameter for each segment
calc_cluster_dist

Calculate expression distance matrix between cell populations
check_allele_df

Check the format of a allele dataframe
check_matrix

Check the format of a count matrix
annot_theta_roll

Annotate rolling estimate of imbalance level theta
check_segs_fix

check the format of a given consensus segment dataframe
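
Two of the small statistical helpers named in the index above (binary_entropy and simes_p) correspond to standard textbook quantities. The base-R sketch below illustrates those quantities only; it is not the package's implementation, the function names are hypothetical, and the choice of log base 2 for the entropy is an assumption.

```r
# Entropy (in bits) of a binary variable with success probability p.
# Hypothetical helper, written for illustration only.
binary_entropy_sketch <- function(p) {
    stopifnot(all(p >= 0 & p <= 1))
    h <- -p * log2(p) - (1 - p) * log2(1 - p)
    h[p == 0 | p == 1] <- 0   # limiting value; 0 * log2(0) evaluates to NaN
    h
}

# Simes' combined p-value: the minimum over i of n * p_(i) / i,
# where p_(1) <= ... <= p_(n) are the sorted input p-values.
# Hypothetical helper, written for illustration only.
simes_p_sketch <- function(pvals) {
    n <- length(pvals)
    min(n * sort(pvals) / seq_len(n))
}

binary_entropy_sketch(c(0, 0.5, 1))   # entropy peaks at p = 0.5
simes_p_sketch(c(0.01, 0.04, 0.30))   # driven by the smallest p-value here
```

The entropy is zero when the outcome is certain and largest at p = 0.5, and Simes' method combines several p-values into one without assuming they are independent in the strict Bonferroni sense.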