Learn R Programming

HaHMMR

Haplotype-aware Hidden Markov Model for RNA (HaHMMR) is a method for detecting CNVs from bulk RNA-seq data. Extending the haplotype-aware HMM in Numbat for single-cell RNA-seq, HaHMMR offers enhanced capabilities for detecting low-clonality CNVs from bulk data.

Installation

Install the latest GitHub version using devtools:

devtools::install_github("https://github.com/kharchenkolab/hahmmr")

Usage

Preparing data

First, obtain expression counts and phased allele counts from the RNA-seq sample. The expression counts can be prepared using a transcript quantification tool such as Salmon. The phased allele counts can be prepared using the pileup_and_phase.R pipeline from Numbat. A Docker container is available for running this pipeline.

For example, within the Numbat Docker you can run pileup_and_phase in bulk RNA-seq mode like this:

Rscript /numbat/inst/bin/pileup_and_phase.R \
    --bulk \
    --label {sample} \
    --samples {sample} \
    --bams /mnt/mydata/{sample}.bam \
    --outdir /mnt/mydata/{sample} \
    --gmap /Eagle_v2.4.1/tables/genetic_map_hg38_withX.txt.gz \
    --snpvcf /data/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf \
    --paneldir /data/1000G_hg38 \
    --ncores ncores

The integer expression counts (count_mat) should be a one-column matrix where rownames are genes and colname is the sample name. The phased allele counts (df_allele) should be a dataframe containing columns snp_id, CHROM, POS, cM (genetic distance in centimorgan), REF, ALT, AD (ALT allele count), DP (total allele count), GT (phased genotype), gene.

Running HaHMMR

Here is an example using the RNA-seq samples from a Meningioma study.

library(dplyr)
library(hahmmr)
allele_counts = data.table::fread('http://pklab.med.harvard.edu/teng/data/hmm_example/MN-5_TUMOR_allele_counts.tsv.gz')
gene_counts = readRDS(url('http://pklab.med.harvard.edu/teng/data/hmm_example/MN_gene_counts.rds'))

Sample MN-1037 has a diploid genome so we can use it to create a reference expression profile.

ref_internal = gene_counts[,'MN-1037_TUMOR',drop=F] %>% {./sum(.)}
head(ref_internal)
##          MN-1037_TUMOR
## 7SK       0.000000e+00
## A1BG      1.107976e-06
## A1BG-AS1  5.003764e-07
## A1CF      3.574117e-08
## A2ML1     3.931529e-07
## A4GALT    9.314150e-05

We can now analyze it using HaHMMR.

sample = 'MN-5_TUMOR'

bulk = get_bulk(
        count_mat = gene_counts[,sample,drop=F],
        df_allele = allele_counts,
        lambdas_ref = ref_internal,
        gtf = gtf_hg38
    ) %>% 
    analyze_joint()

bulk %>% plot_psbulk(min_depth = 15)

Copy Link

Version

Install

install.packages('hahmmr')

Monthly Downloads

320

Version

1.0.0

License

MIT + file LICENSE

Maintainer

Teng Gao

Last Published

October 25th, 2023

Functions in hahmmr (1.0.0)

get_exp_bulk

Aggregate into bulk expression profile
chrom_sizes_hg19

chromosome sizes (hg19)
get_trans_probs_s15

Helper function to calculate transition porbabilities for 15-state joint HMM cn/phase are sclars, only p_s is vectorized
plot_bulks

Plot a group of pseudobulk HMM profiles
plot_psbulk

Plot a pseudobulk HMM profile
likelihood_allele

Only compute total log likelihood from an allele HMM
run_joint_hmm_s15

Run 15-state joint HMM on a pseudobulk profile
run_allele_hmm_s5

Run a 5-state allele-only HMM - two theta levels
logSumExp

logSumExp function
combine_bulk

Combine allele and expression pseudobulks
theta_hat_seg

Estimate of imbalance level theta in a segment
gaps_hg38

genome gap regions (hg38)
switch_prob_cm

predict phase switch probablity as a function of genetic distance
classify_alleles

classify alleles using viterbi and forward-backward
gaps_hg19

genome gap regions (hg19)
vcf_meta

example VCF header
smooth_segs

Smooth the segments after HMM decoding
viterbi_joint_mat

Generalized viterbi algorithm for joint HMM, can handle multiple sets of observations
chrom_sizes_hg38

chromosome sizes (hg38)
calc_trans_mat_s7

Calculate the transition matrix for 7-state joint HMM
fit_gamma

fit gamma maximum likelihood
fit_ref_sse

Fit a reference profile from multiple references using constrained least square
calc_trans_mat_s15

Calculate the transition matrix for 15-state joint HMM
forward_back_allele

Forward-backward algorithm for allele HMM
fit_lnpois_cpp

Fit MLE of log-normal Poisson model
get_allele_hmm_s3

Make a 3-state allele HMM - allow transitions to netural state
viterbi_allele

Viterbi algorithm for allele HMM
df_allele_example

example allele count dataframe
viterbi_joint

Viterbi algorithm for joint HMM, can only handle one set of observations
l_bbinom

calculate joint likelihood of allele data
run_joint_hmm_s7

Run 7-state joint HMM on a pseudobulk profile
l_lnpois

calculate joint likelihood of a PLN model
get_bulk

Produce combined bulk expression and allele profile
segs_example

example CNV segments dataframe
get_allele_bulk

Aggregate into pseudobulk alelle profile
gtf_hg38

gene model (hg38)
generate_postfix

Generate alphabetical postfixes
get_trans_probs_s7

Helper function to calculate transition porbabilities for 7-state joint HMM cn/phase are sclars, only p_s is vectorized
gtf_mm10

gene model (mm10)
gene_counts_example

example gene expression counts matrix
get_allele_hmm_s2

Make a 2-state allele HMM - no transitions to netural state
gtf_hg19

gene model (hg19)
ref_hca_counts

reference expression counts from HCA
pre_likelihood_hmm

HMM object for unit tests
run_allele_hmm_s3

Run a 3-state allele HMM
ref_hca

reference expression magnitudes from HCA
approx_theta_post_s2

Find optimal theta of a CNV segment using forward-backward; uses a 2-state allele HMM
analyze_allele

Analyze allele profile
analyze_joint

Analyze allele and expression profile
dpoilog

Returns the density for the Poisson lognormal distribution with parameters mu and sig
bulk_example

example pseudobulk dataframe
acen_hg19

centromere regions (hg19)
acen_hg38

centromere regions (hg38)
annot_segs

Annotate copy number segments after HMM decoding
approx_theta_post_s3

Find optimal theta of a chromosome using forward-backward; uses a 3-state allele HMM
calc_exp_LLR

Calculate LLR for an expression HMM
calc_allele_LLR

Calculate LLR for an allele HMM
approx_phi_post

Laplace approximation of the posterior of expression fold change phi
annot_cm

Annotate genetic distance between markers
calc_allele_lik_s3

Calculate allele likelihoods for 3-state allele HMM
calc_allele_lik_s2

Calculate allele likelihoods for 2-state allele HMM
filter_genes

filter for mutually expressed genes
check_matrix

Check the format of a count matrix
check_cols

Check if columns are present in dataframe
dbbinom

Beta-binomial distribution density function A distribution is beta-binomial if p, the probability of success, in a binomial distribution has a beta distribution with shape parameters alpha > 0 and beta > 0 For more details, see extraDistr::dbbinom