DUBStepR

DUBStepR (Determining the Underlying Basis using Step-wise Regression) is a feature selection algorithm for cell type identification in single-cell RNA-sequencing data.

Feature selection, i.e. determining the optimal subset of genes to cluster cells into cell types, is a critical step in the unsupervised clustering of scRNA-seq data.

DUBStepR is based on the intuition that cell-type-specific marker genes tend to be well correlated with each other, i.e. they typically have strong positive and negative correlations with other marker genes. After filtering genes based on a correlation range score, DUBStepR exploits structure in the gene-gene correlation matrix to prioritize genes as features for clustering. A vignette for using DUBStepR on scRNA-seq data can be accessed using browseVignettes("DUBStepR").
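The correlation-range intuition can be illustrated on toy data in base R. This is a minimal sketch, not DUBStepR's implementation: the simulated matrix, the helper names `sim_gene` and `toy_correlation_range`, and the simple max-minus-min definition of the score are all assumptions made for illustration. The package's own `getGGC` and `getCorrelationRange` functions may compute these quantities differently.

```r
set.seed(1)

n_cells <- 60
# Two hypothetical cell types, 30 cells each
cell_type <- rep(c(1, 2), each = n_cells / 2)

# Hypothetical helper: simulate one gene with a given mean in each cell type
sim_gene <- function(mean_type1, mean_type2) {
  rnorm(n_cells, mean = ifelse(cell_type == 1, mean_type1, mean_type2), sd = 0.5)
}

# Rows are genes, columns are cells
expr <- rbind(
  markerA1 = sim_gene(4, 0),  # high in type 1
  markerA2 = sim_gene(4, 0),
  markerB1 = sim_gene(0, 4),  # high in type 2
  markerB2 = sim_gene(0, 4),
  noise1   = sim_gene(2, 2),  # no cell-type signal
  noise2   = sim_gene(2, 2),
  noise3   = sim_gene(2, 2)
)

# Gene-gene correlation matrix (genes are rows, so correlate the transpose)
ggc <- cor(t(expr))

# Simplified score (assumption): strongest positive minus strongest negative
# correlation of each gene with any other gene
toy_correlation_range <- function(corr) {
  diag(corr) <- NA  # ignore self-correlation
  apply(corr, 1, function(r) max(r, na.rm = TRUE) - min(r, na.rm = TRUE))
}

range_score <- toy_correlation_range(ggc)
print(round(sort(range_score, decreasing = TRUE), 2))
```

In this toy setting the marker genes correlate strongly and positively with markers of the same cell type and strongly and negatively with markers of the other type, so their range scores are much larger than those of the noise genes.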

The DUBStepR methodology is described in the workflow figure below and in our latest preprint on bioRxiv.

[Figure: DUBStepR workflow]

Install

install.packages('DUBStepR')

Monthly Downloads

73

Version

1.2.0

License

MIT + file LICENSE

Maintainer

Bobby Ranjan

Last Published

October 5th, 2021

Functions in DUBStepR (1.2.0)

pbmc_norm_small_data

Small PBMC dataset

logNormalize

Log-transform and normalize data by sequencing depth

runStepwiseReg

Run step-wise regression to order the features

findElbow

Find the elbow in a curve

DUBStepR

Obtain a list of feature genes to characterise cell types

getCorrelationRange

Compute the correlation range values for all genes in the gene-gene correlation matrix

getOptimalFeatureSet

Determine the optimal feature set using Density Index (DI)

getFilteredData

Filter the dataset by removing lowly expressed genes and mitochondrial, spike-in and ribosomal genes

getGGC

Compute the gene-gene correlation matrix
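Two of the steps named above, depth normalization and elbow detection, can be sketched in base R. These are illustrative stand-ins rather than the package's code: `toy_log_normalize`, `toy_find_elbow`, the scale factor of 10,000, and the farthest-point-from-chord elbow rule are assumptions, and the package's `logNormalize` and `findElbow` may differ in both details and signatures.

```r
# Hypothetical helper: scale each cell (column) to a common depth, then log1p
toy_log_normalize <- function(counts, scale_factor = 1e4) {
  depth <- colSums(counts)  # total counts per cell
  log1p(sweep(counts, 2, depth, "/") * scale_factor)
}

# Hypothetical helper: the elbow is taken as the point farthest from the
# straight line (chord) joining the first and last points of the curve
toy_find_elbow <- function(y) {
  x <- seq_along(y)
  # Rescale both axes to [0, 1] so neither axis dominates the distance
  xs <- (x - min(x)) / (max(x) - min(x))
  ys <- (y - min(y)) / (max(y) - min(y))
  n <- length(y)
  dx <- xs[n] - xs[1]
  dy <- ys[n] - ys[1]
  # Perpendicular distance of every point to the chord
  dist <- abs(dy * (xs - xs[1]) - dx * (ys - ys[1])) / sqrt(dx^2 + dy^2)
  which.max(dist)
}

set.seed(42)
# Toy count matrix: 5 genes (rows) by 4 cells (columns)
counts <- matrix(rpois(5 * 4, lambda = 20), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4)))
norm <- toy_log_normalize(counts)

# After normalization, every cell has the same back-transformed total
print(colSums(expm1(norm)))

# A decreasing curve that flattens out after its first few points
curve <- c(10, 6, 3, 1.5, 1.2, 1.1, 1.05, 1.0, 0.98, 0.95)
print(toy_find_elbow(curve))
```

The chord-distance rule is only one of several common elbow heuristics; it is used here because it needs no tuning parameter and works on any monotone curve.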