Learn R Programming

BASiCS (version 1.1.0)

BASiCS_MCMC: BASiCS MCMC sampler

Description

MCMC sampler to perform Bayesian inference for single-cell mRNA sequencing datasets using the model described in Vallejos et al (2015).

Usage

BASiCS_MCMC(Data, N, Thin, Burn, ...)

Arguments

Data

A '>SingleCellExperiment object. This MUST be formatted to include the spike-ins information (see vignette).

N

Total number of iterations for the MCMC sampler. Use N>=max(4,Thin), N being a multiple of Thin.

Thin

Thining period for the MCMC sampler. Use Thin>=2.

Burn

Burn-in period for the MCMC sampler. Use Burn>=1, Burn<N, Burn being a multiple of Thin.

...

Optional parameters.

PriorDelta

Specifies the prior used for delta. Possible values are 'gamma' (Gamma(a.theta,b.theta) prior) and 'log-normal' (log-Normal(0,s2.delta) prior) .

. Default value: PriorDelta = 'log-normal'.
PriorParam

List of 7 elements, containing the hyper-parameter values required for the adopted prior (see Vallejos et al, 2015, 2016). All elements must be positive real numbers.

s2.mu

Scale hyper-parameter for the log-Normal(0,s2.mu) prior that is shared by all gene-specific expression rate parameters \(\mu_i\). Default: s2.mu = 0.5.

s2.delta

Only used when `PriorDelta == 'log-normal'`. Scale hyper-parameter for the log-Normal(0,s2.delta) prior that is shared by all gene-specific over-dispersion parameters \(\delta_i\). Default: s2.delta = 0.5.

a.delta

Only used when `PriorDelta == 'gamma'`. Shape hyper-parameter for the Gamma(a.delta,b.delta) prior that is shared by all gene-specific biological over-dispersion parameters \(\delta_i\). Default: a.delta = 1.

b.delta

Only used when `PriorDelta == 'gamma'`. Rate hyper-parameter for the Gamma(a.delta,b.delta) prior that is shared by all gene-specific biological over-dispersion hyper-parameters \(\delta_i\). Default: b.delta = 1.

p.phi

Dirichlet hyper-parameter for the joint of all (scaled by n) cell-specific mRNA content normalising constants \(\phi_j / n\). Default: p.phi = rep(1, n).

a.s

Shape hyper-parameter for the Gamma(a.s,b.s) prior that is shared by all cell-specific capture efficiency normalising constants \(s_j\). Default: a.s = 1.

b.s

Rate hyper-parameter for the Gamma(a.s,b.s) prior that is shared by all cell-specific capture efficiency normalising constants \(s_j\). Default: b.s = 1.

a.theta

Shape hyper-parameter for the Gamma(a.theta,b.theta) prior for technical noise parameter \(\theta\). Default: a.theta = 1.

b.theta

Rate hyper-parameter for the Gamma(a.theta,b.theta) prior for technical noise parameter \(\theta\). Default: b.theta = 1.

AR

Optimal acceptance rate for adaptive Metropolis Hastings updates. It must be a positive number between 0 and 1. Default (and recommended): AR = 0.44

.

StopAdapt

Iteration at which adaptive proposals are not longer adapted. Use StopAdapt>=1. Default: StopAdapt = Burn.

StoreChains

If StoreChains = TRUE, the generated BASiCS_Chain object is stored as a `.Rds` file (RunName argument used to index the file name). Default: StoreChains = FALSE.

StoreAdapt

If StoreAdapt = TRUE, trajectory of adaptive proposal variances (in log-scale) for all parameters is stored as a list in a `.Rds` file (RunName argument used to index file name). Default: StoreAdapt = FALSE.

StoreDir

Directory where output files are stored. Only required if StoreChains = TRUE and/or StoreAdapt = TRUE). Default: StoreDir = getwd().

RunName

String used to index `.Rds` files storing chains and/or adaptive proposal variances.

PrintProgress

If PrintProgress = FALSE, console-based progress report is suppressed.

ls.phi0

Starting value for the adaptive concentration parameter of the Metropolis proposals for \(\phi = (\phi_1, \ldots, \phi_n)'\).

Start

Starting values for the MCMC sampler. We do not advise to specify this argument. Default options have been tuned to facilitate convergence. If changed, it must be a list containing the following elements: mu0, delta0, phi0, s0, nu0, theta0, ls.mu0, ls.delta0, ls.phi0, ls.nu0 and ls.theta0

Value

An object of class BASiCS_Chain.

References

Vallejos, Marioni and Richardson (2015). PLoS Computational Biology.

Vallejos, Richardson and Marioni (2016). Genome Biology.

Examples

Run this code
# NOT RUN {
# Built-in simulated dataset
Data = makeExampleBASiCS_Data()
# To analyse real data, please refer to the instructions in: 
# https://github.com/catavallejos/BASiCS/wiki/2.-Input-preparation

# Only a short run of the MCMC algorithm for illustration purposes
# Longer runs migth be required to reach convergence
Chain <- BASiCS_MCMC(Data, N = 50, Thin = 2, Burn = 10, 
                     PrintProgress = FALSE)

# For illustration purposes we load a built-in 'BASiCS_Chain' object 
# (obtained using the 'BASiCS_MCMC' function)
data(ChainSC)

# `displayChainBASiCS` can be used to extract information from this output. 
# For example:
head(displayChainBASiCS(ChainSC, Param = 'mu'))

# Traceplot (examples only)
plot(ChainSC, Param = 'mu', Gene = 1)
plot(ChainSC, Param = 'phi', Cell = 1)
plot(ChainSC, Param = 'theta', Batch = 1)

# Calculating posterior medians and 95% HPD intervals
ChainSummary <- Summary(ChainSC)

# `displaySummaryBASiCS` can be used to extract information from this output. 
# For example:
head(displaySummaryBASiCS(ChainSummary, Param = 'mu'))

# Graphical display of posterior medians and 95% HPD intervals 
# For example:
plot(ChainSummary, Param = 'mu', main = 'All genes')
plot(ChainSummary, Param = 'mu', Genes = 1:10, main = 'First 10 genes')
plot(ChainSummary, Param = 'phi', main = 'All cells')
plot(ChainSummary, Param = 'phi', Cells = 1:5, main = 'First 5 cells')
plot(ChainSummary, Param = 'theta')

# To constrast posterior medians of cell-specific parameters 
# For example:
par(mfrow = c(1,2))
plot(ChainSummary, Param = 'phi', Param2 = 's', SmoothPlot = FALSE)
# Recommended for large numbers of cells
plot(ChainSummary, Param = 'phi', Param2 = 's', SmoothPlot = TRUE) 

# To constrast posterior medians of gene-specific parameters
par(mfrow = c(1,2))
plot(ChainSummary, Param = 'mu', Param2 = 'delta', log = 'x', 
     SmoothPlot = FALSE)
# Recommended
plot(ChainSummary, Param = 'mu', Param2 = 'delta', log = 'x', 
     SmoothPlot = TRUE) 

# Highly and lowly variable genes detection (within a single group of cells)
DetectHVG <- BASiCS_DetectHVG(ChainSC, VarThreshold = 0.60, 
                              EFDR = 0.10, Plot = TRUE)
DetectLVG <- BASiCS_DetectLVG(ChainSC, VarThreshold = 0.40, 
                              EFDR = 0.10, Plot = TRUE)

plot(ChainSummary, Param = 'mu', Param2 = 'delta', log = 'x', col = 8)
with(DetectHVG$Table, points(Mu[HVG == TRUE], Delta[HVG == TRUE],
       pch = 16, col = 'red', cex = 1))
with(DetectLVG$Table, points(Mu[LVG == TRUE], Delta[LVG == TRUE],
       pch = 16, col = 'blue', cex = 1))

# If variance thresholds are not fixed
BASiCS_VarThresholdSearchHVG(ChainSC, 
                             VarThresholdsGrid = seq(0.55,0.65,by=0.01), 
                             EFDR = 0.10)
BASiCS_VarThresholdSearchLVG(ChainSC, 
                             VarThresholdsGrid = seq(0.35,0.45,by=0.01), 
                             EFDR = 0.10)
                             
# To obtain denoised rates / counts, see:
help(BASiCS_DenoisedRates)
help(BASiCS_DenoisedCounts)

# For examples of differential analyses between 2 populations of cells see:
help(BASiCS_TestDE)


# }

Run the code above in your browser using DataLab