BitSeq (version 1.16.0)

estimateExpression: Estimate expression of transcripts

Description

Estimates the expression of transcripts using a Markov chain Monte Carlo (MCMC) algorithm.

Usage

estimateExpression(probFile, outFile, parFile=NULL, outputType=NULL, gibbs=NULL, trInfoFile=NULL, thetaActFile=NULL, MCMC_burnIn=NULL, MCMC_samplesN=NULL, MCMC_samplesSave=NULL, MCMC_chainsN=NULL, MCMC_dirAlpha=NULL, seed=NULL, verbose=NULL, procN=NULL, pretend=FALSE)

estimateExpressionLegacy(probFile, outFile, parFile=NULL, outputType=NULL, gibbs=NULL, trInfoFile=NULL, thetaActFile=NULL, MCMC_burnIn=NULL, MCMC_samplesN=NULL, MCMC_samplesSave=NULL, MCMC_samplesNmax=NULL, MCMC_chainsN=NULL, MCMC_scaleReduction=NULL, MCMC_dirAlpha=NULL, seed=NULL, verbose=NULL, pretend=FALSE)

Arguments

probFile
File with alignment probabilities produced by parseAlignment.
outFile
Prefix for the output files.
outputType
Output type, possible values: theta, RPKM, counts, tau.
gibbs
Use regular Gibbs sampling instead of collapsed Gibbs sampling.
parFile
File containing parameters for the sampler, which can otherwise be specified by the MCMC_* options. Because the file is checked after every MCMC iteration, the parameters can be adjusted while the sampler is running.
trInfoFile
File containing transcript information. Required for RPKM output.
MCMC_burnIn
Length of the sampler's burn-in period.
MCMC_samplesN
Initial number of samples produced. These are used either to estimate the number of necessary samples or to estimate possible scale reduction.
MCMC_samplesSave
Total number of samples recorded at the end.
MCMC_chainsN
Number of parallel chains used. At least two chains will be used.
seed
Sets the initial random seed for repeatable experiments.
verbose
Verbose output.
procN
Maximum number of threads to be used. The program will not use more threads than there are MCMC chains.
thetaActFile
File for logging noise parameter thetaAct, which is only generated when regular Gibbs sampling is used.
MCMC_dirAlpha
Alpha parameter for the Dirichlet distribution.
pretend
Do not execute, only print out command line calls for the C++ version of the program.
MCMC_scaleReduction
(Only for estimateExpressionLegacy.) Target scale reduction; the sampler finishes once this value is reached.
MCMC_samplesNmax
(Only for estimateExpressionLegacy.) Maximum number of samples produced in one iteration. The sampler finishes after producing MCMC_samplesNmax samples.

Value

.thetaMeans
file containing average relative expression of transcripts $\theta$
One of the following sample files, depending on the selected output type:
.rpkm
for RPKM expression
.counts
for estimated read counts
.theta
for relative expression of fragments
.tau
for relative expression of transcripts
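
Because outFile is only a prefix, the names of the files to expect can be derived from it. The following is a minimal sketch that assumes each suffix listed above is appended directly to the prefix. The helper expectedOutputs is hypothetical and not part of BitSeq, and its default of "theta" is simply the first choice in the list, not necessarily the package default.

```r
# Hypothetical helper (not part of BitSeq): derive the expected output file
# names from the outFile prefix and the chosen outputType.
expectedOutputs <- function(outFile,
                            outputType = c("theta", "RPKM", "counts", "tau")) {
  # Picks the first choice ("theta") when outputType is not supplied.
  outputType <- match.arg(outputType)
  # Suffix of the sample file for each output type, as listed under Value.
  suffix <- c(theta = ".theta", RPKM = ".rpkm", counts = ".counts", tau = ".tau")
  c(means   = paste0(outFile, ".thetaMeans"),
    samples = paste0(outFile, suffix[[outputType]]))
}

expectedOutputs("data", "RPKM")
```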

Details

This function runs the collapsed Gibbs algorithm to draw MCMC samples of transcript expression. The input is the .prob file containing alignment probabilities produced by parseAlignment. An optional second input is the transcript information file specified by trInfoFile, which is also produced by parseAlignment.

The estimateExpression function first runs a burn-in phase and a set of initial iterations to estimate the properties of the MCMC sampling. These initial samples are used to estimate how many samples are needed to generate MCMC_samplesSave effective samples in the second, final stage.
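
A call that sets the parameters of both stages explicitly might look like the sketch below. The file names data.prob and data.tr are placeholders for files produced by parseAlignment, and the numeric values are illustrative, not recommended settings. The call is guarded so that nothing runs unless BitSeq is installed and the input files exist.

```r
# Placeholder file names; "data.prob" and "data.tr" stand in for the
# outputs of parseAlignment. Numeric values are illustrative only.
args <- list(
  probFile         = "data.prob",
  outFile          = "data",      # prefix for the output files
  outputType       = "RPKM",
  trInfoFile       = "data.tr",   # needed for RPKM output
  MCMC_burnIn      = 200,         # length of the burn-in period
  MCMC_samplesN    = 200,         # initial samples used to assess the sampler
  MCMC_samplesSave = 50,          # samples recorded in the final stage
  MCMC_chainsN     = 2,           # at least two chains are used
  seed             = 47,
  verbose          = TRUE
)

# Run only when BitSeq is installed and the input files are present.
if (requireNamespace("BitSeq", quietly = TRUE) &&
    all(file.exists(args$probFile, args$trInfoFile))) {
  do.call(BitSeq::estimateExpression, args)
}
```

Adding pretend = TRUE to the list prints the command line calls for the C++ version instead of executing them.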

The estimateExpressionLegacy function uses less efficient convergence checking based on "scale reduction" estimation. After an iteration that generates MCMC_samplesN samples, it estimates the possible scale reduction of the marginal posterior variance. While the possible scale reduction remains high, it doubles MCMC_samplesN and starts a new iteration. This process is repeated until the desired value of MCMC_scaleReduction is reached or MCMC_samplesNmax samples have been generated.
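
The doubling scheme described above can be illustrated in base R. This sketch assumes that "scale reduction" refers to the classic Gelman-Rubin potential scale reduction factor; whether BitSeq computes exactly this quantity is an assumption. The functions scaleReduction and drawChains are hypothetical, and drawChains is a toy stand-in that draws independent normal values rather than running the actual sampler.

```r
# Gelman-Rubin style potential scale reduction for one parameter.
# chains: numeric matrix, one row per sample, one column per chain.
scaleReduction <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))   # mean within-chain variance
  Bn <- var(colMeans(chains))        # between-chain variance divided by n
  varHat <- (n - 1) / n * W + Bn     # pooled estimate of the posterior variance
  sqrt(varHat / W)
}

# Toy stand-in for the sampler (assumption): independent standard normal draws.
drawChains <- function(n, chainsN) {
  matrix(rnorm(n * chainsN), nrow = n, ncol = chainsN)
}

set.seed(47)
samplesN    <- 100    # plays the role of MCMC_samplesN
samplesNmax <- 5000   # plays the role of MCMC_samplesNmax
target      <- 1.2    # plays the role of MCMC_scaleReduction
chainsN     <- 2

repeat {
  R <- scaleReduction(drawChains(samplesN, chainsN))
  # Stop once the target is met or the sample limit is reached.
  if (R <= target || samplesN >= samplesNmax) break
  # Otherwise double the number of samples, capped at the maximum.
  samplesN <- min(2 * samplesN, samplesNmax)
}
```

Values of R close to 1 indicate that the chains agree with each other; chains that are still far apart give a large R and trigger another, longer iteration.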

The sampling algorithm can be configured via the parameters file parFile or by using the MCMC_* options. The advantage of using the file (even an initially blank text document will do) is that configuration values changed while the sampler is running take effect after the next iteration.

See Also

parseAlignment