estimateExpression: Estimate expression of transcripts

Description

Estimates the expression of transcripts using a Markov chain Monte Carlo (MCMC) algorithm.

Usage

estimateExpression(probFile, outFile, parFile=NULL, outputType=NULL, gibbs=NULL, trInfoFile=NULL, thetaActFile=NULL, MCMC_burnIn=NULL, MCMC_samplesN=NULL, MCMC_samplesSave=NULL, MCMC_chainsN=NULL, MCMC_dirAlpha=NULL, seed=NULL, verbose=NULL, procN=NULL, pretend=FALSE)
estimateExpressionLegacy(probFile, outFile, parFile=NULL, outputType=NULL, gibbs=NULL, trInfoFile=NULL, thetaActFile=NULL, MCMC_burnIn=NULL, MCMC_samplesN=NULL, MCMC_samplesSave=NULL, MCMC_samplesNmax=NULL, MCMC_chainsN=NULL, MCMC_scaleReduction=NULL, MCMC_dirAlpha=NULL, seed=NULL, verbose=NULL, pretend=FALSE)

Arguments

probFile

File with alignment probabilities produced by parseAlignment.

outFile

Prefix for the output files.

outputType

Output type, possible values: theta, RPKM, counts, tau.

gibbs

Use regular Gibbs sampling instead of collapsed Gibbs sampling.

parFile

File containing parameters for the sampler, which can otherwise be specified by the MCMC* options. Because the file is checked after every MCMC iteration, the parameters can be adjusted while the sampler is running.

trInfoFile

File containing transcript information (required for RPKM output).

MCMC_burnIn

Length of the sampler's burn-in period.

MCMC_samplesN

Initial number of samples produced. These are used either to estimate the number of necessary samples or to estimate possible scale reduction.

MCMC_samplesSave

Total number of samples recorded at the end.

MCMC_chainsN

Number of parallel chains used. At least two chains will be used.

seed

Sets the initial random seed for repeatable experiments.

verbose

Verbose output.

procN

Maximum number of threads to be used. The program will not use more threads than there are MCMC chains.

thetaActFile

File for logging the noise parameter thetaAct, which is only generated when regular Gibbs sampling is used.

MCMC_dirAlpha

Alpha parameter for the Dirichlet distribution.

pretend

Do not execute, only print out command line calls for the C++ version of the program.

MCMC_scaleReduction

(Only for estimateExpressionLegacy.) Target scale reduction; the sampler finishes once this value is met.

MCMC_samplesNmax

(Only for estimateExpressionLegacy.) Maximum number of samples produced in one iteration. After producing samplesNmax samples, the sampler finishes.

Value

.thetaMeans: file containing average relative expression of transcripts $\theta$
.rpkm: for RPKM expression
.counts: for estimated read counts
.theta: for relative expression of fragments
.tau: for relative expression of transcripts
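
As a rough illustration of how the extensions above combine with the outFile prefix, the following sketch builds the file names one might look for after a run. The helper expected_outputs is hypothetical and not part of the package, and it assumes that each run writes the .thetaMeans file plus one file whose extension is the lower-cased outputType.

```r
# Hypothetical helper (not part of the package): list the output files
# one would expect for a given prefix and output type.
# Assumption: the extension is the lower-cased outputType, as in ".rpkm".
expected_outputs <- function(outFile, outputType) {
  outputType <- match.arg(outputType, c("theta", "RPKM", "counts", "tau"))
  paste0(outFile, c(".thetaMeans", paste0(".", tolower(outputType))))
}

outputs <- expected_outputs("data", "RPKM")
print(outputs)
```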

Details

This function runs the collapsed Gibbs algorithm to draw MCMC samples of transcript expression. The input is the .prob file containing alignment probabilities produced by parseAlignment. An optional further input is the transcript information file specified by trInfoFile, which is also produced by parseAlignment.
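
A minimal sketch of a call with both inputs might look as follows. The file names data.prob and data.tr are hypothetical placeholders for the files produced by parseAlignment, and the sketch assumes the function is provided by the BitSeq package. The call is guarded so that the snippet does nothing harmful when the package or the files are missing.

```r
# Hypothetical file names; replace with the files produced by parseAlignment.
probFile   <- "data.prob"
trInfoFile <- "data.tr"

args <- list(
  probFile   = probFile,
  outFile    = "data",      # prefix for the output files
  outputType = "RPKM",      # RPKM output requires trInfoFile
  trInfoFile = trInfoFile,
  seed       = 47,          # fixed seed for repeatable runs
  verbose    = TRUE
)

# Assumption: estimateExpression is exported by the BitSeq package.
# Run only when the package is installed and both inputs exist.
if (requireNamespace("BitSeq", quietly = TRUE) &&
    all(file.exists(probFile, trInfoFile))) {
  do.call(BitSeq::estimateExpression, args)
} else {
  message("Skipping: BitSeq or the input files are not available.")
}
```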

The estimateExpression function first runs a burn-in phase and initial iterations to estimate the properties of the MCMC sampling. The initial samples are used to estimate the number of samples needed to generate MCMC_samplesSave effective samples in the second, final stage.

The estimateExpressionLegacy function uses a less efficient convergence check based on "scale reduction" estimation. After an iteration that generates MCMC_samplesN samples, it estimates the possible scale reduction of the marginal posterior variance. While the possible scale reduction is high, it doubles MCMC_samplesN and starts a new iteration. This process repeats until the desired value of MCMC_scaleReduction is met or MCMC_samplesNmax samples have been generated.
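
The doubling scheme can be illustrated with a toy loop in base R. This is only a sketch under stated assumptions: it uses a Gelman-Rubin-style potential scale reduction statistic, which may differ from the estimator the program actually uses, and toy_sampler is a hypothetical stand-in for the real sampler that redraws all chains from scratch in every iteration. The variables samples_n, samples_n_max and target play the roles of MCMC_samplesN, MCMC_samplesNmax and MCMC_scaleReduction.

```r
# Gelman-Rubin-style potential scale reduction for one scalar quantity.
# chains: numeric matrix, one column per chain, one row per sample.
scale_reduction <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))     # mean within-chain variance
  B <- n * var(colMeans(chains))       # between-chain variance
  var_hat <- (n - 1) / n * W + B / n   # pooled estimate of posterior variance
  sqrt(var_hat / W)
}

# Hypothetical toy sampler: each chain starts at a different offset that
# decays as 0.99^t, so longer runs bring the chains closer together.
toy_sampler <- function(n, chains_n) {
  offsets <- seq(-2, 2, length.out = chains_n)
  sapply(offsets, function(o) o * 0.99^seq_len(n) + rnorm(n))
}

set.seed(1)
samples_n     <- 100     # role of MCMC_samplesN
samples_n_max <- 10000   # role of MCMC_samplesNmax
target        <- 1.05    # role of MCMC_scaleReduction

repeat {
  draws <- toy_sampler(samples_n, chains_n = 4)
  r_hat <- scale_reduction(draws)
  # Stop once the target is met or the sample cap is reached.
  if (r_hat <= target || samples_n >= samples_n_max) break
  samples_n <- min(2 * samples_n, samples_n_max)   # double and retry
}

cat("samples per chain:", samples_n, " scale reduction:", r_hat, "\n")
```

The loop always terminates because samples_n either satisfies the target or grows until it hits samples_n_max.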

The sampling algorithm can be configured via the parameters file parFile or by using the MCMC* options. The advantage of using the file (an existing blank text document is enough) is that configuration values changed while the sampler is running are picked up after every iteration.
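
A small sketch of preparing such a file is shown below. The file name parameters.txt and the input data.prob are hypothetical, and the sketch assumes the function is provided by the BitSeq package. The format of the entries in the parameters file is not described here, so the file is left blank. Setting pretend=TRUE only prints the command line calls instead of running the sampler.

```r
# Hypothetical file name; the file only needs to exist and may be empty.
parFile <- file.path(tempdir(), "parameters.txt")
file.create(parFile)

args <- list(
  probFile = "data.prob",   # hypothetical input produced by parseAlignment
  outFile  = "data",        # prefix for the output files
  parFile  = parFile,       # checked by the sampler after every iteration
  pretend  = TRUE           # only print the command line calls
)

# Assumption: estimateExpression is exported by the BitSeq package.
if (requireNamespace("BitSeq", quietly = TRUE) &&
    file.exists(args$probFile)) {
  do.call(BitSeq::estimateExpression, args)
} else {
  message("Skipping: BitSeq or the input file is not available.")
}
```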
