Learn R Programming

TitanCNA (version 1.10.0)

runEMclonalCN: Function to run the Expectation Maximization Algorithm in TitanCNA.

Description

Function to run the Expectation Maximization Algorithm for inference of model parameters: cellular prevalence, normal proportion, tumour ploidy. This is a key function in the TitanCNA package and is the most computationally intense. This function makes calls to a C subroutine that allows the algorithm to be run more efficiently.

Usage

runEMclonalCN(data, gParams, nParams, pParams, sParams, txnExpLen = 1e15, txnZstrength = 5e05, maxiter = 15, maxiterUpdate = 1500, pseudoCounts = 1e-300, normalEstimateMethod = "map", estimateS = TRUE, estimatePloidy = TRUE, useOutlierState = FALSE, verbose = TRUE)

Arguments

data
list object that contains the components for the data to be analyzed. chr, posn, ref, and tumDepth that can be obtained using loadAlleleCounts, and logR that can be obtained using correctReadDepth and getPositionOverlap (see Example).
gParams
list object that contains the copy number and allelic ratio genotype parameters. Can be obtained from loadDefaultParameters.
nParams
list object that contains the normal contamination parameters. Can be obtained from loadDefaultParameters.
pParams
list object that contains the tumour ploidy parameters. Can be obtained from loadDefaultParameters.
sParams
list object that contains the subclonality (cellular prevalence and clonal cluster) parameters. Can be obtained from loadDefaultParameters.
txnExpLen
Influences prior probability of genotype transitions in the HMM. Smaller value have lower tendency to change state; however, too small and it produces underflow problems. 1e-9 works well for up to 3 million total positions.
txnZstrength
Influences prior probability of clonal cluster transitions in the HMM. Smaller value have lower tendency to change clonal cluster state. 1e-9 works well for up to 3 million total positions.
pseudoCounts
Small, machine precision values to add to probabilities to avoid underflow. For example, .Machine$double.eps.
maxiter
Maximum number of expectation-maximization iterations allowed. In practice, for TitanCNA, it will usually not exceed 20.
maxiterUpdate
Maximum number of coordinate descent iterations during the M-step (of EM algorithm) when parameters are estimated.
normalEstimateMethod
Specifies how to handle normal proportion estimation. Using map will use the maximum a posteriori estimation. Using fixed will not estimate the normal proportion; the normal proportion will be fixed to whatever is specified in params$normalParams$n_0. See Details.
estimateS
Logical indicating whether to account for clonality and estimate subclonal events. See Details.
estimatePloidy
Logical indicating whether to estimate and account for tumour ploidy.
useOutlierState
Logical indicating whether an additional outlier state should be used. In practice, this is usually not necessary.
verbose
Set to FALSE to suppress program messages.

Value

list with components for results returned from the EM algorithm, including converged parameters, posterior marginal responsibilities, log likelihood, and original parameter settings.
n
Converged estimate for normal contamination parameter. numeric array containing estimates at each EM iteration.
s
Converged estimate(s) for cellular prevalence parameter(s). This value is defined as the proportion of tumour sample that does not contain the aberrant genotype. This will contrast what is output in outputTitanResults. numeric array containing estimates at each EM iteration. If more than one cluster is specified, then s is a numeric matrix.
var
Converged estimates for variance parameter of the Gaussian mixtures used to model the log ratio data. numeric matrix containing estimates at each EM iteration.
phi
Converged estimate for tumour ploidy parameter. numeric array containing estimates at each EM iteration.
piG
Converged estimate for initial genotype state distribution. numeric matrix containing estimates at each EM iteration.
piZ
Converged estimate for initial clonal cluster state distribution. numeric matrix containing estimates at each EM iteration.
muR
Mean of binomial mixtures computed as a function of s and n. numeric matrix containing estimates at each EM iteration. See References for mathematical details.
muC
Mean of Gaussian mixtures computed as a function of s, n, and phi. numeric matrix containing estimates at each EM iteration. See References for mathematical details.
loglik
Posterior Log-likelihood that includes data likelihood and the priors. numeric array containing estimates at each EM iteration.
rhoG
Posterior marginal probabilities for the genotype states computed during the E-step. Only the final iteration is returned as a numeric matrix.
rhoZ
Posterior marginal probabilities for the clonal cluster states computed during the E-step. Only the final iteration is returned as a numeric matrix.
genotypeParams
Original genotype parameters. See loadDefaultParameters.
ploidyParams
Original tumour ploidy parameters. See loadDefaultParameters.
normalParams
Original normal contamination parameters. See loadDefaultParameters.
clonalParams
Original subclonal parameters. See loadDefaultParameters.
txnExpLen
Original genotype transition expected length. See loadDefaultParameters.
txnZstrength
Original clonal cluster transition expected length. See loadDefaultParameters.
useOutlierState
Original setting indicating usage of outlier state. See loadDefaultParameters.

Details

This function is implemented with the "foreach" package and therefore supports parallelization. See "doMC" or "doMPI" for some parallelization packages. The forwards-backwards algorithm is used for the E-step in the EM algorithm. This is done using a call to a C subroutine for each chromosome. The maximization step uses maximum a posteriori (MAP) for estimation of parameters.

If the sample has absolutely no normal contamination, then assign nParams$n_0 <- 0 and use argument normalEstimateMethod="fixed". estimateS should always be set to TRUE. If no subclonality is expected, then use loadDefaultParameters(numberClonalClusters=1). Using estimateS=FALSE and loadDefaultParameters(numberClonalClusters=0) is gives more or less the same results.

References

Ha, G., Roth, A., Khattra, J., Ho, J., Yap, D., Prentice, L. M., Melnyk, N., McPherson, A., Bashashati, A., Laks, E., Biele, J., Ding, J., Le, A., Rosner, J., Shumansky, K., Marra, M. A., Huntsman, D. G., McAlpine, J. N., Aparicio, S. A. J. R., and Shah, S. P. (2014). TITAN: Inference of copy number architectures in clonal cell populations from tumour whole genome sequence data. Genome Research, 24: 1881-1893. (PMID: 25060187)

See Also

"foreach", "doMC", "doMPI", loadAlleleCounts, loadDefaultParameters, viterbiClonalCN

Examples

Run this code
message('Running TITAN ...')
#### LOAD DATA ####
infile <- system.file("extdata", "test_alleleCounts_chr2.txt", 
              package = "TitanCNA")
data <- loadAlleleCounts(infile)

#### LOAD PARAMETERS ####
message('titan: Loading default parameters')
numClusters <- 2
params <- loadDefaultParameters(copyNumber = 5, 
              numberClonalClusters = numClusters, skew = 0.1)

#### READ COPY NUMBER FROM HMMCOPY FILE ####
message('titan: Correcting GC content and mappability biases...')
tumWig <- system.file("extdata", "test_tum_chr2.wig", package = "TitanCNA")
normWig <- system.file("extdata", "test_norm_chr2.wig", package = "TitanCNA")
gc <- system.file("extdata", "gc_chr2.wig", package = "TitanCNA")
map <- system.file("extdata", "map_chr2.wig", package = "TitanCNA")
cnData <- correctReadDepth(tumWig, normWig, gc, map)
logR <- getPositionOverlap(data$chr, data$posn, cnData)
data$logR <- log(2^logR) #transform to natural log

#### FILTER DATA FOR DEPTH, MAPPABILITY, NA, etc ####
data <- filterData(data, 1:24, minDepth = 10, maxDepth = 200, map = NULL)

#### EM (FWD-BACK) TO TRAIN PARAMETERS ####
#### Can use parallelization packages ####
K <- length(params$genotypeParams$alphaKHyper)
params$genotypeParams$alphaKHyper <- rep(500, K)
params$ploidyParams$phi_0 <- 1.5
convergeParams <- runEMclonalCN(data, gParams = params$genotypeParams, 
                                nParams = params$normalParams, 
                                pParams = params$ploidyParams, 
                                sParams = params$cellPrevParams, 
                                maxiter = 3, maxiterUpdate = 500, 
                                txnExpLen = 1e15, txnZstrength = 5e5, 
                                useOutlierState = FALSE, 
                                normalEstimateMethod = "map", 
                                estimateS = TRUE, estimatePloidy = TRUE)

Run the code above in your browser using DataLab