PSSeg: Parent-Specific copy number segmentation

Description

This function splits (bivariate) copy number signals into parent-specific (PS) segments using recursive binary segmentation.

Usage

PSSeg(data, method, stat = NULL, dropOutliers = TRUE,
  rankTransform = FALSE, ..., profile = FALSE, verbose = FALSE)

Value

A list with elements

bestBkp: Best set of breakpoints after dynamic programming
initBkp: Results of the initial segmentation, using 'doNnn', where 'Nnn' corresponds to argument method
dpBkpList: Results of dynamic programming, a list of vectors of breakpoint positions for the best model with k breakpoints for k=1, 2, ... K where K=length(initBkp)
prof: a matrix providing time usage (in seconds) and memory usage (in Mb) for the main steps of the program. Only defined if argument profile is set to TRUE

Arguments

data

Data frame containing the following columns:

c: Total copy number (logged or non-logged)
b: Allele B fraction
genotype: (Germline) genotype of the SNP, coded as 0 for AA, 1/2 for AB, 1 for BB

These data are assumed to be ordered by genome position.
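
As a minimal sketch of the expected input layout, the following base R code builds a toy data frame with the three required columns. All values are made up for illustration and are not real copy number data; the object name toy is arbitrary.

```r
## Toy input with the layout described above: one row per SNP,
## rows ordered by genome position. All values are made up.
set.seed(1)
n <- 8
genotype <- c(0, 1/2, 1, 1/2, 0, 1/2, 1, 1/2)   # AA = 0, AB = 1/2, BB = 1

toy <- data.frame(
  c = rnorm(n, mean = 2, sd = 0.2),                        # total copy number
  b = pmin(pmax(genotype + rnorm(n, sd = 0.05), 0), 1),    # allele B fraction in [0, 1]
  genotype = genotype                                      # germline genotype
)
str(toy)
```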

method

"RBS": Recursive Binary Segmentation, see doRBS
"GFLars": Group fused LARS as described in Bleakley and Vert (2011).
"DP": Univariate pruned dynamic programming (Rigaill et al, 2010) or bivariate dynamic programming
"PSCBS": Parent-specific copy number in paired tumor-normal studies using circular binary segmentation (Olshen et al, 2011)
"other": The segmentation method is passed as a function using argument segFUN (see examples in directory otherMethods).

stat

A vector containing the names or indices of the columns of data to be segmented, for example c("c", "d")

dropOutliers

If TRUE, outliers are dropped using the DNAcopy package

rankTransform

If TRUE, data are replaced by their ranks before segmentation
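
To illustrate what a rank transformation does to a signal, here is a minimal base R sketch. How PSSeg handles ties and missing values internally is not stated on this page, so the call below (rank with na.last = "keep") is an assumption made for illustration only.

```r
## Replace values by their ranks; the outlying value 8.5 simply becomes
## the largest rank, so its magnitude no longer dominates the signal.
x <- c(2.1, 1.9, 8.5, 2.0, NA, 2.2)
r <- rank(x, na.last = "keep")   # assumption: missing values stay missing
print(r)
```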

...

Further arguments to be passed to jointSeg

profile

A logical value: should time and memory usage be traced? Defaults to FALSE.

verbose

A logical value: should extra information be output? Defaults to FALSE.

Author

Morgane Pierre-Jean and Pierre Neuvial

Details

Before segmentation, the decrease in heterozygosity d=2|b-1/2| defined in Bengtsson et al (2010) is calculated from the input data. d is only defined for heterozygous SNPs, that is, SNPs for which data$genotype==1/2. d may be seen as a "mirrored" version of allelic ratios (b): it converts them to a piecewise-constant signal by taking advantage of the bimodality of b for heterozygous SNPs. The rationale for this transformation is that allelic ratios (b) are only informative for heterozygous SNPs (see e.g. Staaf et al, 2008).
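
The transformation described above can be sketched in a few lines of base R. This is an illustration of the formula d=2|b-1/2| on made-up values, not the package's internal code; the vector names b and genotype simply mirror the columns of argument data.

```r
## Decrease in heterozygosity d = 2|b - 1/2| on made-up values.
b        <- c(0.02, 0.30, 0.70, 0.98, 0.55, 0.45)
genotype <- c(0,    1/2,  1/2,  1,    1/2,  1/2)

d <- 2 * abs(b - 1/2)
d[genotype != 1/2] <- NA   # d is only defined for heterozygous SNPs
print(d)
```

Note how the two heterozygous values b=0.30 and b=0.70, which sit symmetrically around 1/2, are "mirrored" to the same value of d.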

If dropOutliers is TRUE, outliers in the copy number signal are also dropped before segmentation, following the method described by Venkatraman and Olshen (2007).

The resulting data are then segmented using the jointSeg function, which combines an initial segmentation according to argument method and pruning of candidate change points by dynamic programming (skipped when the initial segmentation *is* dynamic programming).
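
The pruning step can be illustrated with a small base R sketch. The helper pruneByDP below is hypothetical and is not part of jointseg: it assumes a univariate signal and a residual-sum-of-squares cost, and it returns, for each k, the best subset of k breakpoints among a set of candidates. The actual implementation in jointSeg may differ in cost function, dimension, and handling of missing values.

```r
## Hypothetical helper (not part of jointseg): among candidate breakpoints
## 'cand', find for each k = 1..K the subset of k breakpoints minimizing the
## residual sum of squares (RSS) of a piecewise-constant fit to 'y'.
## A breakpoint at position t means one segment ends at t, the next starts at t+1.
pruneByDP <- function(y, cand, K = length(cand)) {
  n <- length(y)
  cand <- sort(unique(cand))
  J <- length(cand)
  stopifnot(J >= 1, K >= 1, K <= J, cand >= 1, cand < n)
  bounds <- c(0, cand, n)            # block i covers (bounds[i]+1):bounds[i+1]
  S1 <- c(0, cumsum(y))
  S2 <- c(0, cumsum(y^2))
  ## RSS of a constant fit on positions (a+1):b (vectorized over a)
  cost <- function(a, b) {
    (S2[b + 1] - S2[a + 1]) - (S1[b + 1] - S1[a + 1])^2 / (b - a)
  }
  nb <- J + 1                            # number of blocks
  V <- matrix(Inf, K + 1, nb)            # V[k+1, j]: best RSS for blocks 1..j with k breakpoints
  arg <- matrix(NA_integer_, K + 1, nb)  # last block of the previous segment
  for (j in seq_len(nb)) V[1, j] <- cost(0, bounds[j + 1])
  for (k in seq_len(K)) {
    for (j in (k + 1):nb) {
      i <- k:(j - 1)
      tot <- V[k, i] + cost(bounds[i + 1], bounds[j + 1])
      V[k + 1, j] <- min(tot)
      arg[k + 1, j] <- i[which.min(tot)]
    }
  }
  ## Backtrack the optimal breakpoints for each model size k
  lapply(seq_len(K), function(k) {
    bkp <- numeric(k)
    j <- nb
    for (kk in k:1) {
      j <- arg[kk + 1, j]
      bkp[kk] <- cand[j]
    }
    bkp
  })
}

## Simulated signal with true breakpoints at 50 and 100
set.seed(42)
y <- c(rnorm(50, 0), rnorm(50, 3), rnorm(50, 0))
cand <- c(20, 50, 75, 100, 130)       # over-segmented candidate set
res <- pruneByDP(y, cand, K = 3)
res[[2]]                              # best model with 2 breakpoints
```

The returned list plays the same role as dpBkpList in the Value section: element k holds the breakpoints of the best model with k breakpoints, so that a model size can be chosen afterwards.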

If argument stat is not provided, then dynamic programming is run on the two-dimensional statistic "(c,d)".

If argument stat is provided, then dynamic programming is run on stat; in this case we implicitly assume that stat is a piecewise-constant signal.

References

Bengtsson, H., Neuvial, P., & Speed, T. P. (2010). TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays. BMC Bioinformatics, 11(1), 245.

Staaf, J., Lindgren, D., Vallon-Christersson, J., et al. (2008). Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays. Genome Biol, 9(9), R136.

Pierre-Jean, M., Rigaill, G. J., & Neuvial, P. (2015). Performance evaluation of DNA copy number segmentation methods. Briefings in Bioinformatics, 16(4), 600-615.

Examples



## attach the package providing PSSeg and the helper functions used below
library(jointseg)

## load known real copy number regions
affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=0.5)

## generate a synthetic CN profile
K <- 10
len <- 1e4
sim <- getCopyNumberDataByResampling(len, K, regData=affyDat)
datS <- sim$profile

## run binary segmentation (+ dynamic programming)
resRBS <- PSSeg(data=datS, method="RBS", stat=c("c", "d"), K=2*K, profile=TRUE)
resRBS$prof

getTpFp(resRBS$bestBkp, sim$bkp, tol=5)
plotSeg(datS, breakpoints=list(sim$bkp, resRBS$bestBkp))
