jointseg (version 1.0.2)

getCopyNumberDataByResampling: Generate a copy number profile by resampling

Description

Generate a copy number profile by resampling input data

Usage

getCopyNumberDataByResampling(length, nBkp = NA, bkp = NULL,
  regData = NULL, regions = NULL, regAnnot = NULL, minLength = 0,
  regionSize = 0, connex = TRUE)

Arguments

length

length of the profile

nBkp

number of breakpoints. If NA (the default), then argument bkp is expected to be provided.

bkp

a numeric vector of breakpoint positions that may be used to bypass the breakpoint generation step. Defaults to NULL.

regData

a data.frame containing copy number data for different types of copy number regions. Columns:

c

Total copy number

b

Allele B fraction (a.k.a. BAF)

region

a character value, annotation label for the region. See Details.

genotype

the (germline) genotype of SNPs. By definition, rows with missing genotypes are interpreted as non-polymorphic loci (a.k.a. copy number probes).

regions

a character vector of region labels that may be used to bypass the region label generation step. Defaults to NULL.

regAnnot

a data.frame containing annotation data for each copy number region. Columns:

region

label of the form "(C1,C2)" (must match regData[["region"]]). See Details.

freq

frequency (in [0,1]) of this type of region in the genome.

If NULL (the default), frequencies of regions (0,1), (0,2), (1,1) and (1,2) (the most common alterations) are set to represent 90% of the regions. sum(regAnnot[["freq"]]) should be 1.
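As an illustration of the expected layout, the following sketch builds a small regAnnot table by hand. The object name regAnnotToy, the choice of the two extra regions and the equal split within each group are assumptions made for this example only; the text above only states that the four listed regions together account for 90% of the regions.

```r
# Hypothetical annotation table: the four regions named above share 90%
# of the mass, two other regions share the remaining 10%.
# The equal split within each group is an assumption of this example.
regAnnotToy <- data.frame(
  region = c("(0,1)", "(0,2)", "(1,1)", "(1,2)", "(0,0)", "(2,2)"),
  freq = c(rep(0.9 / 4, 4), rep(0.1 / 2, 2)),
  stringsAsFactors = FALSE
)

# Frequencies should sum to 1 (compared with a numerical tolerance)
stopifnot(isTRUE(all.equal(sum(regAnnotToy[["freq"]]), 1)))
```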

minLength

minimum length of region between breakpoints. Defaults to 0.

regionSize

If regionSize>0, breakpoints are generated in pairs, and the two breakpoints of each pair are separated by regionSize positions. nBkp must then be an even number.
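One way to picture paired breakpoints is sketched below. The helper drawPairedBkp is hypothetical and is not part of jointseg; it only illustrates drawing nBkp/2 non-overlapping pairs whose two members are exactly regionSize apart, and it ignores minLength.

```r
# Hypothetical helper, not the package's implementation.
drawPairedBkp <- function(length, nBkp, regionSize) {
  stopifnot(nBkp %% 2 == 0, regionSize > 0)
  k <- nBkp / 2
  # Positions left once the k gaps of size regionSize are set aside
  free <- length - 1 - k * regionSize
  stopifnot(free >= k)
  # Draw k distinct positions, then shift each one by the gaps that
  # precede it so that the pairs cannot overlap
  p <- sort(sample.int(free, k))
  start <- p + (seq_len(k) - 1) * regionSize
  sort(c(start, start + regionSize))
}

set.seed(42)
bkpPairs <- drawPairedBkp(length = 1e4, nBkp = 6, regionSize = 50)
```

With this construction, the odd-numbered gaps in diff(bkpPairs) are all equal to regionSize.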

connex

If TRUE, any two successive regions are constrained to be connex in the (minor CN, major CN) space. See 'Details'.

Value

A list with elements

profile

the simulated copy number profile, a data.frame (used as sim$profile in the Examples)

bkp

a vector of breakpoint positions (each is the last row index before a breakpoint)

regions

a character vector of region labels
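Because each breakpoint is the last row index before a change, the bkp and regions elements can be expanded into one label per position. The sketch below uses made-up values (bkpToy, regionsToy, lenToy) instead of an actual return value.

```r
# Made-up values standing in for a returned list and the profile length
bkpToy <- c(3, 7)
regionsToy <- c("(1,1)", "(0,1)", "(1,2)")
lenToy <- 10

# Segment lengths: rows 1..3, 4..7 and 8..10
segLen <- diff(c(0, bkpToy, lenToy))

# One region label per row of the profile
truth <- rep(regionsToy, times = segLen)
```

Here rows 1 to 3 are labelled "(1,1)", rows 4 to 7 "(0,1)", and rows 8 to 10 "(1,2)".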

Details

This function generates a random copy number profile of length 'length', with 'nBkp' breakpoints randomly chosen. Between two breakpoints, the profile is constant and taken among the different types of regions in regData.
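The resampling idea can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the jointseg implementation: resampleProfileSketch is a hypothetical name, the toy data are synthetic, the 0/0.5/1 genotype coding is assumed, and minLength, regionSize, connex and regAnnot are all ignored.

```r
# Illustrative sketch only: NOT the jointseg implementation.
resampleProfileSketch <- function(n, nBkp, regData) {
  labels <- unique(regData[["region"]])
  stopifnot(nBkp < n, length(labels) >= 2)
  # Random breakpoints: last row index before each change
  bkp <- sort(sample.int(n - 1, nBkp))
  # Random region labels, forcing a change at every breakpoint
  regions <- character(nBkp + 1)
  regions[1] <- sample(labels, 1)
  for (i in seq_len(nBkp)) {
    regions[i + 1] <- sample(setdiff(labels, regions[i]), 1)
  }
  # For each segment, resample rows of regData carrying its label
  segLen <- diff(c(0, bkp, n))
  rows <- unlist(lapply(seq_along(regions), function(s) {
    pool <- which(regData[["region"]] == regions[s])
    pool[sample.int(length(pool), segLen[s], replace = TRUE)]
  }))
  profile <- regData[rows, setdiff(names(regData), "region"), drop = FALSE]
  rownames(profile) <- NULL
  list(profile = profile, bkp = bkp, regions = regions)
}

# Synthetic region data (values invented for this example)
set.seed(7)
toy <- data.frame(
  c = c(rnorm(50, 2, 0.1), rnorm(50, 1, 0.1), rnorm(50, 3, 0.1)),
  b = runif(150),
  region = rep(c("(1,1)", "(0,1)", "(1,2)"), each = 50),
  genotype = sample(c(0, 0.5, 1), 150, replace = TRUE),
  stringsAsFactors = FALSE
)
sk <- resampleProfileSketch(n = 200, nBkp = 3, regData = toy)
```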

Elements of regData[["region"]] must be of the form "(C1,C2)", where C1 denotes the minor copy number and C2 denotes the major copy number. For example,

Region   Interpretation
(1,1)    Normal
(0,1)    Hemizygous deletion
(0,0)    Homozygous deletion
(1,2)    Single copy gain
(0,2)    Copy-neutral LOH
(2,2)    Balanced two-copy gain
(1,3)    Unbalanced two-copy gain
(0,3)    Single-copy gain with LOH
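Labels of the form "(C1,C2)" are easy to split into minor, major and total copy numbers. The helper below, parseRegion, is a hypothetical name used for illustration and is not a jointseg function.

```r
# Hypothetical helper: split "(C1,C2)" labels into copy numbers
parseRegion <- function(labels) {
  pattern <- "^\\(([0-9]+),([0-9]+)\\)$"
  stopifnot(all(grepl(pattern, labels)))
  minor <- as.integer(sub(pattern, "\\1", labels))
  major <- as.integer(sub(pattern, "\\2", labels))
  data.frame(region = labels, minor = minor, major = major,
             total = minor + major, stringsAsFactors = FALSE)
}

parsed <- parseRegion(c("(1,1)", "(0,2)", "(1,3)"))
```

Normal "(1,1)" and copy-neutral LOH "(0,2)" both have a total copy number of 2, which is why the two cannot be told apart from total copy numbers alone.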

If 'connex' is set to TRUE (the default), transitions between copy number regions are constrained so that, at each breakpoint, either the minor or the major copy number remains unchanged. Equivalently, every breakpoint is visible in both total copy numbers and allelic ratios.
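The constraint can be checked on a pair of labels as sketched below. isConnex is a hypothetical helper reflecting one reading of the rule above (exactly one of the two copy numbers changes, so identical labels are rejected); it is not taken from the package source.

```r
# Hypothetical check of the connexity rule between two region labels
isConnex <- function(from, to) {
  cn <- function(x) as.integer(strsplit(gsub("[()]", "", x), ",")[[1]])
  a <- cn(from)
  b <- cn(to)
  # TRUE when exactly one of (minor, major) is unchanged
  xor(a[1] == b[1], a[2] == b[2])
}

ok1 <- isConnex("(1,1)", "(1,2)")  # only the major copy number changes
ok2 <- isConnex("(1,1)", "(0,1)")  # only the minor copy number changes
no1 <- isConnex("(1,1)", "(0,2)")  # both change, total copy number is constant
no2 <- isConnex("(1,1)", "(2,2)")  # both change, allelic ratio is constant
```

The two rejected transitions are the ones that would be invisible in one of the two signals: "(1,1)" to "(0,2)" leaves the total copy number at 2, and "(1,1)" to "(2,2)" leaves the allelic balance unchanged.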

References

Pierre-Jean, M., Rigaill, G. J. and Neuvial, P. (2015). "Performance Evaluation of DNA Copy Number Segmentation Methods." *Briefings in Bioinformatics*, no. 4: 600-615.

Examples

# NOT RUN {
library(jointseg)
affyDat <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=1)
sim <- getCopyNumberDataByResampling(length=1e4, nBkp=5, minLength=100, regData=affyDat)
plotSeg(sim$profile, sim$bkp)

## another run with identical parameters
bkp <- sim$bkp
regions <- sim$regions
sim2 <- getCopyNumberDataByResampling(length=1e4, bkp=bkp, regData=affyDat, regions=regions)
plotSeg(sim2$profile, bkp)

## change tumor fraction but keep same "truth"
affyDatC <- acnr::loadCnRegionData(dataSet="GSE29172", tumorFraction=0.5)
simC <- getCopyNumberDataByResampling(length=1e4, bkp=bkp, regData=affyDatC, regions=regions)
plotSeg(simC$profile, bkp)

## restrict to only normal, single copy gain, and copy-neutral LOH
## with the same bkp
affyDatR <- subset(affyDat, region %in% c("(1,1)", "(0,2)", "(1,2)"))
simR <- getCopyNumberDataByResampling(length=1e4, bkp=bkp, regData=affyDatR)
plotSeg(simR$profile, bkp)

## Same 'truth', on another dataSet
regions <- simR$regions
illuDat <- acnr::loadCnRegionData(dataSet="GSE11976", tumorFraction=1)
sim <- getCopyNumberDataByResampling(length=1e4, bkp=bkp, regData=illuDat, regions=regions)
plotSeg(sim$profile, sim$bkp)

# }