anomDetectLOH: LOH Method for Chromosome Anomaly Detection

Description

anomDetectLOH breaks a chromosome up into segments of homozygous runs of SNP markers determined by change points in Log R Ratio and selects segments which are likely to be anomalous.

Usage

anomDetectLOH(intenData, genoData, scan.ids, chrom.ids, snp.ids,
  known.anoms, smooth = 50, min.width = 5, nperm = 10000, alpha = 0.001,
  run.size = 50, inter.size = 4, homodel.min.num = 10, homodel.thresh = 10,
  small.num = 20, small.thresh = 2.25, medium.num = 50, medium.thresh = 2,
  long.num = 100, long.thresh = 1.5, small.na.thresh = 2.5,
  length.factor = 5, merge.fac = 0.85, min.lrr.num = 20, verbose = TRUE)

Arguments

intenData

An IntensityData object containing the Log R Ratio. The order of the rows of intenData and the snp annotation are expected to be by chromosome and then by position within chromosome. The scan annotation should contain sex, coded as "M" for male and "F" for female.

genoData

A GenotypeData object. The order of the rows of genoData and the snp annotation are expected to be by chromosome and then by position within chromosome.

scan.ids

vector of scan ids (sample numbers) to process

chrom.ids

vector of (unique) chromosomes to process. Should correspond to integer chromosome codes in intenData. Recommended for use with autosomes, X (males will be ignored), and the pseudoautosomal (XY) region.

snp.ids

vector of eligible snp ids. Usually exclude failed and intensity-only snps. Also recommended to exclude an HLA region on chromosome 6 and XTR region on X chromosome. See HLA and pseudoautosomal. If there are SNPs annotated in the centromere gap, exclude these as well (see centromeres).

known.anoms

data.frame of known anomalies (usually from anomDetectBAF); must have "scanID","chromosome","left.index","right.index". Here "left.index" and "right.index" are row indices of intenData. Left and right refer to start and end of anomaly, respectively, in position order.

smooth

number of markers for smoothing region. See smooth.CNA in the DNAcopy package.

min.width

minimum number of markers for segmenting. See segment in the DNAcopy package.

nperm

number of permutations. See segment in the DNAcopy package.

alpha

significance level. See segment in the DNAcopy package.

run.size

number of markers to declare a 'homozygous' run (here 'homozygous' includes homozygous and missing)

inter.size

number of consecutive heterozygous markers allowed to interrupt a 'homozygous' run

homodel.min.num

minimum number of markers to detect extreme difference in lrr (for homozygous deletion)

homodel.thresh

threshold for measure of deviation from non-anomalous needed to declare segment a homozygous deletion.

small.num

minimum number of SNP markers to declare segment as an anomaly (other than homozygous deletion)

small.thresh

threshold for measure of deviation from non-anomalous to declare segment anomalous if number of SNP markers is between small.num and medium.num.

medium.num

threshold for number of SNP markers to identify 'medium' size segment

medium.thresh

threshold for measure of deviation from non-anomalous needed to declare segment anomalous if number of SNP markers is between medium.num and long.num.

long.num

threshold for number of SNP markers to identify 'long' size segment

long.thresh

threshold for measure of deviation from non-anomalous when number of markers is bigger than long.num

small.na.thresh

threshold for measure of deviation from non-anomalous when number of markers is between small.num and medium.num and 'local mad.fac' is NA. See Details section for definition of 'local mad.fac'.

length.factor

window around anomaly defined as length.factor*(no. of markers in segment) on either side of the given segment. Used in determining 'local mad.fac'. See Details section.

merge.fac

threshold for 'sd.fac'= number of baseline standard deviations of segment mean from baseline mean; consecutive segments with 'sd.fac' above threshold are merged

min.lrr.num

if any 'non-anomalous' interval has fewer markers than min.lrr.num, interval is ignored in finding non-anomalous baseline unless it's the only piece left

verbose

logical indicator whether to print the scan id currently being processed
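
The merge.fac rule described above can be illustrated with a small base-R sketch. It is not the package's implementation: the helper names sd_fac and merge_groups are hypothetical, and the sketch applies only the rule as worded here (consecutive segments whose 'sd.fac' exceeds merge.fac end up in the same group).

```r
# Hypothetical helper: number of baseline standard deviations
# between a segment mean and the baseline mean.
sd_fac <- function(seg.mean, base.mean, base.sd) {
  abs(seg.mean - base.mean) / base.sd
}

# Hypothetical helper: assign a group id to each segment (in position order);
# consecutive segments with sd.fac above merge.fac share one id.
merge_groups <- function(sd.fac, merge.fac = 0.85) {
  above <- sd.fac > merge.fac
  # a new group starts unless this segment and the previous one are both above
  new.group <- c(TRUE, !(above[-1] & above[-length(above)]))
  cumsum(new.group)
}

# Five consecutive segment means against an illustrative baseline
seg.means <- c(0.02, -0.11, -0.09, 0.03, -0.14)
fac <- sd_fac(seg.means, base.mean = 0, base.sd = 0.1)
groups <- merge_groups(fac)
```

Here segments 2 and 3 both exceed the default threshold and are adjacent, so they receive the same group id; every other segment stays in a group of its own.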

Value

raw

raw homozygous run data, not including any regions present in known.anoms. Left and right refer to start and end of anomaly, respectively, in position order. A data.frame with the following columns:

left.index: row index of intenData indicating left endpoint of segment
right.index: row index of intenData indicating right endpoint of segment
left.base: base position of left endpoint of segment
right.base: base position of right endpoint of segment
scanID: integer id of scan
chromosome: chromosome as integer code

raw.adjusted

data.frame of runs after merging and intersecting with CBS segments. Left and right refer to start and end of anomaly, respectively, in position order. The columns are:

scanID: integer id of scan
chromosome: chromosome as integer code
left.index: row index of intenData indicating left endpoint of segment
right.index: row index of intenData indicating right endpoint of segment
left.base: base position of left endpoint of segment
right.base: base position of right endpoint of segment
num.mark: number of eligible SNP markers in segment
seg.median: median of eligible LRR values in segment
seg.mean: mean of eligible LRR values in segment
mad.fac: measure of deviation from non-anomalous baseline, equal to abs(median of segment - baseline median)/(baseline MAD); used in determining anomalous segments
sd.fac: measure of deviation from non-anomalous baseline, equal to abs(mean of segment - baseline mean)/(baseline standard deviation); used in determining whether to merge
local: measure of deviation from the local non-anomalous baseline, equal to abs(median of segment - local baseline median)/(local baseline MAD); the local baseline consists of eligible LRR values in a window around the segment; used in determining anomalous segments
num.segs: number of segments found by CBS for the given chromosome
chrom.nonanom.mad: MAD of eligible LRR values in non-anomalous regions across the chromosome
chrom.nonanom.median: median of eligible LRR values in non-anomalous regions across the chromosome
chrom.nonanom.mean: mean of eligible LRR values in non-anomalous regions across the chromosome
chrom.nonanom.sd: standard deviation of eligible LRR values in non-anomalous regions across the chromosome
sex: sex of the scan id coded as "M" or "F"

filtered

data.frame of the segments identified as anomalies. Columns are the same as in raw.adjusted.

base.info

data.frame with columns:

chrom.nonanom.mad: MAD of eligible LRR values in non-anomalous regions across the chromosome
chrom.nonanom.median: median of eligible LRR values in non-anomalous regions across the chromosome
chrom.nonanom.mean: mean of eligible LRR values in non-anomalous regions across the chromosome
chrom.nonanom.sd: standard deviation of eligible LRR values in non-anomalous regions across the chromosome
sex: sex of the scan id coded as "M" or "F"
num.runs: number of original homozygous runs found for given scan/chromosome
num.segs: number of segments for given scan/chromosome produced by CBS
scanID: integer id of scan
chromosome: chromosome as integer code

segments

data.frame of the segmentation found by CBS with columns:

scanID: integer id of scan
chromosome: chromosome as integer code
left.index: row index of intenData indicating left endpoint of segment
right.index: row index of intenData indicating right endpoint of segment
left.base: base position of left endpoint of segment
right.base: base position of right endpoint of segment
num.mark: number of eligible SNP markers in the segment
seg.mean: mean of eligible LRR values in the segment
sd.fac: measure of deviation from baseline equal to abs(mean of segment - baseline mean)/(baseline standard deviation) where the baseline is over non-anomalous regions

merge

data.frame of scan id/chromosome pairs for which merging occurred.

scanID: integer id of scan
chromosome: chromosome as integer code

Details

anomDetectLOH detects anomalies with loss of heterozygosity accompanied by a change in Log R Ratio (LRR). Male samples are not processed for the X chromosome.

Circular binary segmentation (CBS), using the R package DNAcopy, is applied to LRR values. In parallel, runs of homozygous or missing genotypes of at least run.size markers are identified, allowing interruptions by no more than inter.size consecutive heterozygous SNPs. Intervals from known.anoms are excluded from the identification of runs. After possible merging of consecutive CBS segments (based on satisfying the threshold merge.fac for deviation from the non-anomalous baseline), the homozygous runs are intersected with the segments from CBS.
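
The run-finding idea can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the GWASTools code: the function name find_homo_runs is hypothetical, genotypes are assumed to be coded as 0 or 2 (homozygous), 1 (heterozygous), or NA (missing), and indices are positions within the vector passed in rather than rows of intenData.

```r
# Hypothetical helper: find runs of homozygous-or-missing markers of at least
# run.size markers, tolerating interruptions of at most inter.size
# consecutive heterozygous markers.
find_homo_runs <- function(geno, run.size = 50, inter.size = 4) {
  # TRUE where the marker is homozygous or missing
  homo <- is.na(geno) | geno != 1
  r <- rle(homo)
  n <- length(r$values)
  if (n >= 3) {
    # absorb short heterozygous stretches lying between two 'homozygous' stretches
    idx <- seq_len(n)
    interior <- idx > 1 & idx < n
    r$values[interior & !r$values & r$lengths <= inter.size] <- TRUE
  }
  # recompute stretches after absorbing the short interruptions
  r2 <- rle(inverse.rle(r))
  ends <- cumsum(r2$lengths)
  starts <- ends - r2$lengths + 1
  keep <- r2$values & r2$lengths >= run.size
  data.frame(left.index = starts[keep], right.index = ends[keep])
}

# Toy chromosome: two homozygous stretches separated by two heterozygous markers
geno <- c(rep(1, 5), rep(0, 6), 1, 1, rep(2, 6), rep(1, 8))
runs <- find_homo_runs(geno, run.size = 10, inter.size = 2)
```

With inter.size = 2 the two heterozygous markers in the middle are tolerated and a single run is reported; with inter.size = 1 the same data yield no run of the required size.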

Determination of anomalous segments is based on a combination of number-of-marker thresholds and deviation from a non-anomalous baseline. Segments are declared anomalous if their deviation from non-anomalous is above the corresponding thresholds (see small.num, small.thresh, medium.num, medium.thresh, long.num, long.thresh, and small.na.thresh). Non-anomalous median and MAD are defined for each sample-chromosome combination: intervals from known.anoms and the identified homozygous runs are excluded, and the remaining regions form the non-anomalous baseline.
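
How the size-dependent thresholds relate to one another can be shown with a short sketch. The function pick_thresh is hypothetical, and the handling of segments with exactly medium.num or long.num markers is an assumption, since the argument descriptions do not say which side of each boundary is inclusive. Homozygous deletions (homodel.min.num, homodel.thresh) are handled separately and are not covered here.

```r
# Hypothetical helper: choose the deviation threshold that applies to a segment,
# given its number of markers and whether its 'local mad.fac' is NA.
pick_thresh <- function(num.mark, local.na = FALSE,
                        small.num = 20, small.thresh = 2.25,
                        medium.num = 50, medium.thresh = 2,
                        long.num = 100, long.thresh = 1.5,
                        small.na.thresh = 2.5) {
  # below small.num: too few markers to call (other than homozygous deletion)
  if (num.mark < small.num) return(NA_real_)
  # small segments: stricter threshold when no local measure is available
  if (num.mark < medium.num) {
    return(if (local.na) small.na.thresh else small.thresh)
  }
  # medium segments
  if (num.mark < long.num) return(medium.thresh)
  # long segments get the most lenient threshold
  long.thresh
}

thresholds <- sapply(c(10, 30, 60, 150), pick_thresh)
```

With the defaults, shorter segments must deviate more strongly from the baseline than longer ones before they are declared anomalous.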

Deviation from non-anomalous is measured by a combination of a chromosome-wide 'mad.fac' and a 'local mad.fac' (both the average and the minimum of these two measures are used). Here 'mad.fac' is abs(segment median - non-anomalous median)/(non-anomalous MAD), and 'local mad.fac' uses the same definition except that the non-anomalous median and MAD are computed over a window including the segment (see length.factor). Median and MAD are found for eligible LRR values.
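
The two deviation measures can be worked through on simulated data. The sketch below is illustrative only: mad_fac and local_window are hypothetical helpers, the LRR values are simulated, clipping the window at the chromosome ends is an assumption, and so is leaving the segment's own values out of the local baseline. It also relies on R's mad(), which scales by 1.4826 by default; whether the package uses that scaling is not stated here.

```r
# Hypothetical helper: deviation of a segment from a baseline, in baseline MADs
mad_fac <- function(seg.lrr, base.lrr) {
  abs(median(seg.lrr, na.rm = TRUE) - median(base.lrr, na.rm = TRUE)) /
    mad(base.lrr, na.rm = TRUE)
}

# Hypothetical helper: window of length.factor * (markers in segment) on either
# side of the segment, clipped to the chromosome (clipping is an assumption)
local_window <- function(left, right, n.markers, length.factor = 5) {
  width <- length.factor * (right - left + 1)
  c(left = max(1, left - width), right = min(n.markers, right + width))
}

set.seed(1)
lrr <- rnorm(1000, mean = 0, sd = 0.1)   # simulated LRR along one chromosome
seg <- 400:449                           # simulated segment with reduced LRR
lrr[seg] <- lrr[seg] - 0.5

win <- local_window(min(seg), max(seg), length(lrr))
# local baseline: markers in the window, excluding the segment itself (assumption)
win.idx <- setdiff(win[["left"]]:win[["right"]], seg)

chrom.fac <- mad_fac(lrr[seg], lrr[-seg])      # chromosome-wide measure
local.fac <- mad_fac(lrr[seg], lrr[win.idx])   # local measure
combined <- c(average = mean(c(chrom.fac, local.fac)),
              minimum = min(chrom.fac, local.fac))
```

Both the average and the minimum are computed because the text above says both are used; how they enter the final decision is not spelled out here, so the sketch stops at computing them.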

References

See references in segment in the package DNAcopy.

Examples

library(GWASdata)
data(illuminaScanADF, illuminaSnpADF)

blfile <- system.file("extdata", "illumina_bl.gds", package="GWASdata")
bl <- GdsIntensityReader(blfile)
blData <-  IntensityData(bl, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

genofile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
geno <- GdsGenotypeReader(genofile)
genoData <-  GenotypeData(geno, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

scan.ids <- illuminaScanADF$scanID[1:2]
chrom.ids <- unique(illuminaSnpADF$chromosome)
snp.ids <- illuminaSnpADF$snpID[illuminaSnpADF$missing.n1 < 1]

# example for known.anoms, would get this from anomDetectBAF
known.anoms <- data.frame("scanID"=scan.ids[1],"chromosome"=21,
  "left.index"=100,"right.index"=200)

LOH.anom <- anomDetectLOH(blData, genoData, scan.ids=scan.ids,
  chrom.ids=chrom.ids, snp.ids=snp.ids, known.anoms=known.anoms)

close(blData)
close(genoData)
