
GWASTools (version 1.18.0)

anomDetectLOH: LOH Method for Chromosome Anomaly Detection

Description

anomDetectLOH breaks a chromosome up into segments of homozygous runs of SNP markers determined by change points in Log R Ratio and selects segments which are likely to be anomalous.

Usage

anomDetectLOH(intenData, genoData, scan.ids, chrom.ids, snp.ids, known.anoms,
  smooth = 50, min.width = 5, nperm = 10000, alpha = 0.001,
  run.size = 50, inter.size = 4, homodel.min.num = 10, homodel.thresh = 10,
  small.num = 20, small.thresh = 2.25, medium.num = 50, medium.thresh = 2,
  long.num = 100, long.thresh = 1.5, small.na.thresh = 2.5,
  length.factor = 5, merge.fac = 0.85, min.lrr.num = 20, verbose = TRUE)

Arguments

intenData
An IntensityData object containing the Log R Ratio. The order of the rows of intenData and the snp annotation are expected to be by chromosome and then by position within chromosome. The scan annotation should contain sex, coded as "M" for male and "F" for female.

genoData
A GenotypeData object. The order of the rows of genoData and the snp annotation are expected to be by chromosome and then by position within chromosome.
scan.ids
vector of scan ids (sample numbers) to process
chrom.ids
vector of (unique) chromosomes to process. Should correspond to integer chromosome codes in intenData. Recommended for use with autosomes, X (males will be ignored), and the pseudoautosomal (XY) region.
snp.ids
vector of eligible snp ids. Usually exclude failed and intensity-only snps. Also recommended to exclude an HLA region on chromosome 6 and XTR region on X chromosome. See HLA and pseudoautosomal. If there are SNPs annotated in the centromere gap, exclude these as well (see centromeres).
known.anoms
data.frame of known anomalies (usually from anomDetectBAF); must have "scanID","chromosome","left.index","right.index". Here "left.index" and "right.index" are row indices of intenData. Left and right refer to start and end of anomaly, respectively, in position order.
smooth
number of markers for smoothing region. See smooth.CNA in the DNAcopy package.
min.width
minimum number of markers for segmenting. See segment in the DNAcopy package.
nperm
number of permutations. See segment in the DNAcopy package.
alpha
significance level. See segment in the DNAcopy package.
run.size
number of markers to declare a 'homozygous' run (here 'homozygous' includes homozygous and missing)
inter.size
number of consecutive heterozygous markers allowed to interrupt a 'homozygous' run
homodel.min.num
minimum number of markers to detect extreme difference in LRR (for homozygous deletion)
homodel.thresh
threshold for measure of deviation from non-anomalous needed to declare segment a homozygous deletion.
small.num
minimum number of SNP markers to declare segment as an anomaly (other than homozygous deletion)
small.thresh
threshold for measure of deviation from non-anomalous to declare segment anomalous if number of SNP markers is between small.num and medium.num.
medium.num
threshold for number of SNP markers to identify 'medium' size segment
medium.thresh
threshold for measure of deviation from non-anomalous needed to declare segment anomalous if number of SNP markers is between medium.num and long.num.
long.num
threshold for number of SNP markers to identify 'long' size segment
long.thresh
threshold for measure of deviation from non-anomalous when number of markers is bigger than long.num
small.na.thresh
threshold for measure of deviation from non-anomalous when number of markers is between small.num and medium.num and 'local mad.fac' is NA. See Details section for definition of 'local mad.fac'.
length.factor
window around anomaly defined as length.factor*(no. of markers in segment) on either side of the given segment. Used in determining 'local mad.fac'. See Details section.
merge.fac
threshold for 'sd.fac'= number of baseline standard deviations of segment mean from baseline mean; consecutive segments with 'sd.fac' above threshold are merged
min.lrr.num
if any 'non-anomalous' interval has fewer markers than min.lrr.num, interval is ignored in finding non-anomalous baseline unless it's the only piece left
verbose
logical indicator whether to print the scan id currently being processed

Value

A list with the following elements:
raw
raw homozygous run data, not including any regions present in known.anoms. A data.frame with the following columns (left and right refer to the start and end of the anomaly, respectively, in position order):
  • left.index: row index of intenData indicating left endpoint of segment
  • right.index: row index of intenData indicating right endpoint of segment
  • left.base: base position of left endpoint of segment
  • right.base: base position of right endpoint of segment
  • scanID: integer id of scan
  • chromosome: chromosome as integer code
raw.adjusted
data.frame of runs after merging and intersecting with CBS segments, with the following columns (left and right refer to the start and end of the anomaly, respectively, in position order):
  • scanID: integer id of scan
  • chromosome: chromosome as integer code
  • left.index: row index of intenData indicating left endpoint of segment
  • right.index: row index of intenData indicating right endpoint of segment
  • left.base: base position of left endpoint of segment
  • right.base: base position of right endpoint of segment
  • num.mark: number of eligible SNP markers in segment
  • seg.median: median of eligible LRR values in segment
  • seg.mean: mean of eligible LRR values in segment
  • mad.fac: measure of deviation from non-anomalous baseline, equal to abs(median of segment - baseline median)/(baseline MAD); used in determining anomalous segments
  • sd.fac: measure of deviation from non-anomalous baseline, equal to abs(mean of segment - baseline mean)/(baseline standard deviation); used in determining whether to merge
  • local: measure of deviation from the local non-anomalous baseline, equal to abs(median of segment - local baseline median)/(local baseline MAD); local baseline consists of eligible LRR values in a window around segment; used in determining anomalous segments
  • num.segs: number of segments found by CBS for the given chromosome
  • chrom.nonanom.mad: MAD of eligible LRR values in non-anomalous regions across the chromosome
  • chrom.nonanom.median: median of eligible LRR values in non-anomalous regions across the chromosome
  • chrom.nonanom.mean: mean of eligible LRR values in non-anomalous regions across the chromosome
  • chrom.nonanom.sd: standard deviation of eligible LRR values in non-anomalous regions across the chromosome
  • sex: sex of the scan id coded as "M" or "F"
filtered
data.frame of the segments identified as anomalies. Columns are the same as in raw.adjusted.
base.info
data.frame with columns:
  • chrom.nonanom.mad: MAD of eligible LRR values in non-anomalous regions across the chromosome
  • chrom.nonanom.median: median of eligible LRR values in non-anomalous regions across the chromosome
  • chrom.nonanom.mean: mean of eligible LRR values in non-anomalous regions across the chromosome
  • chrom.nonanom.sd: standard deviation of eligible LRR values in non-anomalous regions across the chromosome
  • sex: sex of the scan id coded as "M" or "F"
  • num.runs: number of original homozygous runs found for given scan/chromosome
  • num.segs: number of segments for given scan/chromosome produced by CBS
  • scanID: integer id of scan
  • chromosome: chromosome as integer code
segments
data.frame of the segmentation found by CBS with columns:
  • scanID: integer id of scan
  • chromosome: chromosome as integer code
  • left.index: row index of intenData indicating left endpoint of segment
  • right.index: row index of intenData indicating right endpoint of segment
  • left.base: base position of left endpoint of segment
  • right.base: base position of right endpoint of segment
  • num.mark: number of eligible SNP markers in the segment
  • seg.mean: mean of eligible LRR values in the segment
  • sd.fac: measure of deviation from baseline equal to abs(mean of segment - baseline mean)/(baseline standard deviation) where the baseline is over non-anomalous regions
merge
data.frame of scan id/chromosome pairs for which merging occurred.
  • scanID: integer id of scan
  • chromosome: chromosome as integer code
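
As an illustration of the sd.fac column and of the merging it drives, the following base R sketch computes sd.fac from its definition and then merges consecutive segments whose sd.fac exceeds merge.fac. It is not the GWASTools implementation: the helper names `sd_fac` and `merge_segments` are hypothetical, the segments are assumed to come from a single scan and chromosome in position order (at least one row), and any additional conditions the package applies before merging are not modeled.

```r
# Hypothetical helper: deviation of a segment mean from the baseline mean,
# in units of the baseline standard deviation (definition of sd.fac above).
sd_fac <- function(seg.lrr, baseline.lrr) {
  abs(mean(seg.lrr, na.rm = TRUE) - mean(baseline.lrr, na.rm = TRUE)) /
    sd(baseline.lrr, na.rm = TRUE)
}

# Hypothetical helper: merge runs of consecutive segments with sd.fac > merge.fac.
merge_segments <- function(segs, merge.fac = 0.85) {
  above <- segs$sd.fac > merge.fac
  # A segment continues the previous group only if both it and its
  # predecessor are above the threshold; otherwise it starts a new group.
  new.group <- c(TRUE, !(above[-1] & above[-length(above)]))
  group <- cumsum(new.group)
  data.frame(
    left.index  = as.vector(tapply(segs$left.index,  group, min)),
    right.index = as.vector(tapply(segs$right.index, group, max))
  )
}

# Made-up segments for illustration only.
segs <- data.frame(left.index  = c(1, 101, 201, 301),
                   right.index = c(100, 200, 300, 400),
                   sd.fac      = c(0.1, 1.2, 0.9, 0.2))
merged <- merge_segments(segs)
print(merged)
```

In this made-up input the second and third segments are both above the default merge.fac of 0.85, so they are combined into one segment spanning indices 101 to 300.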

Details

anomDetectLOH detects anomalies with loss of heterozygosity accompanied by a change in Log R Ratio. Male samples are not processed for the X chromosome.

Circular binary segmentation (CBS), using the R package DNAcopy, is applied to LRR values. In parallel, runs of homozygous or missing genotypes of a minimum size (run.size) are identified, allowing each run to be interrupted by no more than inter.size consecutive heterozygous SNPs. Intervals from known.anoms are excluded from the identification of runs. After possible merging of consecutive CBS segments (based on satisfying a threshold merge.fac for deviation from non-anomalous baseline), the homozygous runs are intersected with the segments from CBS.
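
The run-finding step can be sketched in base R with run-length encoding. This is a simplified illustration rather than the package's code: the helper name `find_homo_runs` is hypothetical, genotypes are assumed to be coded as the number of A alleles (0, 1, 2, or NA), and the rule used here for absorbing interruptions (any interior stretch of at most inter.size heterozygous markers) is an assumption based on the argument descriptions. Exclusion of known.anoms intervals is not modeled.

```r
# Hypothetical helper: find 'homozygous' runs (homozygous or missing genotypes)
# of at least run.size markers, tolerating short heterozygous interruptions.
find_homo_runs <- function(geno, run.size = 50, inter.size = 4) {
  het <- !is.na(geno) & geno == 1          # TRUE where the call is heterozygous
  r <- rle(het)
  n <- length(r$values)
  interior <- seq_len(n) > 1 & seq_len(n) < n
  # Absorb short heterozygous stretches that sit between two homozygous stretches.
  r$values[r$values & r$lengths <= inter.size & interior] <- FALSE
  r2 <- rle(inverse.rle(r))                # re-encode after absorbing interruptions
  ends <- cumsum(r2$lengths)
  starts <- ends - r2$lengths + 1
  keep <- !r2$values & r2$lengths >= run.size
  data.frame(left.index = starts[keep], right.index = ends[keep])
}

# Made-up genotypes: 10 het, 30 hom (one missing), 2 het, 30 hom, 10 het.
geno <- c(rep(1, 10), rep(0, 30), 1, 1, rep(2, 30), rep(1, 10))
geno[15] <- NA
runs <- find_homo_runs(geno)
print(runs)
```

With the defaults, the two heterozygous markers in the middle are absorbed, so the two 30-marker stretches form a single run covering indices 11 to 72. With `inter.size = 1` they would remain separate and both would fall below run.size.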

Determination of anomalous segments is based on a combination of number-of-marker thresholds and deviation from a non-anomalous baseline. Segments are declared anomalous if deviation from non-anomalous is above corresponding thresholds. (See small.num, small.thresh, medium.num, medium.thresh, long.num, long.thresh, and small.na.thresh.) Non-anomalous median and MAD are defined for each sample-chromosome combination. Intervals from known.anoms and the homozygous runs identified are excluded; remaining regions are the non-anomalous baseline.
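
A minimal sketch of how the size tiers map to thresholds, written in base R. The helper name `pick_thresh` is hypothetical, the treatment of the tier boundaries (lower bound inclusive) is an assumption, and the single `dev` column stands in for whatever combination of deviation measures the package actually compares. The separate homozygous deletion rule and small.na.thresh are not modeled.

```r
# Hypothetical helper: choose the deviation threshold from the number of markers.
# Segments with fewer than small.num markers get NA (too small to be declared).
pick_thresh <- function(num.mark, small.num = 20, small.thresh = 2.25,
                        medium.num = 50, medium.thresh = 2,
                        long.num = 100, long.thresh = 1.5) {
  tier <- cut(num.mark,
              breaks = c(-Inf, small.num, medium.num, long.num, Inf),
              labels = FALSE, right = FALSE)   # intervals are [lower, upper)
  c(NA, small.thresh, medium.thresh, long.thresh)[tier]
}

# Made-up segments: marker counts and a deviation measure for each.
segs <- data.frame(num.mark = c(10, 30, 60, 150),
                   dev      = c(5, 2.5, 1.8, 1.6))
segs$thresh <- pick_thresh(segs$num.mark)
segs$anomalous <- !is.na(segs$thresh) & segs$dev > segs$thresh
print(segs)
```

Longer segments need a smaller deviation to be flagged: in this made-up input the 150-marker segment passes with a deviation of 1.6, while the 60-marker segment fails with 1.8.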

Deviation from non-anomalous is measured by a combination of a chromosome-wide 'mad.fac' and a 'local mad.fac' (both the average and the minimum of these two measures are used). Here 'mad.fac' is abs(segment median - non-anomalous median)/(non-anomalous MAD), and 'local mad.fac' has the same definition except that the non-anomalous median and MAD are computed over a window around the segment (see length.factor). Median and MAD are found for eligible LRR values.
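
The two measures can be illustrated with a short base R sketch. The helper names `mad_fac` and `local_mad_fac` are hypothetical, and several details are assumptions: the MAD is taken from `stats::mad` with its default scaling constant, every marker in the flanking window is treated as non-anomalous, and the window is simply truncated at the ends of the chromosome.

```r
# Hypothetical helper: abs(segment median - baseline median) / baseline MAD.
mad_fac <- function(seg.lrr, baseline.lrr) {
  abs(median(seg.lrr, na.rm = TRUE) - median(baseline.lrr, na.rm = TRUE)) /
    mad(baseline.lrr, na.rm = TRUE)
}

# Hypothetical helper: same measure against a local baseline made of
# length.factor * (markers in segment) markers on either side of the segment.
local_mad_fac <- function(lrr, left, right, length.factor = 5) {
  w <- length.factor * (right - left + 1)
  lo <- max(1, left - w)
  hi <- min(length(lrr), right + w)
  flank <- c(lrr[seq_len(left - lo) + lo - 1],   # markers lo .. left - 1
             lrr[seq_len(hi - right) + right])   # markers right + 1 .. hi
  if (length(flank) == 0) return(NA_real_)
  mad_fac(lrr[left:right], flank)
}

# Made-up LRR values with a drop of 0.5 over markers 101 to 120.
lrr <- rep(c(-0.1, 0, 0.1), length.out = 300)
lrr[101:120] <- lrr[101:120] - 0.5

chrom.fac <- mad_fac(lrr[101:120], lrr[-(101:120)])
local.fac <- local_mad_fac(lrr, left = 101, right = 120)

# The text says both the average and the minimum of the two measures are used.
avg.fac <- mean(c(chrom.fac, local.fac))
min.fac <- min(chrom.fac, local.fac)
print(c(chrom = chrom.fac, local = local.fac, avg = avg.fac, min = min.fac))
```

Because the made-up noise is the same everywhere, the chromosome-wide and local values agree here. On real data they can differ when the LRR baseline drifts along the chromosome, which is the case the local measure is meant to address.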

References

See references in segment in the package DNAcopy.

See Also

segment and smooth.CNA in the package DNAcopy, also findBAFvariance, anomDetectBAF

Examples

library(GWASdata)
data(illuminaScanADF, illuminaSnpADF)

blfile <- system.file("extdata", "illumina_bl.gds", package="GWASdata")
bl <- GdsIntensityReader(blfile)
blData <-  IntensityData(bl, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

genofile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
geno <- GdsGenotypeReader(genofile)
genoData <-  GenotypeData(geno, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

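# process the first two scans on every chromosome present in the SNP annotation
scan.ids <- illuminaScanADF$scanID[1:2]
chrom.ids <- unique(illuminaSnpADF$chromosome)
# keep only SNPs whose missing.n1 value is below 1
snp.ids <- illuminaSnpADF$snpID[illuminaSnpADF$missing.n1 < 1]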

# example for known.anoms, would get this from anomDetectBAF
known.anoms <- data.frame("scanID"=scan.ids[1],"chromosome"=21,
  "left.index"=100,"right.index"=200)

LOH.anom <- anomDetectLOH(blData, genoData, scan.ids=scan.ids,
  chrom.ids=chrom.ids, snp.ids=snp.ids, known.anoms=known.anoms)

close(blData)
close(genoData)
