Learn R Programming

GWASTools (version 1.18.0)

anomDetectBAF: BAF Method for Chromosome Anomaly Detection

Description

anomSegmentBAF for each sample and chromosome, breaks the chromosome up into segments marked by change points of a metric based on B Allele Frequency (BAF) values.

anomFilterBAF selects segments which are likely to be anomalous.

anomDetectBAF is a wrapper to run anomSegmentBAF and anomFilterBAF in one step.

Usage

anomSegmentBAF(intenData, genoData, scan.ids, chrom.ids, snp.ids, smooth = 50, min.width = 5, nperm = 10000, alpha = 0.001, verbose = TRUE)
anomFilterBAF(intenData, genoData, segments, snp.ids, centromere, low.qual.ids = NULL, num.mark.thresh = 15, long.num.mark.thresh = 200, sd.reg = 2, sd.long = 1, low.frac.used = 0.1, run.size = 10, inter.size = 2, low.frac.used.num.mark = 30, very.low.frac.used = 0.01, low.qual.frac.num.mark = 150, lrr.cut = -2, ct.thresh = 10, frac.thresh = 0.1, verbose=TRUE, small.thresh=2.5, dev.sim.thresh=0.1, centSpan.fac=1.25, centSpan.nmark=50)
anomDetectBAF(intenData, genoData, scan.ids, chrom.ids, snp.ids, centromere, low.qual.ids = NULL, ...)

Arguments

intenData
An IntensityData object containing the B Allele Frequency. The order of the rows of intenData and the snp annotation are expected to be by chromosome and then by position within chromosome. The scan annotation should contain sex, coded as "M" for male and "F" for female.
genoData
A GenotypeData object. The order of the rows of genoData and the snp annotation are expected to be by chromosome and then by position within chromosome.
scan.ids
vector of scan ids (sample numbers) to process
chrom.ids
vector of (unique) chromosomes to process. Should correspond to integer chromosome codes in intenData. Recommended to include all autosomes, and optionally X (males will be ignored) and the pseudoautosomal (XY) region.
snp.ids
vector of eligible snp ids. Usually exclude failed and intensity-only SNPs. Also recommended to exclude an HLA region on chromosome 6 and XTR region on X chromosome. See HLA and pseudoautosomal. If there are SNPs annotated in the centromere gap, exclude these as well (see centromeres).
smooth
number of markers for smoothing region. See smooth.CNA in the DNAcopy package.
min.width
minimum number of markers for a segment. See segment in the DNAcopy package.
nperm
number of permutations for deciding significance in segmentation. See segment in the DNAcopy package.
alpha
significance level. See segment in the DNAcopy package.
verbose
logical indicator whether to print information about the scan id currently being processed. anomSegmentBAF prints each scan id; anomFilterBAF prints a message after every 10 samples: "processing ith scan id out of n" where "ith" with be 10, 10, etc. and "n" is the total number of samples
segments
data.frame of segments from anomSegmentBAF. Names must include "scanID", "chromosome", "num.mark", "left.index", "right.index", "seg.mean". Here "left.index" and "right.index" are row indices of intenData. Left and right refer to start and end of anomaly,respectively, in position order.
centromere
data.frame with centromere position information. Names must include "chrom", "left.base", "right.base". Valid values for "chrom" are 1:22, "X", "Y", "XY". Here "left.base" and "right.base" are base positions of start and end of centromere location in position order. Centromere data tables are provided in centromeres.
low.qual.ids
scan ids determined to be low quality for which some segments are filtered based on more stringent criteria. Default is NULL. Usual choice are scan ids for which median BAF across autosomes > 0.05. See sdByScanChromWindow and medianSdOverAutosomes.
num.mark.thresh
minimum number of SNP markers in a segment to be considered for anomaly
long.num.mark.thresh
min number of markers for "long" segment to be considered for anomaly for which significance threshold criterion is allowed to be less stringent
sd.reg
number of baseline standard deviations of segment mean from a baseline mean for "normal" needed to declare segment anomalous. This number is given by abs(mean of segment - baseline mean)/(baseline standard deviation)
sd.long
same meaning as sd.reg but applied to "long" segments
low.frac.used
if fraction of heterozygous or missing SNP markers compared with number of eligible SNP markers in segment is below this, more stringent criteria are applied to declare them anomalous.
run.size
min length of run of missing or heterozygous SNP markers for possible determination of homozygous deletions
inter.size
number of homozygotes allowed to "interrupt" run for possible determination of homozygous deletions
low.frac.used.num.mark
number of markers threshold for low.frac.used segments (which are not declared homozygous deletions
very.low.frac.used
any segments with (num.mark)/(number of markers in interval) less than this are filtered out since they tend to be false positives
low.qual.frac.num.mark
minimum num.mark threshold for low quality scans (low.qual.ids) for segments that are also below low.frac.used threshold
lrr.cut
look for runs of LRR values below lrr.cut to adjust homozygous deletion endpoints
ct.thresh
minimum number of LRR values below lrr.cut needed in order to adjust
frac.thresh
investigate interval for homozygous deletion only if lrr.cut and ct.thresh thresholds met and (# LRR values below lrr.cut)/(# eligible SNPs in segment) > frac.thresh
small.thresh
sd.fac threshold use in making merge decisions involving small num.mark segments
dev.sim.thresh
relative error threshold for determining similarity in BAF deviations; used in merge decisions
centSpan.fac
thresholds increased by this factor when considering filtering/keeping together left and right halves of centromere spanning segments
centSpan.nmark
minimum number of markers under which centromere spanning segments are automatically filtered out
...
arguments to pass to anomFilterBAF

Value

anomSegmentBAF returns a data.frame with the following elements: Left and right refer to start and end of anomaly, respectively, in position order.
scanID
integer id of scan
chromosome
chromosome as integer code
left.index
row index of intenData indicating left endpoint of segment
right.index
row index of intenData indicating right endpoint of segment
num.mark
number of heterozygous or missing SNPs in the segment
seg.mean
mean of the BAF metric over the segment
anomFilterBAF and anomDetectBAF return a list with the following elements:
raw
data.frame of raw segmentation data, with same output as anomSegmentBAF as well as:
  • left.base: base position of left endpoint of segment
  • right.base: base position of right endpoint of segment
  • sex: sex of scan.id coded as "M" or "F"
  • sd.fac: measure of deviation from baseline equal to abs(mean of segment - baseline mean)/(baseline standard deviation); used in determining anomalous segments
filtered
data.frame of the segments identified as anomalies, with the same columns as raw as well as:
  • merge: TRUE if segment was a result of merging. Consecutive segments from output of anomSegmentBAF that meet certain criteria are merged.
  • homodel.adjust: TRUE if original segment was adjusted to narrow in on a homozygous deletion
  • frac.used: fraction of (eligible) heterozygous or missing SNP markers compared with total number of eligible SNP markers in segment
base.info
data frame with columns:
  • scanID: integer id of scan
  • base.mean: mean of non-anomalous baseline. This is the mean of the BAF metric for heterozygous and missing SNPs over all unsegmented autosomes that were considered.
  • base.sd: standard deviation of non-anomalous baseline
  • chr.ct: number of unsegmented chromosomes used in determining the non-anomalous baseline
seg.info
data frame with columns:
  • scanID: integer id of scan
  • chromosome: chromosome as integer
  • num.segs: number of segments produced by anomSegmentBAF

Details

anomSegmentBAF uses the function segment from the DNAcopy package to perform circular binary segmentation on a metric based on BAF values. The metric for a given sample/chromosome is sqrt(min(BAF,1-BAF,abs(BAF-median(BAF))) where the median is across BAF values on the chromosome. Only BAF values for heterozygous or missing SNPs are used.

anomFilterBAF determines anomalous segments based on a combination of thresholds for number of SNP markers in the segment and on deviation from a "normal" baseline. (See num.mark.thresh,long.num.mark.thresh, sd.reg, and sd.long.) The "normal" baseline metric mean and standard deviation are found across all autosomes not segmented by anomSegmentBAF. This is why it is recommended to include all autosomes for the argument chrom.ids to ensure a more accurate baseline.

Some initial filtering is done, including possible merging of consecutive segments meeting sd.reg threshold along with other criteria (such as not spanning the centromere) and adjustment for accurate break points for possible homozygous deletions (see lrr.cut, ct.thresh, frac.thresh, run.size, and inter.size). Male samples for X chromosome are not processed.

More stringent criteria are applied to some segments (see low.frac.used,low.frac.used.num.mark, very.low.frac.used, low.qual.ids, and low.qual.frac.num.mark).

anomDetectBAF runs anomSegmentBAF with default values and then runs anomFilterBAF. Additional parameters for anomFilterBAF may be passed as arguments.

References

See references in segment in the package DNAcopy. The BAF metric used is modified from Itsara,A., et.al (2009) Population Analysis of Large Copy Number Variants and Hotspots of Human Genetic Disease. American Journal of Human Genetics, 84, 148--161.

See Also

segment and smooth.CNA in the package DNAcopy, also findBAFvariance, anomDetectLOH

Examples

Run this code
library(GWASdata)
data(illuminaScanADF, illuminaSnpADF)

blfile <- system.file("extdata", "illumina_bl.gds", package="GWASdata")
bl <- GdsIntensityReader(blfile)
blData <-  IntensityData(bl, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

genofile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
geno <- GdsGenotypeReader(genofile)
genoData <-  GenotypeData(geno, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

# segment BAF
scan.ids <- illuminaScanADF$scanID[1:2]
chrom.ids <- unique(illuminaSnpADF$chromosome)
snp.ids <- illuminaSnpADF$snpID[illuminaSnpADF$missing.n1 < 1]
seg <- anomSegmentBAF(blData, genoData, scan.ids=scan.ids,
                      chrom.ids=chrom.ids, snp.ids=snp.ids)

# filter segments to detect anomalies
data(centromeres.hg18)
filt <- anomFilterBAF(blData, genoData, segments=seg, snp.ids=snp.ids,
                      centromere=centromeres.hg18)

# alternatively, run both steps at once
anom <- anomDetectBAF(blData, genoData, scan.ids=scan.ids, chrom.ids=chrom.ids,
                      snp.ids=snp.ids, centromere=centromeres.hg18)

close(blData)
close(genoData)

Run the code above in your browser using DataLab