For each sample and chromosome, anomSegmentBAF breaks the chromosome up into
segments marked by change points of a metric based on B Allele Frequency (BAF) values.
anomFilterBAF selects segments which are likely to be anomalous.
anomDetectBAF is a wrapper to run anomSegmentBAF and anomFilterBAF in one step.
anomSegmentBAF(intenData, genoData, scan.ids, chrom.ids, snp.ids, smooth = 50, min.width = 5, nperm = 10000, alpha = 0.001, verbose = TRUE)
anomFilterBAF(intenData, genoData, segments, snp.ids, centromere, low.qual.ids = NULL, num.mark.thresh = 15, long.num.mark.thresh = 200, sd.reg = 2, sd.long = 1, low.frac.used = 0.1, run.size = 10, inter.size = 2, low.frac.used.num.mark = 30, very.low.frac.used = 0.01, low.qual.frac.num.mark = 150, lrr.cut = -2, ct.thresh = 10, frac.thresh = 0.1, verbose = TRUE, small.thresh = 2.5, dev.sim.thresh = 0.1, centSpan.fac = 1.25, centSpan.nmark = 50)
anomDetectBAF(intenData, genoData, scan.ids, chrom.ids, snp.ids, centromere, low.qual.ids = NULL, ...)
IntensityData object containing the B Allele
Frequency. The order of the rows of intenData and the snp annotation
are expected to be by chromosome and then by position within chromosome.
The scan annotation should contain sex, coded as "M" for male and
"F" for female.
GenotypeData object. The order of the rows of genoData
and the snp annotation are expected to be by chromosome and then
by position within chromosome.
intenData. Recommended to include
all autosomes, and optionally X (males will be ignored) and the
pseudoautosomal (XY) region.
HLA and pseudoautosomal.
If there are SNPs annotated in the centromere gap, exclude these as
well (see centromeres).
smooth.CNA in the DNAcopy package.
anomSegmentBAF. Names must
include "scanID", "chromosome", "num.mark", "left.index", "right.index", "seg.mean".
Here "left.index" and "right.index" are row indices of intenData. Left and right
refer to start and end of anomaly, respectively, in position order.
centromeres.
sdByScanChromWindow and medianSdOverAutosomes.
sd.reg but applied to "long" segments
low.frac.used segments (which are not declared homozygous deletions)
low.qual.ids) for segments that are also below low.frac.used threshold
lrr.cut to adjust homozygous deletion endpoints
lrr.cut needed in order to adjust
lrr.cut and ct.thresh thresholds met and (# LRR values below lrr.cut)/(# eligible SNPs in segment) > frac.thresh
anomFilterBAF
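As a rough illustration of the format the segments argument expects, the base R sketch below builds a small table with the column names listed above and checks that none is missing. The values are invented and `required_cols` is a hypothetical helper name; in practice this table would come from anomSegmentBAF.

```r
# Column names that the segments data.frame must include (from the text above)
required_cols <- c("scanID", "chromosome", "num.mark",
                   "left.index", "right.index", "seg.mean")

# Hypothetical segment table; real values would come from anomSegmentBAF
segments <- data.frame(
  scanID      = c(280L, 280L),
  chromosome  = c(21L, 22L),
  num.mark    = c(120L, 45L),
  left.index  = c(3001L, 3200L),   # row indices of intenData
  right.index = c(3150L, 3260L),
  seg.mean    = c(0.21, 0.35)
)

# Report any required column that is absent
missing_cols <- setdiff(required_cols, names(segments))
if (length(missing_cols) > 0) {
  stop("segments is missing: ", paste(missing_cols, collapse = ", "))
}

# Left and right refer to start and end in position order
stopifnot(all(segments$left.index <= segments$right.index))
```

A check like this is only a convenience before calling anomFilterBAF; it does not validate that the indices actually match the rows of intenData.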
anomSegmentBAF returns a data.frame with the following elements: Left and right
refer to start and end of anomaly, respectively, in position order.
anomFilterBAF and anomDetectBAF return a list with the
following elements:
anomSegmentBAF as well as:
left.base: base position of left endpoint of segment
right.base: base position of right endpoint of segment
sex: sex of scan.id coded as "M" or "F"
sd.fac: measure of deviation from baseline equal to
abs(mean of segment - baseline mean)/(baseline standard deviation);
used in determining anomalous segments
raw as well as:
merge: TRUE if segment was a result of merging. Consecutive segments
from output of anomSegmentBAF that meet certain criteria are merged.
homodel.adjust: TRUE if original segment was adjusted to
narrow in on a homozygous deletion
frac.used: fraction of (eligible) heterozygous or missing SNP markers compared with total number of
eligible SNP markers in segment
scanID: integer id of scan
base.mean: mean of non-anomalous baseline. This is the mean of the
BAF metric for heterozygous and missing SNPs over all unsegmented autosomes
that were considered.
base.sd: standard deviation of non-anomalous baseline
chr.ct: number of unsegmented chromosomes used in determining
the non-anomalous baseline
scanID: integer id of scan
chromosome: chromosome as integer
num.segs: number of segments produced by anomSegmentBAF
anomSegmentBAF uses the function segment from the DNAcopy
package to perform circular binary segmentation
on a metric based on BAF values. The metric for a given sample/chromosome
is sqrt(min(BAF, 1-BAF, abs(BAF-median(BAF)))), where the median is
across BAF values on the chromosome. Only BAF values for heterozygous or
missing SNPs are used.
anomFilterBAF determines anomalous segments based on a combination
of thresholds for number of SNP markers in the segment and on deviation from
a "normal" baseline. (See num.mark.thresh, long.num.mark.thresh,
sd.reg, and sd.long.) The "normal" baseline metric mean and standard deviation
are found across all autosomes not segmented by anomSegmentBAF. This is why
it is recommended to include all autosomes for the argument chrom.ids to
ensure a more accurate baseline.
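To make the BAF metric and the sd.fac deviation measure concrete, here is a minimal base R sketch. It is not the package's implementation: the BAF values and baseline numbers are invented, `baf_metric` and `sd_fac` are hypothetical helper names, and the median is taken over the values passed in. The actual filter also combines this deviation with the marker-count thresholds, which the sketch leaves out.

```r
# Hypothetical BAF values for heterozygous or missing SNPs on one chromosome
baf <- c(0.48, 0.52, 0.50, 0.47, 0.53, 0.33, 0.67, 0.35, 0.66, 0.34)

# Metric from the text: sqrt(min(BAF, 1-BAF, abs(BAF - median(BAF))))
baf_metric <- function(baf) {
  med <- median(baf, na.rm = TRUE)
  sqrt(pmin(baf, 1 - baf, abs(baf - med)))   # pmin takes the element-wise minimum
}
metric <- baf_metric(baf)

# sd.fac = abs(mean of segment - baseline mean) / (baseline standard deviation)
sd_fac <- function(seg_mean, base_mean, base_sd) {
  abs(seg_mean - base_mean) / base_sd
}

# Assumed baseline; the package derives it from unsegmented autosomes
base_mean <- 0.15
base_sd   <- 0.05

# Treat the last five SNPs as one hypothetical segment
seg_mean <- mean(metric[6:10])
fac <- sd_fac(seg_mean, base_mean, base_sd)

# Compare with the sd.reg default of 2 (a simplification of the real filter)
exceeds_sd_reg <- fac > 2
```

Values near 0.5 give a small metric and values shifted away from 0.5 give a larger one, which is why a run of shifted BAF values shows up as a change point in the segmentation.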
Some initial filtering is done,
including possible merging of consecutive segments meeting the sd.reg
threshold along with other criteria (such as not spanning the centromere)
and adjustment for accurate
break points for possible homozygous deletions (see lrr.cut,
ct.thresh, frac.thresh, run.size, and inter.size).
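The count and fraction conditions tied to lrr.cut, ct.thresh, and frac.thresh can be sketched in base R as follows. This is an illustration only: `check_homodel` is a hypothetical name, the LRR values are invented, and treating ct.thresh as a minimum count of LRR values below lrr.cut is an assumption. The run-based endpoint adjustment governed by run.size and inter.size is not shown.

```r
# Hypothetical LRR values for the eligible SNPs in one segment
lrr <- c(-0.1, -2.5, -3.0, -2.2, 0.05, -2.8, -2.6, -0.2,
         -2.4, -2.9, -3.1, -2.3, -2.7, -2.1, 0.1)

# Defaults are taken from the anomFilterBAF usage line
check_homodel <- function(lrr, lrr.cut = -2, ct.thresh = 10, frac.thresh = 0.1) {
  n_below <- sum(lrr < lrr.cut, na.rm = TRUE)   # number of LRR values below lrr.cut
  frac_below <- n_below / length(lrr)           # (# below lrr.cut)/(# eligible SNPs)
  # Assumption: ct.thresh is a minimum count; the fraction must exceed frac.thresh
  n_below >= ct.thresh && frac_below > frac.thresh
}

investigate <- check_homodel(lrr)
```

With these invented values, 11 of the 15 LRR values fall below -2, so both conditions hold and the segment would be a candidate for homozygous deletion adjustment under the stated assumptions.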
Male samples for X chromosome are not processed.
More stringent criteria are applied to some segments
(see low.frac.used, low.frac.used.num.mark,
very.low.frac.used, low.qual.ids, and
low.qual.frac.num.mark).
anomDetectBAF runs anomSegmentBAF with default values and
then runs anomFilterBAF. Additional parameters for
anomFilterBAF may be passed as arguments.
See references in segment in the package DNAcopy.
The BAF metric used is modified from Itsara, A., et al. (2009) Population
Analysis of Large Copy Number Variants and Hotspots of Human Genetic Disease.
American Journal of Human Genetics, 84, 148-161.
segment and smooth.CNA in the package DNAcopy,
also findBAFvariance, anomDetectLOH
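library(GWASTools)  # provides the reader classes and the anom* functions
library(GWASdata)   # provides the example Illumina data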
data(illuminaScanADF, illuminaSnpADF)
blfile <- system.file("extdata", "illumina_bl.gds", package="GWASdata")
bl <- GdsIntensityReader(blfile)
blData <- IntensityData(bl, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)
genofile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
geno <- GdsGenotypeReader(genofile)
genoData <- GenotypeData(geno, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)
# segment BAF
scan.ids <- illuminaScanADF$scanID[1:2]
chrom.ids <- unique(illuminaSnpADF$chromosome)
snp.ids <- illuminaSnpADF$snpID[illuminaSnpADF$missing.n1 < 1]
seg <- anomSegmentBAF(blData, genoData, scan.ids=scan.ids,
                      chrom.ids=chrom.ids, snp.ids=snp.ids)
# filter segments to detect anomalies
data(centromeres.hg18)
filt <- anomFilterBAF(blData, genoData, segments=seg, snp.ids=snp.ids,
                      centromere=centromeres.hg18)
# alternatively, run both steps at once
anom <- anomDetectBAF(blData, genoData, scan.ids=scan.ids, chrom.ids=chrom.ids,
                      snp.ids=snp.ids, centromere=centromeres.hg18)
close(blData)
close(genoData)