anomDetectLOH
breaks a chromosome up into segments of homozygous runs
of SNP markers determined by change points in Log R Ratio and
selects segments which are likely to be anomalous.
anomDetectLOH(intenData, genoData, scan.ids, chrom.ids, snp.ids, known.anoms, smooth = 50, min.width = 5, nperm = 10000, alpha = 0.001, run.size = 50, inter.size = 4, homodel.min.num = 10, homodel.thresh = 10, small.num = 20, small.thresh = 2.25, medium.num = 50, medium.thresh = 2, long.num = 100, long.thresh = 1.5, small.na.thresh = 2.5, length.factor = 5, merge.fac = 0.85, min.lrr.num = 20, verbose = TRUE)
IntensityData object containing the Log R Ratio.
The rows of intenData and the snp annotation
are expected to be ordered by chromosome and then by position within chromosome.
The scan annotation should contain sex, coded as "M" for male and
"F" for female.
GenotypeData object. The rows of genoData
and the snp annotation are expected to be ordered by chromosome and then
by position within chromosome.
intenData. Recommended for use with
autosomes, X (males will be ignored), and the
pseudoautosomal (XY) region.
HLA and pseudoautosomal.
If there are SNPs annotated in the centromere gap, exclude these as
well (see centromeres).
anomDetectBAF);
must have "scanID", "chromosome", "left.index", "right.index".
Here "left.index" and "right.index" are row indices of intenData. Left and right
refer to start and end of anomaly, respectively, in position order.
smooth.CNA in the DNAcopy package.
small.num and medium.num.
medium.num and long.num.
long.num
small.num and medium.num and 'local mad.fac' is NA. See Details section for definition of
'local mad.fac'.
length.factor*(no. of markers in segment)
on either side of the given segment. Used in determining 'local mad.fac'. See Details section.
min.lrr.num,
interval is ignored in finding non-anomalous baseline unless it's the only piece left
known.anoms.
A data.frame with the following columns (left and right
refer to start and end of anomaly, respectively, in position order):
left.index: row index of intenData indicating left endpoint of segment
right.index: row index of intenData indicating right endpoint of segment
left.base: base position of left endpoint of segment
right.base: base position of right endpoint of segment
scanID: integer id of scan
chromosome: chromosome as integer code
scanID: integer id of scan
chromosome: chromosome as integer code
left.index: row index of intenData indicating left endpoint of segment
right.index: row index of intenData indicating right endpoint of segment
left.base: base position of left endpoint of segment
right.base: base position of right endpoint of segment
num.mark: number of eligible SNP markers in segment
seg.median: median of eligible LRR values in segment
seg.mean: mean of eligible LRR values in segment
mad.fac: measure of deviation from non-anomalous baseline, equal to
abs(median of segment - baseline median)/(baseline MAD);
used in determining anomalous segments
sd.fac: measure of deviation from non-anomalous baseline, equal to
abs(mean of segment - baseline mean)/(baseline standard deviation);
used in determining whether to merge
local: measure of deviation from non-anomalous baseline, equal to
abs(median of segment - local baseline median)/(local baseline MAD);
local baseline consists of eligible LRR values in a window around segment;
used in determining anomalous segments
num.segs: number of segments found by CBS for the given chromosome
chrom.nonanom.mad: MAD of eligible LRR values in non-anomalous regions across the chromosome
chrom.nonanom.median: median of eligible LRR values in non-anomalous regions across the chromosome
chrom.nonanom.mean: mean of eligible LRR values in non-anomalous regions across the chromosome
chrom.nonanom.sd: standard deviation of eligible LRR values in non-anomalous regions across the chromosome
sex: sex of the scan id coded as "M" or "F"
raw.adjusted.
chrom.nonanom.mad: MAD of eligible LRR values in non-anomalous regions across the chromosome
chrom.nonanom.median: median of eligible LRR values in non-anomalous regions across the chromosome
chrom.nonanom.mean: mean of eligible LRR values in non-anomalous regions across the chromosome
chrom.nonanom.sd: standard deviation of eligible LRR values in non-anomalous regions across the chromosome
sex: sex of the scan id coded as "M" or "F"
num.runs: number of original homozygous runs found for given scan/chromosome
num.segs: number of segments for given scan/chromosome produced by CBS
scanID: integer id of scan
chromosome: chromosome as integer code
sex: sex of the scan id coded as "M" or "F"
scanID: integer id of scan
chromosome: chromosome as integer code
left.index: row index of intenData indicating left endpoint of segment
right.index: row index of intenData indicating right endpoint of segment
left.base: base position of left endpoint of segment
right.base: base position of right endpoint of segment
num.mark: number of eligible SNP markers in the segment
seg.mean: mean of eligible LRR values in the segment
sd.fac: measure of deviation from baseline equal to
abs(mean of segment - baseline mean)/(baseline standard deviation)
where the baseline is over non-anomalous regions
scanID: integer id of scan
chromosome: chromosome as integer code
Circular binary segmentation (CBS) (using the R-package DNAcopy)
is applied to LRR values and, in parallel, runs of homozygous or missing genotypes
of a certain minimal size (run.size), allowing for interruptions
by no more than inter.size heterozygous SNPs, are identified. Intervals from
known.anoms are excluded from the identification of runs.
After some possible merging of consecutive CBS segments
(based on satisfying a threshold merge.fac for deviation
from non-anomalous baseline), the homozygous runs are intersected
with the segments from CBS.
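The run-finding step described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper find_homo_runs and the toy genotype vector are hypothetical, genotypes are assumed to be coded as the number of A alleles (1 = heterozygous, NA = missing), and an interruption is assumed to mean a stretch of at most inter.size heterozygous SNPs lying between two homozygous stretches.

```r
# Toy genotypes (hypothetical): 0 or 2 = homozygous, 1 = heterozygous, NA = missing
geno <- c(1, 0, 0, 2, NA, 0, 1, 2, 2, 0, 1, 1, 1, 0, 2)

# Hypothetical helper, not part of GWASTools
find_homo_runs <- function(geno, run.size, inter.size) {
  # TRUE where the genotype is homozygous or missing
  homo <- is.na(geno) | geno != 1
  r <- rle(homo)
  n <- length(r$values)
  # Short heterozygous stretches strictly inside the chromosome are
  # treated as interruptions and absorbed into the surrounding run
  interior <- seq_len(n) > 1 & seq_len(n) < n
  absorb <- !r$values & r$lengths <= inter.size & interior
  r$values[absorb] <- TRUE
  # Re-encode so that adjacent TRUE stretches are merged
  r <- rle(inverse.rle(r))
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  # Keep only runs that reach the minimal size
  keep <- r$values & r$lengths >= run.size
  data.frame(left.index = starts[keep], right.index = ends[keep])
}

runs <- find_homo_runs(geno, run.size = 5, inter.size = 1)
print(runs)
```

In this toy vector the single heterozygous SNP at position 7 is absorbed, while the three consecutive heterozygous SNPs at positions 11 to 13 end the run.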
Determination of anomalous segments is based on
a combination of number-of-marker thresholds and deviation from a non-anomalous
baseline. Segments are declared anomalous if their deviation from the non-anomalous
baseline is above the threshold corresponding to their number of markers.
(See small.num, small.thresh, medium.num, medium.thresh,
long.num, long.thresh, and small.na.thresh.)
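As a rough illustration of how the number-of-marker thresholds could interact, the sketch below picks a threshold from the segment size and compares a single deviation value against it. The function classify_segment is hypothetical and not part of the package. The default values are taken from the usage line, but the treatment of boundaries (inclusive lower bounds), the rule that segments with fewer than small.num markers are never called, and the reduction of the deviation to one number are all assumptions.

```r
# Hypothetical helper, not part of GWASTools
classify_segment <- function(num.mark, dev, local.na = FALSE,
                             small.num = 20, small.thresh = 2.25,
                             medium.num = 50, medium.thresh = 2,
                             long.num = 100, long.thresh = 1.5,
                             small.na.thresh = 2.5) {
  # Assumption: segments with too few markers are never declared anomalous
  if (num.mark < small.num) return(FALSE)
  thresh <- if (num.mark >= long.num) {
    long.thresh
  } else if (num.mark >= medium.num) {
    medium.thresh
  } else if (local.na) {
    # small segment whose 'local mad.fac' is NA
    small.na.thresh
  } else {
    small.thresh
  }
  dev > thresh
}

print(classify_segment(num.mark = 30, dev = 2.3))
print(classify_segment(num.mark = 30, dev = 2.3, local.na = TRUE))
print(classify_segment(num.mark = 150, dev = 1.6))
```

Longer segments get a lower threshold, so a modest but sustained deviation can be called while a short segment needs a stronger signal.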
Non-anomalous median and MAD are defined for each sample-chromosome combination.
Intervals from known.anoms and the homozygous runs
identified are excluded; remaining regions are the non-anomalous baseline.
Deviation from non-anomalous is measured by
a combination of a chromosome-wide 'mad.fac' and a 'local mad.fac' (both the average
and the minimum of these two measures are used).
Here 'mad.fac' is (segment median - non-anomalous median)/(non-anomalous MAD) and
'local mad.fac' is the same definition except the non-anomalous median and MAD
are computed over a window including the segment (see length.factor).
Median and MAD are found for eligible LRR values.
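The two deviation measures can be worked through on simulated data. The sketch below is illustrative only: the LRR values and the segment are made up, the absolute value follows the definition given for the mad.fac column, the segment itself is assumed to be left out of the local baseline, and R's mad() applies its default scaling constant, which may or may not match what the package does internally.

```r
set.seed(1)
# Toy LRR values for one sample-chromosome (hypothetical data)
lrr <- rnorm(1000, mean = 0, sd = 0.1)
seg <- 401:440                 # hypothetical homozygous segment
lrr[seg] <- lrr[seg] - 0.5     # simulate a drop in LRR

idx <- seq_along(lrr)

# Chromosome-wide baseline: all markers outside the segment
# (toy assumption: no other runs or known anomalies)
base <- lrr[!(idx %in% seg)]
mad.fac <- abs(median(lrr[seg]) - median(base)) / mad(base)

# Local baseline: length.factor * (no. of markers in segment) on either side
length.factor <- 5
w <- length.factor * length(seg)
in.window <- idx >= min(seg) - w & idx <= max(seg) + w & !(idx %in% seg)
local.base <- lrr[in.window]
local.mad.fac <- abs(median(lrr[seg]) - median(local.base)) / mad(local.base)

# Both summaries mentioned in the text: average and minimum
print(c(mad.fac = mad.fac, local.mad.fac = local.mad.fac,
        average = mean(c(mad.fac, local.mad.fac)),
        minimum = min(mad.fac, local.mad.fac)))
```

Using the local window makes the measure less sensitive to slow drift in LRR along the chromosome, at the cost of a noisier baseline for short segments.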
segment in the package DNAcopy.
segment and smooth.CNA in the package DNAcopy,
also findBAFvariance, anomDetectLOH
library(GWASdata)
data(illuminaScanADF, illuminaSnpADF)
blfile <- system.file("extdata", "illumina_bl.gds", package="GWASdata")
bl <- GdsIntensityReader(blfile)
blData <- IntensityData(bl, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)
genofile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
geno <- GdsGenotypeReader(genofile)
genoData <- GenotypeData(geno, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)
scan.ids <- illuminaScanADF$scanID[1:2]
chrom.ids <- unique(illuminaSnpADF$chromosome)
snp.ids <- illuminaSnpADF$snpID[illuminaSnpADF$missing.n1 < 1]
# example for known.anoms, would get this from anomDetectBAF
known.anoms <- data.frame("scanID"=scan.ids[1], "chromosome"=21,
                          "left.index"=100, "right.index"=200)
LOH.anom <- anomDetectLOH(blData, genoData, scan.ids=scan.ids,
                          chrom.ids=chrom.ids, snp.ids=snp.ids,
                          known.anoms=known.anoms)
close(blData)
close(genoData)