GWASTools (version 1.18.0)

anomSegStats: Calculate LRR and BAF statistics for anomalous segments

Description

Calculate LRR and BAF statistics for anomalous segments and plot results

Usage

anomSegStats(intenData, genoData, snp.ids, anom, centromere, lrr.cut = -2, verbose = TRUE)
anomStatsPlot(intenData, genoData, anom.stats, snp.ineligible, plot.ineligible = FALSE, centromere = NULL, brackets = c("none", "bases", "markers"), brkpt.pct = 10, whole.chrom = FALSE, win = 5, win.calc = FALSE, win.fixed = 1, zoom = c("both", "left", "right"), main = NULL, info = NULL, ideogram = TRUE, ideo.zoom = FALSE, ideo.rect = TRUE, mult.anom = FALSE, cex = 0.5, cex.leg = 1.5, colors = c("default", "neon", "primary"), ...)

Arguments

intenData
An IntensityData object containing BAlleleFreq and LogRRatio. The order of the rows of intenData and the snp annotation are expected to be by chromosome and then by position within chromosome.
genoData
A GenotypeData object. The order of the rows of genoData and the snp annotation are expected to be by chromosome and then by position within chromosome.
snp.ids
vector of eligible SNP ids. Usually exclude failed and intensity-only SNPs. Also recommended to exclude an HLA region on chromosome 6 and XTR region on X chromosome. See HLA and pseudoautosomal. If there are SNPs annotated in the centromere gap, exclude these as well (see centromeres).
anom
data.frame of detected chromosome anomalies. Names must include "scanID", "chromosome", "left.index", "right.index", "sex", "method", "anom.id". Valid values for "method" are "BAF" or "LOH" referring to whether the anomaly was detected by BAF method (anomDetectBAF) or by LOH method (anomDetectLOH). Here "left.index" and "right.index" are row indices of intenData with left.index < right.index.
centromere
data.frame with centromere position info. Names must include "chrom", "left.base", "right.base". Valid values for "chrom" are 1:22, "X", "Y", "XY". Here "left.base" and "right.base" are start and end base positions of the centromere location, respectively. Centromere data tables are provided in centromeres.
lrr.cut
count the number of eligible LRR values less than lrr.cut
verbose
whether to print the scan id currently being processed
anom.stats
data.frame of chromosome anomalies with statistics, usually the output of anomSegStats. Names must include "anom.id", "scanID", "chromosome", "left.index", "right.index", "method", "nmark.all", "nmark.elig", "left.base", "right.base", "nbase", "non.anom.baf.med", "non.anom.lrr.med", "anom.baf.dev.med", "anom.baf.dev.5", "anom.lrr.med", "nmark.baf", "nmark.lrr". Left and right refer to start and end, respectively, of the anomaly, in position order.
snp.ineligible
vector of ineligible snp ids (e.g., intensity-only, failed snps, XTR and HLA regions). See HLA and pseudoautosomal.
plot.ineligible
whether or not to include ineligible points in the plot for LogRRatio
brackets
type of brackets to plot around breakpoints: none, use base length, or use number of markers (note that using markers gives asymmetric brackets). Could be used, along with brkpt.pct, to assess general accuracy of end points of the anomaly
brkpt.pct
percent of anomaly length in bases (or number of markers) for width of brackets
whole.chrom
logical to plot the whole chromosome or not (overrides win and zoom)
win
size of the window (a multiple of anomaly length) surrounding the anomaly to plot
win.calc
logical to calculate window size from anomaly length; overrides win and gives window of fixed length given by win.fixed
win.fixed
number of megabases for window size when win.calc=TRUE
zoom
indicates whether plot includes the whole anomaly ("both") or zooms on just the left or right breakpoint; "both" is default
main
Vector of titles for upper (LRR) plots. If NULL, titles will include anom.id, scanID, sex, chromosome, and detection method.
info
character vector of extra information to include in the main title of the upper (LRR) plot
ideogram
logical for whether to plot a chromosome ideogram under the BAF and LRR plots.
ideo.zoom
logical for whether to zoom in on the ideogram to match the range of the BAF/LRR plots
ideo.rect
logical for whether to draw a rectangle on the ideogram indicating the range of the BAF/LRR plots
mult.anom
logical for whether to plot multiple anomalies from the same scan-chromosome pair on a single plot. If FALSE (default), each anomaly is shown on a separate plot.
cex
cex value for points on the plots
cex.leg
cex value for the ideogram legend
colors
Color scheme to use for genotypes. "default" is colorblind safe (colorbrewer Set2), "neon" is bright orange/green/fuchsia, and "primary" is red/green/blue.
...
Other parameters to be passed directly to plot.
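
As a rough illustration of how the eligible (snp.ids) and ineligible (snp.ineligible) vectors described above might be assembled, the following sketch uses a toy SNP annotation in base R. The column names failed and intensity.only, the exclude table, and all coordinates are hypothetical placeholders, not real HLA, XTR, or centromere positions and not part of the GWASTools API.

```r
# Toy SNP annotation; all values below are made up for illustration.
snp <- data.frame(
  snpID = 1:8,
  chromosome = c(6, 6, 6, 23, 23, 1, 1, 1),
  position = c(1000, 30000, 60000, 500, 90000, 100, 200, 300),
  failed = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE),
  intensity.only = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Hypothetical exclusion regions (placeholders, not real HLA/XTR coordinates).
exclude <- data.frame(
  chromosome = c(6, 23),
  start.base = c(20000, 80000),
  end.base = c(40000, 100000)
)

# Flag SNPs that fall inside any exclusion region.
in.region <- rep(FALSE, nrow(snp))
for (i in seq_len(nrow(exclude))) {
  in.region <- in.region |
    (snp$chromosome == exclude$chromosome[i] &
     snp$position >= exclude$start.base[i] &
     snp$position <= exclude$end.base[i])
}

# Eligible SNPs: not failed, not intensity-only, not in an excluded region.
snp.ids.toy <- snp$snpID[!snp$failed & !snp$intensity.only & !in.region]
# Everything else is treated as ineligible.
snp.ineligible.toy <- setdiff(snp$snpID, snp.ids.toy)
```

With real data, the region table would presumably come from the HLA, pseudoautosomal, and centromeres data sets referenced above rather than from hand-entered values.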

Value

anomSegStats produces a data.frame with the variables for anom plus the following columns. Left and right refer to position order, with left < right.
nmark.all
total number of SNP markers on the array from left.index to right.index inclusive
nmark.elig
total number of eligible SNP markers on the array from left.index to right.index, inclusive. See snp.ids for definition of eligible SNP markers.
left.base
base position corresponding to left.index
right.base
base position corresponding to right.index
nbase
number of bases from left.index to right.index, inclusive
non.anom.baf.med
BAF median of non-anomalous segments on all autosomes for the associated sample, using eligible heterozygous or missing SNP markers
non.anom.lrr.med
LRR median of non-anomalous segments on all autosomes for the associated sample, using eligible SNP markers
non.anom.lrr.mad
MAD for LRR of non-anomalous segments on all autosomes for the associated sample, using eligible SNP markers
anom.baf.dev.med
BAF median of deviations from non.anom.baf.med of points used to detect anomaly (eligible and heterozygous or missing)
anom.baf.dev.5
median of BAF deviations from 0.5, using eligible heterozygous or missing SNP markers in anomaly
anom.baf.dev.mean
mean of BAF deviations from non.anom.baf.med, using eligible heterozygous or missing SNP markers in anomaly
anom.baf.sd
standard deviation of BAF deviations from non.anom.baf.med, using eligible heterozygous or missing SNP markers in anomaly
anom.baf.mad
MAD of BAF deviations from non.anom.baf.med, using eligible heterozygous or missing SNP markers in anomaly
anom.lrr.med
LRR median of eligible SNP markers within the anomaly
anom.lrr.sd
standard deviation of LRR for eligible SNP markers within the anomaly
anom.lrr.mad
MAD of LRR for eligible SNP markers within the anomaly
nmark.baf
number of SNP markers within the anomaly eligible for BAF detection (eligible markers that are heterozygous or missing)
nmark.lrr
number of SNP markers within the anomaly eligible for LOH detection (eligible markers)
cent.rel
position relative to centromere - left, right, span
left.most
T/F for whether the anomaly is the left-most anomaly for this sample-chromosome, i.e. no other anomalies with smaller start base position
right.most
T/F for whether the anomaly is the right-most anomaly for this sample-chromosome, i.e. no other anomalies with larger end base position
left.last.elig
T/F for whether the anomaly contains the last eligible SNP marker going to the left (decreasing position)
right.last.elig
T/F for whether the anomaly contains the last eligible SNP marker going to the right (increasing position)
left.term.lrr.med
median of LRR for all eligible SNP markers from left-most eligible marker to the left telomere (only calculated for the most distal anom)
right.term.lrr.med
median of LRR for all eligible markers from right-most eligible marker to the right telomere (only calculated for the most distal anom)
left.term.lrr.n
sample size for calculating left.term.lrr.med
right.term.lrr.n
sample size for calculating right.term.lrr.med
cent.span.left.elig.n
number of eligible markers on the left side of centromere-spanning anomalies
cent.span.right.elig.n
number of eligible markers on the right side of centromere-spanning anomalies
cent.span.left.bases
length of anomaly (in bases) covered by eligible markers on the left side of the centromere
cent.span.right.bases
length of anomaly (in bases) covered by eligible markers on the right side of the centromere
cent.span.left.index
index of eligible marker left-adjacent to centromere; recall that index refers to row indices of intenData
cent.span.right.index
index of eligible marker right-adjacent to centromere
bafmetric.anom.mean
mean of BAF-metric values within anomaly, using eligible heterozygous or missing SNP markers. BAF-metric values were used in the detection of anomalies. See anomDetectBAF for definition of BAF-metric
bafmetric.non.anom.mean
mean of BAF-metric values within non-anomalous segments across all autosomes for the associated sample, using eligible heterozygous or missing SNP markers
bafmetric.non.anom.sd
standard deviation of BAF-metric values within non-anomalous segments across all autosomes for the associated sample, using eligible heterozygous or missing SNP markers
nmark.lrr.low
number of eligible markers within anomaly with LRR values less than lrr.cut
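
To make a few of the columns above concrete, here is a minimal base R sketch on invented toy vectors. It assumes that BAF "deviations" are absolute differences from the reference value, which the column descriptions suggest but do not state outright; it is an illustration, not the package's implementation.

```r
# Toy values for one sample; all numbers are invented for illustration.
baf.non.anom <- c(0.48, 0.50, 0.52, 0.49, 0.51)  # het/missing BAF outside anomalies
baf.anom <- c(0.30, 0.70, 0.35, 0.65, 0.32)      # het/missing BAF inside the anomaly
lrr.anom <- c(-0.4, -0.5, -2.5, -0.3, -0.6)      # eligible LRR inside the anomaly
lrr.cut <- -2

# Reference BAF median from non-anomalous segments.
non.anom.baf.med <- median(baf.non.anom, na.rm = TRUE)

# Assumption: deviation means absolute difference from the reference value.
anom.baf.dev.med <- median(abs(baf.anom - non.anom.baf.med), na.rm = TRUE)
anom.baf.dev.5 <- median(abs(baf.anom - 0.5), na.rm = TRUE)

# LRR summaries within the anomaly.
anom.lrr.med <- median(lrr.anom, na.rm = TRUE)
anom.lrr.mad <- mad(lrr.anom, na.rm = TRUE)

# Count of eligible LRR values below the cutoff.
nmark.lrr.low <- sum(lrr.anom < lrr.cut, na.rm = TRUE)
```

In this toy case the non-anomalous BAF median is exactly 0.5, so anom.baf.dev.med and anom.baf.dev.5 coincide; with real data they generally differ.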

Details

anomSegStats computes various statistics of the input anomalies. Some of these are basic statistics for the characteristics of the anomaly and for measuring deviation of LRR or BAF from expected. Other statistics are used in downstream quality control analysis, including detecting terminal anomalies and investigating centromere-spanning anomalies.
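
The positional flags used for downstream quality control (cent.rel, left.most, right.most) can be pictured with a small base R sketch. The helper cent_rel and the toy coordinates are hypothetical, and the sketch assumes anomaly end points never fall inside the centromere gap (eligible SNPs exclude that gap); it is not the code anomSegStats runs.

```r
# Hypothetical helper: classify an anomaly relative to a centromere gap.
cent_rel <- function(left.base, right.base, cent.left, cent.right) {
  if (right.base < cent.left) {
    "left"
  } else if (left.base > cent.right) {
    "right"
  } else {
    "span"
  }
}

# Toy anomalies on one chromosome; centromere gap assumed at 1000..2000.
toy <- data.frame(
  scanID = c(1, 1, 1),
  chromosome = c(21, 21, 21),
  left.base = c(100, 2500, 800),
  right.base = c(500, 3000, 2200)
)
toy$cent.rel <- mapply(cent_rel, toy$left.base, toy$right.base,
                       MoreArgs = list(cent.left = 1000, cent.right = 2000))

# Left-most / right-most anomaly within each scan-chromosome pair.
toy$left.most <- toy$left.base ==
  ave(toy$left.base, toy$scanID, toy$chromosome, FUN = min)
toy$right.most <- toy$right.base ==
  ave(toy$right.base, toy$scanID, toy$chromosome, FUN = max)
```

Flags like these help identify candidate terminal anomalies, for which the left.term and right.term LRR statistics are calculated.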

anomStatsPlot produces separate png images of each anomaly in anom.stats. Each image consists of an upper plot of LogRRatio values and a lower plot of BAlleleFrequency values for a zoomed region around the anomaly or whole chromosome (depending on parameter choices). Each plot has vertical lines demarcating the anomaly and horizontal lines displaying certain statistics from anomSegStats. The upper plot title includes sample number and chromosome. Further plot annotation describes which anomaly statistics are represented.

See Also

anomDetectBAF, anomDetectLOH

Examples

library(GWASdata)
data(illuminaScanADF, illuminaSnpADF)

blfile <- system.file("extdata", "illumina_bl.gds", package="GWASdata")
bl <- GdsIntensityReader(blfile)
blData <-  IntensityData(bl, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

genofile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
geno <- GdsGenotypeReader(genofile)
genoData <-  GenotypeData(geno, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

scan.ids <- illuminaScanADF$scanID[1:2]
chrom.ids <- unique(illuminaSnpADF$chromosome)
snp.ids <- illuminaSnpADF$snpID[illuminaSnpADF$missing.n1 < 1]
snp.failed <- illuminaSnpADF$snpID[illuminaSnpADF$missing.n1 == 1]

# example results from anomDetectBAF
baf.anoms <- data.frame("scanID"=rep(scan.ids[1],2), "chromosome"=rep(21,2),
  "left.index"=c(100,300), "right.index"=c(200,400), sex=rep("M",2),
  method=rep("BAF",2), anom.id=1:2, stringsAsFactors=FALSE)

# example results from anomDetectLOH
loh.anoms <- data.frame("scanID"=scan.ids[2],"chromosome"=22,
  "left.index"=400,"right.index"=500, sex="F", method="LOH",
  anom.id=3, stringsAsFactors=FALSE)

anoms <- rbind(baf.anoms, loh.anoms)
data(centromeres.hg18)
stats <- anomSegStats(blData, genoData, snp.ids=snp.ids, anom=anoms,
  centromere=centromeres.hg18)

anomStatsPlot(blData, genoData, anom.stats=stats,
  snp.ineligible=snp.failed, centromere=centromeres.hg18)

close(blData)
close(genoData)