
GWASTools (version 1.18.0)

findBAFvariance: Find chromosomal areas with high BAlleleFreq (or LogRRatio) standard deviation

Description

sdByScanChromWindow uses a sliding-window algorithm to calculate the standard deviation of the BAlleleFreq (or LogRRatio) values for a user-specified number of bins across each chromosome of each scan.

medianSdOverAutosomes calculates the median of the BAlleleFreq (or LogRRatio) standard deviation over all autosomes for each scan.

meanSdByChromWindow calculates the mean and standard deviation of the BAlleleFreq standard deviation in each window of each chromosome over all scans.

findBAFvariance flags chromosomal areas with high BAlleleFreq standard deviation, using the per-scan standard deviations (typically the output of sdByScanChromWindow) together with the previously calculated means and standard deviations over scans (typically the output of meanSdByChromWindow).

Usage

sdByScanChromWindow(intenData, genoData=NULL, var="BAlleleFreq", nbins=NULL, snp.exclude=NULL, return.mean=FALSE, incl.miss=TRUE, incl.het=TRUE, incl.hom=FALSE)
medianSdOverAutosomes(sd.by.scan.chrom.window)
meanSdByChromWindow(sd.by.scan.chrom.window, sex)
findBAFvariance(sd.by.chrom.window, sd.by.scan.chrom.window, sex, sd.threshold)

Arguments

intenData
An IntensityData object. SNPs are expected to be ordered by chromosome and then by position within chromosome.
genoData
A GenotypeData object. May be omitted if incl.miss, incl.het, and incl.hom are all TRUE, as there is no need to distinguish between genotype calls in that case.
var
The variable for which to calculate standard deviations, typically "BAlleleFreq" (the default) or "LogRRatio".
nbins
An integer vector giving the number of bins for each chromosome. All values must be even integers.
snp.exclude
An integer vector containing the snpIDs of SNPs to be excluded.
return.mean
a logical. If TRUE, return mean as well as standard deviation.
incl.miss
a logical. If TRUE, include SNPs with missing genotype calls.
incl.het
a logical. If TRUE, include SNPs called as heterozygotes.
incl.hom
a logical. If TRUE, include SNPs called as homozygotes. This is typically FALSE (the default) for BAlleleFreq calculations.
sd.by.scan.chrom.window
A list of matrices of standard deviation for each chromosome, with dimensions of number of scans x number of windows. This is typically the output of sdByScanChromWindow.
sd.by.chrom.window
A list of matrices of the means and standard deviations over scans, as generated by meanSdByChromWindow.
sex
A character vector of sex ("M"/"F") for the scans.
sd.threshold
A number giving the threshold, in standard deviations above the mean, at which to flag.

Value

sdByScanChromWindow returns a list of matrices containing standard deviations. There is one matrix for each chromosome, each with dimensions of number of scans x number of windows. If return.mean=TRUE, two lists of matrices are returned, one with standard deviations and one with means.

medianSdOverAutosomes returns a data frame with columns "scanID" and "med.sd" containing the median standard deviation over all autosomes for each scan.

meanSdByChromWindow returns a list of matrices, one for each chromosome. Each matrix contains two columns, "Mean" and "SD", holding the mean and SD of the BAlleleFreq standard deviations over scans for each bin. For the X chromosome the matrix has four columns: "Female Mean", "Male Mean", "Female SD" and "Male SD".

findBAFvariance returns a matrix with columns "scanID", "chromosome", "bin", and "sex" listing the scan-by-chromosome combinations whose BAlleleFreq standard deviations exceed the mean by more than sd.threshold standard deviations.

Details

sdByScanChromWindow calculates the standard deviation of BAlleleFreq (or LogRRatio) values across chromosomes 1-22 and chromosome X for the number of 'bins' per chromosome given in the 'nbins' argument. Each window is 2 bins wide and moves along the chromosome in steps of 1 bin (half a window), so there are nbins-1 windows per chromosome; for example, nbins=8 gives 7 overlapping windows. If nbins=NULL (the default), each chromosome has 2 bins and therefore a single window.
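The sliding-window scheme can be sketched in base R for one chromosome of one scan. This is a minimal sketch, not the GWASTools implementation: the helper window_sd is hypothetical, it assumes bins are equal-width intervals of base-pair position, and it skips the genotype-based filtering controlled by incl.miss, incl.het and incl.hom.

```r
# Hypothetical helper (not part of GWASTools): SD of `val` in sliding
# windows two bins wide, stepping one bin at a time.
# Assumption: bins are equal-width intervals of base-pair position.
window_sd <- function(pos, val, nbins = 2) {
  stopifnot(length(pos) == length(val), nbins >= 2, nbins %% 2 == 0)
  # nbins + 1 boundaries spanning the first to the last SNP
  breaks <- seq(min(pos), max(pos), length.out = nbins + 1)
  # bin index 1..nbins for every SNP (the last SNP falls in bin nbins)
  bin <- findInterval(pos, breaks, rightmost.closed = TRUE)
  # window w covers bins w and w + 1, giving nbins - 1 windows
  vapply(seq_len(nbins - 1), function(w) {
    sd(val[bin %in% c(w, w + 1)], na.rm = TRUE)
  }, numeric(1))
}

# usage on simulated data (not real BAF values)
set.seed(1)
pos <- sort(sample(1e6, 500))
baf <- runif(500)
window_sd(pos, baf, nbins = 8)  # 7 window SDs
```

Because each window overlaps its neighbour by one bin, every interior bin contributes to two windows, which smooths the SD profile along the chromosome.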

medianSdOverAutosomes calculates the median over autosomes of the BAlleleFreq (or LogRRatio) standard deviations computed in sliding windows within each chromosome of each scan. The standard deviations should be a list with one element per chromosome, each element being a matrix with scans as rows.
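A base-R sketch of the median over autosomes, assuming (hypothetically) that the list is named by chromosome and that each matrix has scanIDs as row names; the naming used by GWASTools itself may differ. The helper median_sd_autosomes is illustrative only.

```r
# Hypothetical helper (not the GWASTools function): median window SD per scan,
# pooling all windows from the autosomes.
# Assumes: list names are chromosome labels, rownames are scanIDs, and all
# matrices list the scans in the same order.
median_sd_autosomes <- function(sd.list, autosomes = as.character(1:22)) {
  keep <- intersect(names(sd.list), autosomes)
  # scans x (all autosomal windows side by side)
  all.win <- do.call(cbind, sd.list[keep])
  data.frame(scanID = as.integer(rownames(all.win)),
             med.sd = unname(apply(all.win, 1, median, na.rm = TRUE)))
}

# toy input: 2 scans; chromosome 21 has 2 windows, 22 has 1, X is ignored
ids <- list(c("1", "2"), NULL)
sd.list <- list(
  "21" = matrix(c(0.10, 0.20, 0.30, 0.40), nrow = 2, dimnames = ids),
  "22" = matrix(c(0.50, 0.60), nrow = 2, dimnames = ids),
  "X"  = matrix(c(0.90, 0.90), nrow = 2, dimnames = ids)
)
median_sd_autosomes(sd.list)
# scan 1: median(0.10, 0.30, 0.50) = 0.3; scan 2: median(0.20, 0.40, 0.60) = 0.4
```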

meanSdByChromWindow calculates the mean and standard deviation over scans of the BAlleleFreq standard deviations computed in sliding windows within each chromosome of each scan. The BAlleleFreq standard deviations should be a list with one element per chromosome, each element being a matrix containing the BAlleleFreq standard deviation for the i'th scan in the j'th bin. This is typically created using the sdByScanChromWindow function. For the X chromosome, the calculations are done separately for each sex.

findBAFvariance determines which chromosomes of which scans have regions at least a given number of SDs above the mean, using BAlleleFreq means and standard deviations calculated from sliding windows over each chromosome by scan.
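The mean/SD step and the flagging step can be sketched for a single chromosome in base R. The helpers mean_sd_windows, mean_sd_by_sex and flag_windows are hypothetical, not GWASTools functions. flag_windows returns (row, column) indices rather than the scanID/chromosome/bin/sex matrix that findBAFvariance produces, and for the X chromosome each scan would have to be compared with the statistics for its own sex.

```r
# Hypothetical helper: mean and SD over scans (rows) for each window (column).
mean_sd_windows <- function(sd.mat) {
  cbind(Mean = colMeans(sd.mat, na.rm = TRUE),
        SD   = apply(sd.mat, 2, sd, na.rm = TRUE))
}

# Hypothetical helper for chromosome X: separate statistics for each sex.
mean_sd_by_sex <- function(sd.mat, sex) {
  lapply(split(seq_len(nrow(sd.mat)), sex),
         function(i) mean_sd_windows(sd.mat[i, , drop = FALSE]))
}

# Hypothetical helper: cells more than sd.threshold SDs above their window mean.
flag_windows <- function(sd.mat, stats, sd.threshold) {
  thresh <- stats[, "Mean"] + sd.threshold * stats[, "SD"]
  # compare every column (window) against its own threshold
  hit <- sweep(sd.mat, 2, thresh, FUN = ">")
  which(hit, arr.ind = TRUE)
}

# toy data: 10 scans x 2 windows; scan 110 is unusually noisy in window 2
sd.mat <- cbind(w1 = (1:10) / 100, w2 = c(rep(1, 9), 20) / 100)
rownames(sd.mat) <- 101:110
stats <- mean_sd_windows(sd.mat)
flag_windows(sd.mat, stats, sd.threshold = 2)  # row 10 (scan 110), column 2
```

In this sketch the statistics are computed from the same scans that are then tested, so a single extreme scan also inflates the SD it is compared against.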

See Also

IntensityData, GenotypeData, BAFfromClusterMeans, BAFfromGenotypes

Examples

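library(GWASTools)
library(GWASdata)
data(illuminaScanADF)

# B allele frequency / log R ratio data
blfile <- system.file("extdata", "illumina_bl.gds", package="GWASdata")
bl <- GdsIntensityReader(blfile)
blData <- IntensityData(bl, scanAnnot=illuminaScanADF)

# genotype calls, needed because incl.hom=FALSE (the default) drops homozygotes
genofile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
geno <- GdsGenotypeReader(genofile)
genoData <- GenotypeData(geno, scanAnnot=illuminaScanADF)

nbins <- rep(8, 3) # need bins for chromosomes 21,22,23
baf.sd <- sdByScanChromWindow(blData, genoData, nbins=nbins)

# the SD matrices are now in memory, so the files can be closed
close(blData)
close(genoData)

# median BAF standard deviation over autosomes for each scan
med.res <- medianSdOverAutosomes(baf.sd)

# mean and SD over scans of the window SDs (X chromosome split by sex)
sex <- illuminaScanADF$sex
sd.res <- meanSdByChromWindow(baf.sd, sex)

# flag regions more than 2 SDs above the mean
var.res <- findBAFvariance(sd.res, baf.sd, sex, sd.threshold=2)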
