genotypeeval (version 1.2.2)

QA/QC of a gVCF or VCF file

Description

Takes in a gVCF or VCF and reports metrics to assess quality of calls.

Version

1.2.2

License

file LICENSE

Maintainer

Jennifer Tom

Last Published

February 15th, 2017

Functions in genotypeeval (1.2.2)

GoldDataParam-class

Declare class GoldDataParam, which stores thresholds to apply to a VCFEvaluate object. It is intended for batch mode, when a large number of VCF files need to be screened and the individual files that fail need to be flagged. All limits are given as the lower limit followed by the upper limit.
ReadGoldData

User Constructor for class
GoldDataParam

User Constructor for class
VCFQAReport-class

Declare class VCFQAReport which will evaluate a VCF stored as a ReadData object.
rareCompare

Comparator to rare variants. Rare is defined as 0.01 percent or less
homrefPlot

Dot plot of variant call counts (hom ref) by chromosome
hetsMasked

Number of hets in masked GRanges
percentHets

Hets as a percentage of the total number of variants
readdepthPlot

Histogram of read depth by GT
readVcfGold

Private method for class. Read in Gold file - will read in the AF if it is detected in header
VCFQAParam

User Constructor for class. Call limits are set as default to pass.
getPlots

Getter for VCFQAReport class to return plots slot.
numberCalls

Total calls: the total number of calls in the file (including MULTIs, so hom ref, hom var and hom alt might not add up to it). These methods are private. Users are not expected to provide the number of hom ref and number of duplicate calls; these functions are generally called through the function VCFEvaluate.
numberOfHets

Count Number of Hets
computeTiTv

Private function to calc transition transversion (titv) ratio
VCFQAParam-class

Declare class VCFQAParam, which stores thresholds to apply to a VCFEvaluate object. It is intended for batch mode, when a large number of VCF files need to be screened and the individual files that fail need to be flagged. All limits are given as the lower limit followed by the upper limit.
callbyChrPlot

Dot plot of variant call counts (hom alt and het) by chromosome
getCoefs

Private function to calc coefficients for admixture
titv

Transition transversion ratio in coding and non-coding
didSamplePassOverall

Getter for VCFEvaluate class to check whether the sample passed. Using thresholds from the VCFQAParam object, it returns a list: first, whether each test was passed (TRUE) or failed (FALSE); then, an overall pass (TRUE) or fail (FALSE).
numberOfHomVars

Count Number of Hom Vars
GoldDataFromGRanges

User Constructor for class. Used to associate the gold params object with the gold granges and to check if MAF is present.
VCFData-class

Declare class VCFData, which reads in a VCF using readVcfAsVRanges
ReadVCFData

User Constructor for class. Calls VCFData constructor: ReadVCFData is a wrapper for readVcfAsVRanges. It removes indels, GL chromosomes, and MULTI calls. It scans the header of the vcf file and adds in the following fields for analysis if present: AD, GT, DP, GQ. Looks for the "END" tag in the header and reads in file as gVCF if necessary.
percentInTarget

Percent in target read depth range: 50 percent to 200 percent of the target (for example, 15 to 60 for 30x)
myf

Private function to calc likelihood for admixture
numberOfHomRefs

Count Number of Hom Ref
VCFEvaluate

Constructor for class. Using the GENO fields present in the VCF header, it evaluates the VCF file using metrics and generates plots. Each metric is tested against the limits specified in the params class. For example, if Read Depth is in the GENO header, it calculates the median read depth and the percent in target (50 percent to 200 percent of the target specified in the params file) and generates a histogram of read depth.
getVR

getVR is a getter. Returns the vr slot.
calltypePlot

Bar plot of variants (counts)
ReadVCFDataChunk

User Constructor for class. Calls VCFData constructor: ReadVCFDataChunk is a wrapper for readVcfAsVRanges. It removes indels, GL chromosomes, and MULTI calls. It scans the header of the VCF file and adds in the following fields for analysis if present: AD, GT, DP, GQ. It looks for the "END" tag in the header and reads in the file as a gVCF if necessary. This is a multi-core version of ReadVCFData. Note that the input file must have been zipped and must have a corresponding tabix file. It will drop all hom ref sites not in the admixture file but retain the counts of hom ref and MULTI calls in the VCF file. This means that a few of the metrics and the hom ref plot can no longer be calculated in VCFQAReport. Metrics that can no longer be calculated will not be output. Please note that a filter on the data (e.g. gq.filter) will not be applied to the number of hom ref calls or the total number of calls: the filter is applied in the VCFQAReport step, whereas those two metrics are calculated while reading in the file. When calling this function, keep the memory requirements in mind. For example, if numcores=6, then when submitting the job you may request 12 GB per core (72 GB total). However, the VCF in memory will need to fit back onto a single core, or else R will not be able to allocate the memory. The example provided is not a realistic use case, as it includes only chromosome 22.
meanGQ

Mean Genotype Quality (GQ)
admixture

admixture - estimate admixture components using supervised ADMIXTURE algorithm.
hetGap

Gap between HETs by Chromosome
GoldData-class

Declare class GoldData to store information from the "Gold" data (1000 Genomes, for example) along with the GoldDataParam
didSamplePass

Getter for VCFEvaluate class to check whether the sample passed. Using thresholds from the VCFQAParam object, it returns a list: first, whether each test was passed (TRUE) or failed (FALSE); then, an overall pass (TRUE) or fail (FALSE).
goldCompare

Comparator to gold standard
chunkData

chunkData is a private function that reads in a chunk and processes it. It is not meant to be called by the user. An example is provided in line with Bioconductor policies.
genotypeQualityPlot

Histogram of genotype qualities
reformatData

Take in the results from the population data and re-format it
readDepth

Median read depth
getResults

Getter for VCFQAReport class to return results. Return a list showing values that the sample was evaluated on.
getName

Getter for VCFQAReport class to return filename slot
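
Several entries above describe simple per-sample metrics: the transition transversion ratio (titv), the percent of read depths within 50 percent to 200 percent of a target (percentInTarget), and pass/fail checks against a lower and an upper limit (didSamplePass). The sketch below illustrates those calculations in plain Python on made-up data. It does not call genotypeeval and is not the package's implementation; the function names, the inclusive bounds, the sample values, and the limits are all assumptions chosen for illustration.

```python
from statistics import median

# Hypothetical helpers for illustration only; none of these are genotypeeval functions.

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(snvs):
    """Return transitions / transversions for a list of (ref, alt) single-base pairs."""
    ti = sum(1 for pair in snvs if pair in TRANSITIONS)
    tv = len(snvs) - ti  # every other single-base change is a transversion
    return ti / tv if tv else float("nan")


def percent_in_target(depths, target=30):
    """Percent of depths within 50 to 200 percent of target (inclusive bounds assumed)."""
    lower, upper = 0.5 * target, 2.0 * target
    inside = sum(1 for d in depths if lower <= d <= upper)
    return 100.0 * inside / len(depths)


def check_limits(results, limits):
    """Test each metric against its (lower, upper) pair; return per-test and overall flags."""
    per_test = {
        name: limits[name][0] <= value <= limits[name][1]
        for name, value in results.items()
        if name in limits
    }
    return per_test, all(per_test.values())


# Made-up input data: five SNVs and eight read depths.
snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
depths = [10, 15, 28, 31, 45, 60, 75, 90]

results = {
    "titv": titv_ratio(snvs),
    "median_depth": median(depths),
    "percent_in_target": percent_in_target(depths, target=30),
}

# Made-up limits, each given as lower limit then upper limit.
limits = {
    "titv": (1.0, 3.0),
    "median_depth": (20, 60),
    "percent_in_target": (80, 100),
}

per_test, overall = check_limits(results, limits)
print(results)
print(per_test, overall)
```

With this data the sample passes the titv and median depth checks but fails the percent in target check, so the overall result is a fail. Returning the per-test flags alongside the overall flag mirrors the two-part return described for didSamplePass.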