VariantFilteringResults-class: VariantFilteringResults class

Description

The VariantFilteringResults class is used to store the kind of object obtained as a result of an analysis using the functions unrelatedIndividuals(), autosomalRecessiveHomozygous(), autosomalRecessiveHeterozygous(), autosomalDominant(), deNovo() and xLinked(). Its purpose is to ease the task of filtering and prioritizing the variants annotated by those functions.

Accessors

A VariantFilteringResults has the following set of accessor methods.

: length(x): total number of variants stored internally within the VRanges object. Note that this number is typically larger than the number of variants in the input VCF object because each of them is copied for each combination of alternate allele, annotated region and sample.
: param(x): returns the VariantFilteringParam input parameter object employed in the call that produced the VariantFilteringResults object x.
: inheritanceModel(x): returns the model of inheritance employed in the call that produced the VariantFilteringResults object x.
: samples(object): active samples from which the current filtered variants were derived. If object was obtained with unrelatedIndividuals(), then the replacement method samples(object)<- can be used to restrict the subset of active samples. In every other case (autosomalDominant(), etc.) the active samples cannot be changed.
: resetSamples(object): set back as active samples the initial set of samples specified in the input parameter object.
: sog(x): Sequence Ontology (SO) graph (actually, an acyclic digraph) returned as a graphNEL object, whose vertices are SO terms, whose edges represent ontology relationships, and whose vertex attributes vcfIdx and varIdx record which variants are annotated to each SO term. These annotations can be directly retrieved from the SO graph with the nodeData() function from the graph package. The summary() function described in this manual page allows one to tally the number of variants in each SO term throughout the entire SO hierarchy.
: bamFiles(x): access and update the BamViews object containing references to BAM files from which the input VCF files were derived. Initially this is empty.
: allVariants(x, groupBy="sample"): returns a VRangesList object with all variants grouped by default by sample. Using the argument groupBy we can specify any metadata column to be used to group variants. If the value given to groupBy does not correspond to any such columns, a VRanges object with all variants together is returned.
: filteredVariants(x, groupBy="sample"): works like allVariants(x) but, instead of returning all variants, returns only those that pass the active filters; see filters() and cutoffs() below.
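
As a hedged sketch of how these accessors might be used together, the following reuses the CEU trio files from the Examples section of this page. It assumes that the default annotation packages expected by VariantFilteringParam() are installed. The object names (extdir, reHo, bySample, ungrouped) and the groupBy value "none" are illustrative choices, not something prescribed by the package.

```r
library(VariantFiltering)

# Example trio shipped with the package (same files as in the Examples section).
extdir <- system.file("extdata", package="VariantFiltering")
param <- VariantFilteringParam(vcfFileNames=file.path(extdir, "CEUtrio.vcf.gz"),
                               pedFileName=file.path(extdir, "CEUtrio.ped"))
reHo <- autosomalRecessiveHomozygous(param)

length(reHo)            # variant rows: one per allele/region/sample combination
inheritanceModel(reHo)  # model of inheritance used in the call
samples(reHo)           # currently active samples
param(reHo)             # the VariantFilteringParam object used as input

# Default grouping: one list element per sample.
bySample <- allVariants(reHo)

# Assumption: "none" does not name a metadata column, so, as described above,
# a single VRanges object with all variants together is returned.
ungrouped <- allVariants(reHo, groupBy="none")
```

Because reHo comes from autosomalRecessiveHomozygous(), its active samples cannot be changed; restricting samples with samples(object)<- applies only to results of unrelatedIndividuals().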

Filters and cutoffs

The variants contained in a VariantFilteringResults object can be filtered using the FilterRules mechanism, defined in the S4Vectors package, by using the functions filters() and cutoffs() described below. There are additional functions, also described in this section, to facilitate this task on the set of core annotations provided by VariantFiltering.

: filters(x): get the current FilterRules object that defines the available set of filter criteria that one can use to filter the variants contained in x. This can also be used as a replacement function filters(x)<- to update this set of filters. The actual filtering is done when calling the function filteredVariants().
: cutoffs(x): get and update cutoffs from the available filters.
: softFilterMatrix(x): get and update the variant by filter matrix; see softFilterMatrix() in the VariantAnnotation package.
: dbSNPpresent(x): flag whether to filter variants present or absent from dbSNP (NA -do not filter-, "Yes", "No").
: variantType(x): filter by type of variant ( "SNV", "Insertion", "Deletion", "MNV", "Delins").
: variantLocation(x): filter by variant location ("coding", "intron", "threeUTR", "fiveUTR", "intergenic", "spliceSite", "promoter").
: variantConsequence(x): filter by variant consequence ("synonymous", "nonsynonymous", "frameshift", "nonsense", "not translated").
: aaChangeType(x): filter by type of change of amino acid ("Any", "Radical", "Conservative").
: OMIMpresent(x): flag whether to filter variants whose associated genes are present or absent from OMIM (NA -do not filter-, "Yes", "No").
: naMAF(x): flag whether NA maximum MAF values should be included in the filtered variants.
: maxMAF(x): maximum MAF value that a variant may have among the selected populations.
: minPhastCons(x): minimum phastCons score for nucleotide conservation (NA -do not filter-, [0-1]).
: minPhylostratum(x): minimum phylostratum for gene conservation (NA -do not filter-, [1-20]).
: MAFpop(x): selection of populations to use when filtering by maximum MAF value.
: minScore5ss(x): minimum weight matrix score on a cryptic 5'ss. NA indicates this filter is not applied.
: minScore3ss(x): minimum weight matrix score on a cryptic 3'ss. NA indicates this filter is not applied.
: minCUFC(x): minimum absolute codon-usage log2 fold-change.
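
A minimal sketch of the filtering workflow described in this section, under the same assumptions as the Examples section of this page (CEU trio files and default annotation packages installed). Only the replacement forms naMAF(x)<- and maxMAF(x)<- are used, since those are the ones shown in the Examples; the object names reHo and fv are illustrative.

```r
library(VariantFiltering)

# Same example data as in the Examples section.
extdir <- system.file("extdata", package="VariantFiltering")
param <- VariantFilteringParam(vcfFileNames=file.path(extdir, "CEUtrio.vcf.gz"),
                               pedFileName=file.path(extdir, "CEUtrio.ped"))
reHo <- autosomalRecessiveHomozygous(param)

filters(reHo)   # FilterRules object with the available filter criteria
cutoffs(reHo)   # cutoff values currently attached to those filters

naMAF(reHo) <- FALSE    # do not include variants whose maximum MAF is NA
maxMAF(reHo) <- 0.05    # use 0.05 as the maximum MAF cutoff

# Changing a cutoff does not filter anything by itself; the filtering
# happens when filteredVariants() is called.
fv <- filteredVariants(reHo)
fv
```

Keeping the cutoffs on the object and applying them only in filteredVariants() means that allVariants() can still return the complete set at any point.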

Summarization, visualization and reporting

The following functions help in summarizing, visualizing and reporting the filtered variants.

: summary(object, method=c("SO", "SOfull", "bioc")): tally the current filtered set of variants to features. By default, features are Sequence Ontology (SO) terms to which variants are annotated by VariantFiltering. The method argument allows the user to change this default setting to tallying throughout the entire SO hierarchy. Both options, SO and SOfull can be used in combination with the cutoff SOterms; see the vignette. The option method="bioc" considers as features the regions and consequences annotated by functions locateVariants() and predictCoding() from the VariantAnnotation package. The result is returned as a data.frame object.
: plot(x, what, sampleName, flankingNt=20, showAlnNtCutoff=200, isPaired=FALSE, ...): plot variants using the Gviz package. The argument what can be either a character vector specifying gene or variant identifiers or a chromosome name, or a GRanges object specifying a genomic region. The argument sampleName is optional and allows the user to plot the aligned reads and coverage from a specific sample, located in the plotted region, when the corresponding BAM file has been linked to the object with bamFiles(). The argument flankingNt is the number of nucleotides by which to extend the plotting region derived from the argument what. The argument showAlnNtCutoff is the region size cutoff below which the method attempts to plot the aligned reads. The argument isPaired is passed directly to the Gviz function AlignmentsTrack(), which streams over the BAM file to plot the reads, and sets whether the BAM file contains single (default) or paired-end reads. Further arguments in ... are passed to the Gviz function plotTracks() and can be used to fine-tune the final plot; see the vignette of Gviz to find out what these arguments are.
: reportVariants(x, type=c("shiny", "csv", "tsv"), file=NULL): builds a report from the VariantFilteringResults object x. Using the type argument, the report can take the form of a flat file in CSV or TSV format or a Shiny web app (default) that enables applying functional annotation filters in an interactive manner. When the Shiny app is closed, this method returns a VariantFilteringResults object with the corresponding filters switched on or off according to how the app has been interactively used.
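
The summarization and reporting calls above might be combined as in the following sketch. It makes the same assumptions as the Examples section of this page (CEU trio files and default annotation packages installed); the object names and the output path under tempdir() are hypothetical choices made for illustration. plot() is left out because it needs the Gviz package and, for read alignments, BAM files linked with bamFiles().

```r
library(VariantFiltering)

# Same example data as in the Examples section.
extdir <- system.file("extdata", package="VariantFiltering")
param <- VariantFilteringParam(vcfFileNames=file.path(extdir, "CEUtrio.vcf.gz"),
                               pedFileName=file.path(extdir, "CEUtrio.ped"))
reHo <- autosomalRecessiveHomozygous(param)

# Tally filtered variants per SO term (default method).
summary(reHo)

# Tally throughout the entire SO hierarchy.
soFull <- summary(reHo, method="SOfull")

# Tally by the regions and consequences annotated by VariantAnnotation.
bioc <- summary(reHo, method="bioc")

# Write a flat-file report; the path below is a hypothetical example.
outFile <- file.path(tempdir(), "reHo.tsv")
reportVariants(reHo, type="tsv", file=outFile)
```

Using type="csv" or type="tsv" avoids launching the interactive Shiny app, which is the more convenient choice in scripts.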

Details

Variants are stored within a VariantFilteringResults object using a VRanges object, which also holds the variant annotations in its metadata columns. VariantFiltering adds the following core set of annotations.

LOCATION: Region where the variant is located (coding, intronic, splice site, promoter, ...) as given by the function locateVariants() from the VariantAnnotation package.
LOCSTART: Start position of the variant within the region defined by the LOCATION annotation.
GENEID: Gene identifier derived with the transcript-centric annotation package given in the txdb argument of the VariantFilteringParam() function, typically an Entrez Gene identifier.
GENE: Gene name given by HGNC derived with the gene-centric annotation package given in the orgdb argument of the VariantFilteringParam() function.
TYPE: Type of variant, either a single nucleotide variant (SNV), an insertion, a deletion, a multinucleotide variant (MNV) or a deletion followed by an insertion (Delins). These types are determined using functions isSNV(), isInsertion(), isDeletion(), isSubstitution() and isDelins() from the VariantAnnotation package.
dbSNP: dbSNP identifier derived by position from the annotation packages given in the snpdb argument of the VariantFilteringParam() function.
cDNALOC: Location of the variant along the processed transcript, when the variant belongs to an exonic region.
CONSEQUENCE: Consequence of the variant when located in the coding region (synonymous, nonsynonymous, missense, nonsense or frameshift) as given by the function predictCoding() from the VariantAnnotation package.
TXNAME: Transcript name extracted from the TxDb annotation package given by the txdb argument of the VariantFilteringParam() function.
HGVSg: HGVS description of the variant at genomic level.
HGVSc: HGVS description of the variant at coding level.
HGVSp: HGVS description of the variant at protein level.
OMIM: OMIM identifier of the gene associated to the variant derived with the gene-centric annotation package given in the orgdb argument of the VariantFilteringParam() function.
AAchangeType: In the case of coding variants, whether the amino acid change is conservative or radical according to the matrix of amino acid biochemical properties given in the argument radicalAAchangeFilename of the VariantFilteringParam() function.
SCORE5ssREF: Score for the cryptic 5'ss for the REF allele with respect to the ALT allele.
SCORE5ssALT: Maximum score for a potential cryptic 5'ss created by the ALT allele.
SCORE5ssPOS: Position of the allele with respect to the position of the dinucleotide GT, considering those as positions 1 and 2.
SCORE3ssREF: Score for the cryptic 3'ss for the REF allele with respect to the ALT allele.
SCORE3ssALT: Maximum score for a potential cryptic 3'ss created by the ALT allele.
SCORE3ssPOS: Position of the allele with respect to the position of the dinucleotide AG, considering those as positions 1 and 2.
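
Since the annotations listed above live in the metadata columns of the underlying VRanges object, they can be inspected with mcols(). The sketch below makes the same assumptions as the Examples section of this page; the groupBy value "none" is an illustrative way of asking for an ungrouped VRanges object, and the column selection is guarded with intersect() because which columns are present may depend on the package version and input parameters.

```r
library(VariantFiltering)

# Same example data as in the Examples section.
extdir <- system.file("extdata", package="VariantFiltering")
param <- VariantFilteringParam(vcfFileNames=file.path(extdir, "CEUtrio.vcf.gz"),
                               pedFileName=file.path(extdir, "CEUtrio.ped"))
reHo <- autosomalRecessiveHomozygous(param)

# Assumption: "none" matches no metadata column, so a single VRanges
# object with all variants is returned instead of a per-sample list.
v <- allVariants(reHo, groupBy="none")

# Keep only those core annotation columns that are actually present.
cols <- intersect(c("LOCATION", "GENE", "TYPE", "CONSEQUENCE"),
                  colnames(mcols(v)))
head(mcols(v)[, cols, drop=FALSE])

# Tally variants by type when that annotation is available.
if ("TYPE" %in% cols)
  table(mcols(v)$TYPE)
```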

Examples

## Not run: 
library(VariantFiltering)

CEUvcf <- file.path(system.file("extdata", package="VariantFiltering"),
                    "CEUtrio.vcf.gz")
CEUped <- file.path(system.file("extdata", package="VariantFiltering"),
                    "CEUtrio.ped")
param <- VariantFilteringParam(vcfFileNames=CEUvcf, pedFileName=CEUped)
reHo <- autosomalRecessiveHomozygous(param)
naMAF(reHo) <- FALSE
maxMAF(reHo) <- 0.05
reHo
head(filteredVariants(reHo))
reportVariants(reHo, type="csv", file="reHo.csv")
## End(Not run)
