VariantFiltering (version 1.8.6)

MafDb-class: MafDb class

Description

Class for annotation packages storing minor allele frequency data.

Usage

snpid2maf(mafdb, varID)
knownVariantsMAFcols(mafdb)
keytypes(x)
keys(x, keytype)
columns(x)
select(x, keys, columns, keytype)
annotateVariants(annObj, variantsVR, param, BPPARAM=bpparam("SerialParam"))

Arguments

mafdb
A MafDb object.
x
A MafDb object.
varID
A variant identifier, typically an rsxxxx dbSNP identifier.
keytype
The keytype that matches the keys used. For MafDb objects there is currently only one type of key: the variant identifier provided by the original data provider.
keys
The keys to select records from the database. All possible keys are returned by using the keys method.
columns
The columns or kinds of things that can be retrieved from the database. As with keys, all possible columns are returned by using the columns method.
annObj
A MafDb object.
variantsVR
A VRanges object with the variants to annotate.
BPPARAM
An object of class BiocParallelParam specifying parameters related to the parallel execution of this function. See function bpparam() from the BiocParallel package.

Details

The MafDb class and associated methods serve the purpose of creating annotation packages that store minor allele frequency data. Two such annotation packages are:

MafDb.1Kgenomes.phase1.hs37d5
MAF values from the 1000 Genomes Project Phase 1.

MafDb.1Kgenomes.phase3.hs37d5
MAF values from the 1000 Genomes Project Phase 3.

This object class tries to reduce the disk space required to store allele frequencies (AFs) for millions of SNPs by coding AF float values, which range between 0 and 1, into a single-byte raw object type. To achieve this, the original AF values are rounded and coded as follows:

  • AF >= 0.01 & AF <= 1 values are rounded to 2 digits, where values 0.01, ..., 0.99, 1 are coded as raw byte values 1 to 100.
  • AF >= 0.001 & AF < 0.01 values are rounded to 3 digits, where values 0.001, ..., 0.009 are coded as raw byte values 101 to 109.
  • AF >= 0.0001 & AF < 0.001 values are rounded to 4 digits, where values 0.0001, ..., 0.0009 are coded as raw byte values 111 to 119.
  • AF >= 0.00001 & AF < 0.0001 values are rounded to 5 digits, where values 0.00001, ..., 0.00009 are coded as raw byte values 121 to 129.
  • AF < 0.00001 values are rounded to 6 digits, where values 0, 0.000001, ..., 0.000009 are coded as raw byte values 130 to 139.
  • AF NA values are coded as raw byte value 255. By default NA values are coded as the raw byte value 0, but that byte corresponds to the null string when raw byte values are coerced into char, which is problematic when storing these data as CHAR values in a SQLite database. This precludes using the default coding of NA values.
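
The coding rules listed above can be sketched in base R as follows. This is only an illustration of the scheme as described here, not the package's own implementation: the function names encode_af and decode_af are hypothetical, and clamping values that round up past the end of their bin (for instance 0.0096) to the last code of that bin is an assumption of this sketch.

```r
## Hypothetical helpers illustrating the single-byte AF coding described above;
## they are not part of VariantFiltering.
encode_af <- function(af) {
  stopifnot(is.numeric(af), all(is.na(af) | (af >= 0 & af <= 1)))
  code <- rep(255, length(af))                  # NA values keep code 255
  ok <- !is.na(af)

  b <- ok & af >= 1e-2
  code[b] <- round(af[b] * 1e2)                 # 2 digits -> codes 1..100
  b <- ok & af >= 1e-3 & af < 1e-2
  code[b] <- 100 + pmin(round(af[b] * 1e3), 9)  # 3 digits -> codes 101..109
  b <- ok & af >= 1e-4 & af < 1e-3
  code[b] <- 110 + pmin(round(af[b] * 1e4), 9)  # 4 digits -> codes 111..119
  b <- ok & af >= 1e-5 & af < 1e-4
  code[b] <- 120 + pmin(round(af[b] * 1e5), 9)  # 5 digits -> codes 121..129
  b <- ok & af < 1e-5
  code[b] <- 130 + pmin(round(af[b] * 1e6), 9)  # 6 digits -> codes 130..139
  ## pmin(..., 9) is this sketch's assumption for values rounding past a bin

  as.raw(code)
}

decode_af <- function(code) {
  k <- as.integer(code)
  af <- rep(NA_real_, length(k))                # 255 (and unused codes) -> NA
  b <- k >= 1   & k <= 100; af[b] <- k[b] / 1e2
  b <- k >= 101 & k <= 109; af[b] <- (k[b] - 100) / 1e3
  b <- k >= 111 & k <= 119; af[b] <- (k[b] - 110) / 1e4
  b <- k >= 121 & k <= 129; af[b] <- (k[b] - 120) / 1e5
  b <- k >= 130 & k <= 139; af[b] <- (k[b] - 130) / 1e6
  af
}

x <- c(1, 0.5, 0.237, 0.01, 0.004, 0.0003, 0.00002, 0.000009, 0, NA)
as.integer(encode_af(x))   # 100 50 24 1 104 113 122 139 130 255
decode_af(encode_af(x))    # rounded AFs, e.g. 0.237 comes back as 0.24
```

Each value occupies one byte instead of the eight bytes of a double, at the cost of the rounding shown in the round trip.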

A further compression of these data is performed in the case of variants with multiple alternate alleles. In those cases, instead of storing the AF of each alternate allele, only the maximum AF value is stored.
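
As a minimal sketch of this reduction, assume the AFs are held in a hypothetical list with one numeric vector per variant and one entry per alternate allele. Ignoring NA entries unless all of them are NA is an assumption of this example, not documented behaviour.

```r
## Hypothetical input: one vector of allele frequencies per variant.
af_by_alt <- list(var1 = 0.12,
                  var2 = c(0.03, 0.41, 0.002),
                  var3 = c(NA, 0.07))

## Keep a single value per variant: the largest AF among its alternate alleles.
max_af <- vapply(af_by_alt,
                 function(af) if (all(is.na(af))) NA_real_ else max(af, na.rm = TRUE),
                 numeric(1))
max_af
```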

See Also

MafDb.1Kgenomes.phase1.hs37d5
MafDb.1Kgenomes.phase3.hs37d5

Examples

  ## look up allele frequencies for rs1129038, a SNP associated with blue and brown eye colors,
  ## as reported by Eiberg et al., "Blue eye color in humans may be caused by a perfectly associated
  ## founder mutation in a regulatory element located within the HERC2 gene inhibiting OCA2 expression",
  ## Human Genetics, 123(2):177-87, 2008 [http://www.ncbi.nlm.nih.gov/pubmed/18172690]

  if (require(MafDb.1Kgenomes.phase3.hs37d5)) {
    mafdb <- MafDb.1Kgenomes.phase3.hs37d5
    mafdb

    ## specialized interface
    knownVariantsMAFcols(mafdb)
    snpid2maf(mafdb, "rs1129038")

    ## standard AnnotationDbi interface
    keytypes(mafdb)
    columns(mafdb)
    select(mafdb, keys="rs1129038", columns=c("varID", "AF"))
  }