WhopGenome (version 0.9.4)

WhopGenome-package: High-speed, high-specialisation population-scale whole-genome variation and sequence data access

Description

WhopGenome provides fast read access to Variant Call Format (VCF) files through C functions, with many specialised output formats and a configurable filtering engine. It can index FASTA files and any tab-separated column format, such as GFF, VCF and METAL, in preparation for high-speed access, and it can quickly read specified subsections of indexed FASTA files. It also provides easy-to-use methods to access the UCSC Genome Browser SQL servers, the AmiGO gene ontology databases, PLINK .PED files and Bioconductor's organism annotation databases.

Details

Package: WhopGenome
Type: Package
Version: 1.0
Date: 2013-01-24
License: GPL-2

- Open a VCF file with handle <- vcf_open("filename")
- Set a region of interest (chromosome/contig ID, start position, end position) with vcf_setregion(handle, "X", 200000, 300000)
- Select samples of interest (in this case the first 10): vcf_selectsamples(handle, vcf_getSampleNames(handle)[1:10])
- Read from the file via resvec <- vcf_readLineVec(handle)
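
The steps above can be combined into one small helper. This is a minimal sketch, not the package's canonical usage: the function name read_region_line and its arguments are hypothetical, the WhopGenome function names are taken verbatim from the list above and may differ between package versions, and the region query presumably requires a compressed, indexed VCF file.

```r
# Hypothetical helper wrapping the four steps listed above.
# vcf_path is a placeholder; supply the path to your own VCF file.
read_region_line <- function(vcf_path, chrom = "X",
                             from = 200000, to = 300000,
                             n_samples = 10) {
  # Fail early if the package is not installed.
  stopifnot(requireNamespace("WhopGenome", quietly = TRUE))

  # 1. Open the VCF file and obtain a handle.
  handle <- WhopGenome::vcf_open(vcf_path)

  # 2. Restrict reading to the region of interest.
  WhopGenome::vcf_setregion(handle, chrom, from, to)

  # 3. Select the first n_samples samples (name as given in the text above).
  sample_names <- WhopGenome::vcf_getSampleNames(handle)
  WhopGenome::vcf_selectsamples(handle, head(sample_names, n_samples))

  # 4. Read one line from the selected region as a vector.
  WhopGenome::vcf_readLineVec(handle)
}
```

A call such as resvec <- read_region_line("filename") would then return the first line read from the selected region.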

References

The 1000 Genomes Project http://1000genomes.org/

The 1000 Genomes Project Consortium (2010), A map of human genome variation from population-scale sequencing. Nature *467*, 1061-1073.

Heng Li (2011), Tabix: Fast retrieval of sequence features from generic TAB-delimited files, Bioinformatics, doi: 10.1093/bioinformatics/btq671

The Variant Call Format http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41

Examples

#vcfh <- .Call("VCF_open","/data/vcf/1000g/ALL.Chromosome1.consensus.vcf.gz",PACKAGE="WhopGenome")