
BoSSA (version 1.2)

InfoGenBank: Download sequence information from GenBank

Description

This function is designed to work with virus accession numbers. It downloads a GenBank file (INSDSeq XML) for each accession number and parses it to extract information such as the year of isolation, the host and the sampling location, among other fields.
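The page does not say how the INSDSeq XML file is requested. As an illustration only, the NCBI E-utilities efetch service returns INSDSeq XML for nucleotide records when called with `rettype=gbc` and `retmode=xml`. The helper `efetch_insdseq_url` below is hypothetical and is not part of BoSSA, which may build its queries differently.

```r
# Hypothetical helper (not part of BoSSA): build an NCBI E-utilities efetch
# URL that asks the nucleotide database for INSDSeq XML records.
efetch_insdseq_url <- function(accessions) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
  paste0(base,
         "?db=nuccore",
         "&id=", paste(accessions, collapse = ","),  # several IDs in one request
         "&rettype=gbc&retmode=xml")                 # gbc + xml = INSDSeq XML
}

url <- efetch_insdseq_url("AJ865337")
# Downloading the record requires an internet connection, e.g.:
# xml_lines <- readLines(url, warn = FALSE)
```

The returned XML would then still need to be parsed; the original function relies on the R4X package for that step.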

Usage

InfoGenBank(X,tsleep=3)

Arguments

X
a vector of accession numbers
tsleep
the time, in seconds, to wait between two consecutive queries to GenBank. Defaults to 3, as requested by GenBank.
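To make the role of `tsleep` concrete, here is a minimal sketch, assuming the function issues one query per accession number and pauses between consecutive queries. `query_with_pause` and its `fetch` argument are hypothetical names used for illustration; this is not BoSSA's internal code.

```r
# Hypothetical sketch (not BoSSA's code): call `fetch` once per accession,
# pausing `tsleep` seconds between consecutive queries.
query_with_pause <- function(X, fetch, tsleep = 3) {
  results <- vector("list", length(X))
  for (i in seq_along(X)) {
    if (i > 1) Sys.sleep(tsleep)        # wait before every query except the first
    results[i] <- list(fetch(X[i]))     # list() keeps NULL results in place
  }
  names(results) <- X
  results
}

# Offline usage with a stand-in fetcher (nchar) and a short pause:
res <- query_with_pause(c("AJ86539", "AJ865337"), fetch = nchar, tsleep = 0.1)
# res is a named list: AJ86539 = 7, AJ865337 = 8
```

With n accession numbers and the default `tsleep = 3`, this pattern adds roughly 3 × (n − 1) seconds of waiting on top of the download time itself.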

Value

The output is a tab-separated table ready to be written to disk. It includes 18 columns: the accession number, the organism, the isolate name, the taxonomy, the submission date of the sequence, the sampling date, the host, the host taxonomy ID in GenBank, the host family, the host genus, the host subgenus, a suggested host name in case of a possible misspelling, the sampling location, the GPS coordinates of the sampling location, the authors, the title, the journal in which the sequence was published and a PubMed URL to the publication. If a piece of information is missing, the corresponding cell is left empty.
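One way to save the result is base R's `write.table`. This is a sketch that assumes the returned object is a data frame or matrix (the page does not say which). `save_genbank_info` and the file name are hypothetical.

```r
# Hypothetical helper (not part of BoSSA): save the table returned by
# InfoGenBank() as a tab-separated text file, assuming it is a data frame
# or matrix. Any NA values are written as empty cells.
save_genbank_info <- function(info, file) {
  write.table(info, file = file, sep = "\t", quote = FALSE,
              row.names = FALSE, na = "")
  invisible(file)
}

# Example (requires an internet connection and the R4X package):
# info <- InfoGenBank(c("AJ86539", "AJ865337"))
# save_genbank_info(info, "genbank_info.tsv")
```

Setting `quote = FALSE` keeps the fields unquoted. It assumes no field contains a tab character.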

Details

Requires the R4X package (available at http://r-forge.r-project.org/projects/r4x/) and an internet connection.

See Also

read.GenBank, TaxoGB

Examples

# Requires an internet connection
# Requires the R4X package, available at http://r-forge.r-project.org/projects/r4x/
#accnb <- c("AJ86539","AJ865337")
#InfoGenBank(accnb)
