gbRecord: Read a GenBank/GenPept or EMBL format file.

Description

Import data from GenBank/GenPept, EMBL, or IMGT/HLA flat files into R, represented as an instance of the gbRecord or gbRecordList class.

Usage

gbRecord(rcd, progress = FALSE)

Arguments

rcd

A vector of paths to GenBank/EMBL format files, an efetch object (from the reutils package) containing one or more GenBank records, or a textConnection to a character vector that can be parsed as a GenBank or EMBL record.

progress

If TRUE, print a progress bar while parsing multiple GenBank records. The progress bar does not work when the records are parsed in parallel.
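As a sketch of the third input type accepted by `rcd`, the base R snippet below wraps a character vector of record lines in a `textConnection`. The `toy_lines` vector is a hypothetical, truncated stand-in for real record text, not a complete record, so the `gbRecord()` call is left commented out. With a real file you would obtain the lines with `readLines()` instead.

```r
# Hypothetical, truncated record text; a real GenBank record has many more lines.
toy_lines <- c(
  "LOCUS       TOY0001                   12 bp    DNA     linear   UNK",
  "ORIGIN",
  "        1 acgtacgtac gt",
  "//"
)

# With a real file you would use: toy_lines <- readLines(gbk_file)
con <- textConnection(toy_lines)

# x <- gbRecord(con)   # not run: the toy text above is not a complete record

# Read the lines back to show what gbRecord() would receive from the connection.
read_back <- readLines(con)
close(con)
```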

Value

An instance of the gbRecord or gbRecordList class.

Details

For a sample GenBank record, see https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html. For a detailed description of the GenBank feature table format, see https://www.ncbi.nlm.nih.gov/collab/FT/.

For a description of the EMBL flat file format see ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/usrman.txt.

For a description of the format and conventions of IMGT/HLA flat files see https://www.ebi.ac.uk/ipd/imgt/hla/docs/manual.html.

Examples

# NOT RUN {
### import from file
gbk_file <- system.file("extdata", "marine_metagenome.gb", package = "biofiles")
x <- gbRecord(gbk_file)

load(system.file("extdata", "marine_metagenome.rda", package = "biofiles"))
getHeader(x)
getFeatures(x)

### quickly extract features as GRanges
ranges(x["CDS"], include = c("product", "note", "protein_id"))

### directly subset features
x[[1]]

### import directly from NCBI
x <- gbRecord(reutils::efetch("139189709", "protein", rettype = "gp", retmode = "text"))
x

### import a file containing multiple GenBank records as a
### gbRecordList. With many short records, it pays off to
### run the parsing in parallel
gss_file <- system.file("extdata", "gss.gb", package = "biofiles")
library(doParallel)
registerDoParallel(cores = 4)
gss <- gbRecord(gss_file)
gss
# }
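A file such as gss.gb holds several records, each terminated by a line starting with `//` (the entry terminator in both GenBank and EMBL flat files). As a rough sketch, and not biofiles' own implementation, the hypothetical base R helper `split_flat_records()` below splits such a file's lines into one character vector per record. This can be handy as a quick sanity check of how many records a file contains before parsing it.

```r
# Hypothetical helper (not part of biofiles): split the lines of a
# multi-record GenBank/EMBL flat file into one character vector per record.
split_flat_records <- function(lines) {
  is_end <- startsWith(lines, "//")
  # A "//" line closes the current record, so a new record starts on the
  # line after each terminator: shift the terminator flags down by one line.
  record_id <- cumsum(c(TRUE, head(is_end, -1)))
  records <- split(lines, record_id)
  # Drop chunks that contain only blank lines (e.g. after the last "//").
  keep <- vapply(records, function(r) any(nzchar(trimws(r))), logical(1))
  unname(records[keep])
}

# Toy input with two truncated, hypothetical records and a trailing blank line.
toy <- c("LOCUS       REC1", "//", "LOCUS       REC2", "//", "")
recs <- split_flat_records(toy)
length(recs)  # 2
```

With a real file, `length(split_flat_records(readLines(gss_file)))` would give a rough record count. Each element could presumably also be wrapped in a `textConnection` and passed to `gbRecord()` individually.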