
micropan (version 2.1)

getAccessions: Collecting contig accession numbers

Description

Retrieves the accession numbers of all contigs from a master record GenBank file.

Usage

getAccessions(master.record.accession, chunk.size = 99)

Arguments

master.record.accession

The accession number (a single text string) of a master record GenBank file whose WGS entry specifies the accession numbers of all contigs in the WGS genome.

chunk.size

The maximum number of accession numbers in each returned text string.

Value

A character vector where each element is a text string listing accession numbers separated by commas. No element contains more than chunk.size accession numbers; see entrezDownload for details on this. The vector returned by getAccessions is typically used as input to entrezDownload.
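To make the shape of the return value concrete, here is a minimal sketch of how a vector of individual accessions can be grouped into comma-separated strings of at most chunk.size entries. The helper chunk_accessions is hypothetical and is not part of micropan; it only illustrates the output format described above, not how getAccessions is actually implemented.

```r
# Hypothetical helper (not part of micropan): group accessions into
# comma-separated strings holding at most chunk.size accessions each.
chunk_accessions <- function(accessions, chunk.size = 99) {
  # Assign each accession a group index: 1 for the first chunk.size, 2 for the next, ...
  groups <- ceiling(seq_along(accessions) / chunk.size)
  # Collapse each group into one comma-separated string; drop the group names
  unname(vapply(split(accessions, groups), paste, character(1), collapse = ","))
}

chunk_accessions(c("AAGX01000001", "AAGX01000002", "AAGX01000003"), chunk.size = 2)
```

With chunk.size = 2 the three accessions become two strings, "AAGX01000001,AAGX01000002" and "AAGX01000003", each of which can be sent as one query.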

Details

To download a WGS genome (draft genome) using entrezDownload you need the accession number of every contig. These are listed in the master record GenBank file, which is available for every WGS genome. getAccessions extracts them from the GenBank file and returns them in the format expected by entrezDownload.
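The master record usually gives the contigs as a range on its WGS line, for example AAGX01000001-AAGX01000005, rather than as an explicit list. The sketch below shows one way such a range could be expanded into individual accessions. It assumes a range of the form prefix letters, a two-digit version, then a zero-padded contig number; the function expand_wgs_range is hypothetical and is not the method used inside getAccessions.

```r
# Hypothetical helper (not part of micropan): expand a WGS range such as
# "AAGX01000001-AAGX01000003" into individual contig accessions.
# Assumes the format <letters><2-digit version><zero-padded number>.
expand_wgs_range <- function(wgs) {
  ends <- strsplit(wgs, "-", fixed = TRUE)[[1]]
  if (length(ends) == 1) ends[2] <- ends[1]   # a single accession, not a range
  pattern <- "^([A-Z]+[0-9]{2})([0-9]+)$"
  prefix <- sub(pattern, "\\1", ends[1])       # e.g. "AAGX01"
  first.num <- sub(pattern, "\\2", ends[1])    # e.g. "000001"
  from <- as.integer(first.num)
  to <- as.integer(sub(pattern, "\\2", ends[2]))
  # Re-pad the contig numbers to the original width and re-attach the prefix
  paste0(prefix, formatC(from:to, width = nchar(first.num), flag = "0"))
}

expand_wgs_range("AAGX01000001-AAGX01000003")
```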

The NCBI download API does not accept too many accessions in a single query. For genomes with many contigs, the accessions must therefore be split across several text strings, each holding at most chunk.size accessions.

See Also

entrezDownload.

Examples

# NOT RUN {
# Requires internet access to NCBI
library(micropan)
# The master record accession for the WGS genome Mycoplasma genitalium, strain G37
acc <- getAccessions("AAGX00000000")
# Use the accessions to download all contigs and save them to a FASTA file
genome.file <- tempfile(fileext = ".fna")
txt <- entrezDownload(acc, out.file = genome.file)

# ...cleaning...
ok <- file.remove(genome.file)
# }
