micropan (version 2.1)

entrezDownload: Downloading genome data

Description

Retrieving genomes from NCBI using the Entrez programming utilities.

Usage

entrezDownload(accession, out.file, verbose = TRUE)

Arguments

accession

A character vector containing a set of valid accession numbers in the NCBI Nucleotide database.

out.file

Name of the file where downloaded sequences should be written in FASTA format.

verbose

Logical indicating if textual output should be given during execution, to monitor the download progress.

Value

The name of the resulting FASTA file is returned (same as out.file), but the main effect of this function is the creation of the file itself.

Details

The Entrez programming utilities are a toolset for automated download of data from the NCBI databases; see the E-utilities Quick Start for details. entrezDownload can be used to download genomes from the NCBI Nucleotide database through these utilities.

The argument accession must be a set of valid accession numbers in NCBI Nucleotide, typically all accession numbers related to a genome (chromosomes, plasmids, contigs, etc.). For completed genomes, where the number of sequences is low, accession is typically a single text listing all accession numbers separated by commas. For draft genomes with a large number of contigs, the accession numbers must be split into several comma-separated texts, because Entrez will not accept too many accession numbers in a single query.
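The splitting described above can be sketched in base R. The helper chunk_accessions and its chunk.size argument are hypothetical and not part of micropan, and the default of 100 accession numbers per text is an assumption, not a documented Entrez limit.

```r
# Hypothetical helper, not part of micropan: groups accession numbers into
# comma-separated texts of at most 'chunk.size' accession numbers each.
# The default chunk.size is an assumption, not a documented Entrez limit.
chunk_accessions <- function(acc, chunk.size = 100) {
  # Assign each accession number to a group: 1, 1, ..., 2, 2, ...
  groups <- ceiling(seq_along(acc) / chunk.size)
  # Collapse every group into one comma-separated text
  chunks <- vapply(split(acc, groups), paste, character(1), collapse = ",")
  unname(chunks)
}

# Usage with placeholder names (these are not real accession numbers)
acc <- paste0("CONTIG", 1:5)
chunk_accessions(acc, chunk.size = 2)
```

The resulting character vector has one element per chunk and can be passed as the accession argument.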

The downloaded sequences are saved in out.file on your system, as a FASTA formatted file. Note that all sequences downloaded in one call end up in this file. To download multiple genomes, call entrezDownload once per genome and store each genome in its own file.
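A minimal sketch of downloading several genomes, one file per genome, follows. The wrapper download_genomes, its arguments, and the file naming scheme are hypothetical and not part of micropan; only the call entrezDownload(accession, out.file) is taken from the usage above.

```r
# Hypothetical wrapper, not part of micropan: one entrezDownload call per genome.
# 'acc.list' is a named list; each element holds the accession texts of one genome.
# 'downloader' is an argument only so that the loop can be tried without
# network access; by default it is micropan::entrezDownload.
download_genomes <- function(acc.list, out.dir = tempdir(),
                             downloader = micropan::entrezDownload) {
  # One FASTA file per genome, named after the list element
  out.files <- file.path(out.dir, paste0(names(acc.list), ".fna"))
  for (i in seq_along(acc.list)) {
    downloader(acc.list[[i]], out.file = out.files[i])
  }
  out.files
}

# Usage (requires micropan and internet access, so it is commented out):
# genomes <- list(Buchnera_aphidicola_APS = "BA000003.2,AP001071.1")
# fasta.files <- download_genomes(genomes)
```

Keeping each genome in its own file means the returned file names can later be read one at a time, for example with readFasta.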

See Also

getAccessions, readFasta.

Examples

# NOT RUN {
# Accession numbers for the chromosome and plasmid of Buchnera aphidicola, strain APS
acc <- "BA000003.2,AP001071.1"
genome.file <- tempfile(pattern = "Buchnera_aphidicola", fileext = ".fna")
txt <- entrezDownload(acc, out.file = genome.file)

# Clean up: delete the temporary FASTA file
ok <- file.remove(genome.file)
# }