
ape (version 5.1)

read.GenBank: Read DNA Sequences from GenBank via Internet

Description

This function connects to the GenBank database and reads the nucleotide sequences whose accession numbers are given as arguments.

Usage

read.GenBank(access.nb, seq.names = access.nb, species.names = TRUE,
             gene.names = FALSE, as.character = FALSE)

Arguments

access.nb

a vector of mode character giving the accession numbers.

seq.names

the names to give to each sequence; by default the accession numbers are used.

species.names

a logical indicating whether to attach the species names to the returned object as an attribute.

gene.names

obsolete (will be removed soon).

as.character

a logical controlling whether to return the sequences as vectors of single characters (TRUE) or as an object of class "DNAbin" (FALSE, the default).

Value

A list of DNA sequences, each stored as a vector of class "DNAbin" or, if as.character = TRUE, as a vector of single characters. The list has two attributes: "species" (present if species.names = TRUE) and "description".
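
The two storage modes can be compared offline on a toy sequence. The sketch below is not GenBank output: the sequence and the name toy1 are made up, and as.DNAbin() and as.character() from ape only stand in for what as.character = FALSE and as.character = TRUE would return.

```r
library(ape)

## A made-up sequence (not from GenBank), stored one base per element
toy_chr <- list(toy1 = c("a", "c", "g", "t", "a", "a"))

## Compact "DNAbin" form, comparable to as.character = FALSE
toy_bin <- as.DNAbin(toy_chr)
class(toy_bin)

## Back to vectors of single characters, comparable to as.character = TRUE
toy_back <- as.character(toy_bin)
toy_back$toy1
```

The "DNAbin" form is the one expected by the other ape functions listed under See Also, so the character form is mainly useful for inspecting or exporting the bases.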

Details

The sequences are retrieved from the NCBI site http://www.ncbi.nlm.nih.gov/.

If species.names = TRUE, the returned list has an attribute "species" containing the names of the species taken from the field "ORGANISM" in GenBank.

Since ape 3.6, this function retrieves the sequences in FASTA format: this is more efficient and more flexible (scaffolds and contigs can be read). The option gene.names is obsolete and will be removed; this information is also present in the description.
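
To give an idea of what a FASTA request to NCBI can look like, here is a minimal sketch that only builds a URL and does not contact the server. It assumes the public NCBI E-utilities efetch endpoint with the parameters db, id, rettype and retmode; the helper name build_fasta_url is hypothetical, and the request that read.GenBank actually sends may differ.

```r
## Hypothetical helper: build an efetch URL for a set of accession numbers.
## Assumption: the NCBI E-utilities "efetch" endpoint; not taken from ape's code.
build_fasta_url <- function(access.nb) {
    stopifnot(is.character(access.nb), length(access.nb) > 0)
    paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi",
           "?db=nucleotide",
           "&id=", paste(access.nb, collapse = ","),
           "&rettype=fasta&retmode=text")
}

## Several accession numbers are sent as one comma-separated id list
fasta_url <- build_fasta_url(c("U15717", "U15718"))
fasta_url
```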

Setting species.names = FALSE is considerably faster. This can be useful when reading a series of scaffolds or contigs, or when the species names are already known.
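
If the species names are already at hand, they can be attached by the user after reading with species.names = FALSE. The sketch below runs offline on a stand-in list of character vectors, shaped like what as.character = TRUE would return; the sequences are made up and the species names are placeholders, not the real species for these accession numbers.

```r
## Stand-in for: read.GenBank(ref, species.names = FALSE, as.character = TRUE)
## (made-up sequences; no Internet connection is used here)
seqs <- list(U15717 = c("a", "t", "g", "c"),
             U15718 = c("a", "t", "g", "t"))

## Species names known in advance (placeholders, not the real species)
my_species <- c("Genus_speciesA", "Genus_speciesB")

## Store them in the "species" attribute, in the same order as the sequences
attr(seqs, "species") <- my_species

## Same accession/species table as built in the Examples section
cbind(attr(seqs, "species"), names(seqs))
```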

See Also

read.dna, write.dna, dist.dna, DNAbin

Examples

# NOT RUN {
## This won't work if your computer is not connected
## to the Internet

## Get the 8 sequences of tanagers (Ramphocelus)
## as used in Paradis (1997)
ref <- c("U15717", "U15718", "U15719", "U15720",
         "U15721", "U15722", "U15723", "U15724")
## Copy/paste or type the following commands if you
## want to try them.
Rampho <- read.GenBank(ref)
## get the species names:
attr(Rampho, "species")
## build a matrix with the species names and the accession numbers:
cbind(attr(Rampho, "species"), names(Rampho))
## print the first sequence
## (can be done with `Rampho$U15717` as well)
Rampho[[1]]
## the description from each FASTA sequence:
attr(Rampho, "description")
# }
