efetch: efetch - downloading full records

Description

efetch performs calls to the NCBI EFetch utility to retrieve data records in the requested format for an NCBI Accession Number, one or more primary UIDs, or for a set of UIDs stored in the user's web environment.

Usage

efetch(uid, db = NULL, rettype = NULL, retmode = NULL, outfile = NULL, retstart = NULL, retmax = NULL, querykey = NULL, webenv = NULL, strand = NULL, seqstart = NULL, seqstop = NULL, complexity = NULL)

Arguments

uid

(Required) A list of UIDs provided either as a character vector, as an esearch object, or by reference to a Web Environment and a query key obtained directly from previous calls to esearch (if usehistory = TRUE), epost or elink. If UIDs are provided as a plain character vector, db must be specified explicitly, and all of the UIDs must be from the database specified by db.

db

(Required if uid is a character vector of UIDs) Database from which to retrieve records. See the NCBI EFetch documentation for the list of supported databases.

rettype

A character string specifying the retrieval type, such as 'abstract' or 'medline' for PubMed, 'gp' or 'fasta' for Protein, or 'gb' or 'fasta' for Nuccore. See the NCBI EFetch documentation for the values available for each database.

retmode

A character string specifying the data mode of the returned records, such as 'text' or 'xml'. See the NCBI EFetch documentation for the values available for each database.
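To show how db, rettype and retmode map onto the underlying HTTP request, here is a minimal sketch of a hypothetical helper, `efetch_url()`, which is not part of this package. It assumes the public E-utilities base URL and its parameter names (db, id, rettype, retmode), and it assumes all values are already URL-safe. It only builds the URL; the package's own request construction may differ.

```r
# Hypothetical helper, not part of this package: assemble an EFetch
# request URL from a set of UIDs. Values are assumed to be URL-safe
# (plain UIDs, accessions and database names), so no escaping is done.
efetch_url <- function(uid, db, rettype = NULL, retmode = NULL) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
  params <- list(
    db = db,
    id = paste(uid, collapse = ","),  # several UIDs are sent comma-separated
    rettype = rettype,
    retmode = retmode
  )
  params <- Filter(Negate(is.null), params)  # drop arguments left as NULL
  query <- paste0(names(params), "=", unlist(params), collapse = "&")
  paste0(base, "?", query)
}

efetch_url(c("1621261", "89318838"), "protein", rettype = "acc")
# "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=1621261,89318838&rettype=acc"
```

Unset arguments are simply omitted from the query string, which leaves NCBI to apply its own defaults for rettype and retmode.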

outfile

A character string naming the file to which the data are written. Required if more than 500 UIDs are retrieved at once. In this case, the UIDs must be provided by reference to a Web Environment and a query key obtained directly from previous calls to esearch (if usehistory = TRUE), epost, or elink.

retstart

Numeric index of the first record to be retrieved.

retmax

Total number of records from the input set to be retrieved.
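retstart and retmax together define a window into the input set, which is how a large result set can be fetched in pieces under the 500-record limit described for outfile. The sketch below uses a hypothetical helper, `batch_ranges()`, which is not part of this package, to compute the windows. It assumes retstart is passed to EFetch unchanged, where it is 0-based (0 is the first record).

```r
# Hypothetical helper, not part of this package: split `count` records
# into batches of at most `batch_size`, returning the retstart/retmax
# pair for each batch. retstart is assumed to be 0-based.
batch_ranges <- function(count, batch_size = 500) {
  if (count <= 0) return(data.frame(retstart = numeric(0), retmax = numeric(0)))
  retstart <- seq(0, count - 1, by = batch_size)  # 0, 500, 1000, ...
  retmax <- pmin(batch_size, count - retstart)     # the last batch may be short
  data.frame(retstart = retstart, retmax = retmax)
}

b <- batch_ranges(8400)
nrow(b)        # 17
b[nrow(b), ]   # retstart 8000, retmax 400
```

For 8,400 records this gives 16 full batches of 500 plus a final batch of 400.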

querykey

An integer specifying which of the UID lists attached to a user's Web Environment will be used as input to efetch. (Usually obtained directly from objects returned by a previous call to esearch, epost, or elink.)

webenv

A character string specifying the Web Environment that contains the UID list. (Usually obtained directly from objects returned by a previous call to esearch, epost, or elink.)
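When querykey and webenv are supplied, the request refers to a stored UID list rather than listing the UIDs themselves. The sketch below uses a hypothetical helper, `efetch_history_url()`, which is not part of this package. It assumes the E-utilities parameter names query_key and WebEnv, and the Web Environment string in the example is a made-up placeholder.

```r
# Hypothetical helper, not part of this package: build an EFetch URL that
# points at a UID list on the History server (query_key + WebEnv) instead
# of listing UIDs, fetching one window of retmax records from retstart.
efetch_history_url <- function(db, querykey, webenv, retstart = 0, retmax = 500,
                               rettype = NULL, retmode = NULL) {
  params <- list(db = db, query_key = querykey, WebEnv = webenv,
                 retstart = retstart, retmax = retmax,
                 rettype = rettype, retmode = retmode)
  params <- Filter(Negate(is.null), params)
  # Write numbers in fixed notation so that 1e5 becomes "100000", not "1e+05"
  values <- vapply(params, function(v) {
    if (is.numeric(v)) format(v, scientific = FALSE) else as.character(v)
  }, character(1))
  paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?",
         paste0(names(params), "=", values, collapse = "&"))
}

# "MCID_example" is a placeholder, not a real Web Environment string
efetch_history_url("protein", 1, "MCID_example", rettype = "fasta", retmode = "text")
```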

strand

Strand of DNA to retrieve. (1: plus strand, 2: minus strand)

seqstart

First sequence base to retrieve.

seqstop

Last sequence base to retrieve.
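strand, seqstart and seqstop together select a sub-sequence. The sketch below uses a hypothetical helper, `seq_range_params()`, which is not part of this package, to check such a request and compute its length. It assumes the E-utilities parameter names strand, seq_start and seq_stop, 1-based inclusive coordinates (so the length is seqstop - seqstart + 1), and seqstart <= seqstop on either strand.

```r
# Hypothetical helper, not part of this package: validate a sub-sequence
# request and translate it into E-utilities parameter names. Coordinates
# are assumed to be 1-based and inclusive.
seq_range_params <- function(seqstart, seqstop, strand = 1) {
  stopifnot(seqstart >= 1, seqstop >= seqstart, strand %in% c(1, 2))
  list(
    params = c(strand = strand, seq_start = seqstart, seq_stop = seqstop),
    length = seqstop - seqstart + 1  # inclusive range, so add 1
  )
}

r <- seq_range_params(101, 250, strand = 2)
r$length   # 150
```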

complexity

Data content to return. (0: entire data structure, 1: bioseq, 2: minimal bioseq-set, 3: minimal nuc-prot, 4: minimal pub-set)
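The complexity codes form a small fixed table. The following sketch turns the list above into a named lookup vector. Its helper `describe_complexity()` is hypothetical and not part of this package, and it rejects codes outside 0 to 4.

```r
# Lookup table transcribed from the complexity description above
complexity_levels <- c(
  "0" = "entire data structure",
  "1" = "bioseq",
  "2" = "minimal bioseq-set",
  "3" = "minimal nuc-prot",
  "4" = "minimal pub-set"
)

# Hypothetical helper, not part of this package: describe a complexity
# code, stopping with an error for values outside 0-4.
describe_complexity <- function(code) {
  key <- as.character(code)
  if (!(key %in% names(complexity_levels))) {
    stop("complexity must be an integer from 0 to 4")
  }
  complexity_levels[[key]]
}

describe_complexity(3)   # "minimal nuc-prot"
```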

Value

An efetch object.

Details

See the official online documentation for NCBI's EUtilities for additional information.

See the EFetch section of that documentation for the default values of rettype and retmode, as well as a list of the databases available through the EFetch utility.

Examples


## Not run: 
# ## From Protein, retrieve a raw GenPept record and write it to a file.
# p <- efetch("195055", "protein", "gp")
# p
# 
# write(content(p, "text"), file = "~/AAD15290.gp")
# 
# ## Get accessions for a list of GenBank IDs (GIs)
# acc <- efetch(c("1621261", "89318838", "68536103", "20807972", "730439"),
#               "protein", rettype = "acc")
# acc
# acc <- strsplit(content(acc), "\n")[[1]]
# acc
# 
# ## Get GIs from a list of accession numbers
# gi <- efetch(c("CAB02640.1", "EAS10332.1", "YP_250808.1", "NP_623143.1", "P41007.1"),
#              "protein", "uilist")
# gi
# 
# ## We can conveniently extract the UIDs using the eutil method xmlValue(xpath)
# gi$xmlValue("/IdList/Id")
# 
# ## Alternatively, we can extract the contents of the efetch query using the function content()
# ## and use the XML package to retrieve the UIDs
# doc <- content(gi)
# XML::xpathSApply(doc, "/IdList/Id", XML::xmlValue)
# 
# ## Get the scientific name for an organism starting with the NCBI taxon id.
# tx <- efetch("527031", "taxonomy")
# tx
# 
# ## Convenience accessor for XML nodes of interest using XPath
# ## Extract the TaxIds of the Lineage
# tx["//LineageEx/Taxon/TaxId"]
# 
# ## Use an XPath expression to extract the scientific name.
# tx$xmlValue("/TaxaSet/Taxon/ScientificName")
# 
# ## Iteratively retrieve a large number of records
# # First store approx. 8400 UIDs on the History server.
# uid <- esearch(term = "hexokinase", db = 'protein', usehistory = TRUE)
# # Fetch the records and write to file in batches of 500.
# efetch(uid, rettype = "fasta", retmode = "text", outfile = "~/tmp/hexokinases.fna")
# 
# ## End(Not run)
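The accession example above splits the plain-text response into a character vector. Since that step needs no network access, it can be tried offline. In the sketch below, `acc_text` is made-up placeholder text in the same one-accession-per-line shape, not an actual NCBI response.

```r
# Placeholder standing in for content(acc): one accession per line
acc_text <- "CAB02640.1\nEAS10332.1\nYP_250808.1\n"

# strsplit() returns a list with one element per input string, hence [[1]];
# a trailing newline does not produce an empty final element
acc <- strsplit(acc_text, "\n")[[1]]
acc   # "CAB02640.1" "EAS10332.1" "YP_250808.1"
```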
