
bio3d (version 2.1-3)

blast.pdb: NCBI BLAST Sequence Search

Description

Run NCBI blastp on a given sequence against the PDB, NR or swissprot sequence databases.

Usage

blast.pdb(seq, database = "pdb", time.out = NULL, chain.single=TRUE)
get.blast(urlget, time.out = NULL, chain.single=TRUE)

Arguments

seq
a single element or multi-element character vector containing the query sequence. Alternatively a fasta object from function get.seq can be provided.
database
a single element character vector specifying the database against which to search. Current options are pdb, nr and swissprot.
time.out
integer specifying the number of seconds to wait for the blast reply before a time out occurs.
urlget
the URL from which to retrieve BLAST results; usually it is returned by blast.pdb if time.out is set and met.
chain.single
logical, if TRUE double-character NCBI PDB database chain identifiers are simplified to a single lowercase character, e.g. '1WF4_GG' becomes '1WF4_g'. If FALSE no conversion to match RCSB PDB files is performed.
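
The chain.single conversion described above can be illustrated with a short base R sketch. This is not bio3d's internal code: the helper name simplify_chain_id is hypothetical, and the sketch assumes that a doubled identifier is the same uppercase letter repeated at the end of the ID.

```r
# Hypothetical helper (not part of bio3d): collapse a doubled uppercase
# chain identifier such as "GG" into a single lowercase letter "g".
simplify_chain_id <- function(ids) {
  # "_([A-Z])\\1$" matches an underscore followed by the same uppercase
  # letter twice at the end of the string; "\\L\\1" lowercases that letter.
  sub("_([A-Z])\\1$", "_\\L\\1", ids, perl = TRUE)
}

ids <- c("1WF4_GG", "1BG2_A")
simplify_chain_id(ids)
```

Identifiers with a single chain character, such as "1BG2_A", do not match the pattern and are left unchanged.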

Value

A list with eight components:

bitscore
a numeric vector containing the raw score for each alignment.
evalue
a numeric vector containing the E-value of the raw score for each alignment.
mlog.evalue
a numeric vector containing minus the natural log of the E-value.
gi.id
a character vector containing the gi database identifier of each hit.
pdb.id
a character vector containing the PDB database identifier of each hit.
hit.tbl
a character matrix summarizing BLAST results for each reported hit, see below.
raw
a data frame summarizing BLAST results, note multiple hits may appear in the same row.
url
a single element character vector with the NCBI result URL and RID code. This can be passed to the get.blast function.
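
As a rough sketch of how the per-hit components might be used together, the example below builds a mock list with three of the component names listed above and keeps the hits above a cutoff. All values are invented for illustration and are not BLAST output; the cutoff of 10 and the ID "XXXX_A" are arbitrary placeholders. It assumes the per-hit vectors are in the same order.

```r
# Mock object with a few of the component names listed above.
# All values are invented for illustration; they are not BLAST results.
mock <- list(
  pdb.id      = c("1BG2_A", "1WF4_g", "XXXX_A"),
  evalue      = c(1e-120, 1e-20, 2),
  mlog.evalue = -log(c(1e-120, 1e-20, 2))
)

# Keep hits whose mlog.evalue exceeds a hypothetical cutoff.
cutoff <- 10
keep <- mock$mlog.evalue > cutoff
mock$pdb.id[keep]
```

With a real result the same indexing would be applied to the object returned by blast.pdb, which requires network access and is therefore not run here.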

Details

This function employs direct HTTP-encoded requests to the NCBI web server to run BLASTP, the protein search algorithm of the BLAST software package.

BLAST, currently the fastest and most popular pairwise sequence comparison algorithm, performs gapped local alignments through the implementation of a heuristic strategy: it identifies short, nearly exact matches or hits, bidirectionally extends non-overlapping hits resulting in ungapped extended hits or high-scoring segment pairs (HSPs), and finally extends the highest scoring HSP in both directions via a gapped alignment (Altschul et al., 1997).

For each pairwise alignment BLAST reports the raw score, bitscore and an E-value that assesses the statistical significance of the raw score. Note that, unlike the raw score, E-values are normalized with respect to both the substitution matrix and the query and database lengths.

Here we also return a corrected normalized score (mlog.evalue) that in our experience is easier to handle and store than conventional E-values. In practice, this score is equivalent to minus the natural log of the E-value. Note that, unlike the raw score, this score is independent of the substitution matrix and the query and database lengths, and thus is comparable between BLASTP searches.
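
The relationship between E-values and mlog.evalue can be sketched in a few lines of base R. The E-values below are made up for illustration and do not come from a real search.

```r
# Illustrative E-values (made up for this sketch, not real BLAST output).
evalue <- c(1e-100, 1e-10, 0.01, 5)

# Minus the natural log of the E-value: larger means more significant.
mlog.evalue <- -log(evalue)

# Note: an E-value reported as exactly 0 would give Inf here.
data.frame(evalue, mlog.evalue)
```

Very small E-values map onto a compact positive range, and E-values greater than 1 give negative scores.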

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695--2696.

BLAST is the work of Altschul et al.: Altschul, S.F. et al. (1990) J. Mol. Biol. 215, 403--410. Full details of the BLAST algorithm, along with download and installation instructions can be obtained from: http://www.ncbi.nlm.nih.gov/BLAST/.

See Also

plot.blast, hmmer, seqaln

Examples

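library(bio3d)

## Read a PDB structure and BLAST its sequence against the PDB database.
## Requires internet access to the RCSB PDB and NCBI servers.
pdb <- read.pdb("1bg2")
blast <- blast.pdb( pdbseq(pdb) )

## Inspect the per-hit summary table.
head(blast$hit.tbl)
## Plot the results; the returned object lists the top hits.
top.hits <- plot(blast)
head(top.hits$hits)

## Use 'get.blast()' to retrieve results at a later time.
x <- get.blast(blast$url)
head(x$hit.tbl)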
