blast.pdb(seq, database = "pdb", time.out = NULL, chain.single=TRUE)
get.blast(urlget, time.out = NULL, chain.single=TRUE)
# S3 method for class 'blast'
plot(x, cutoff = NULL, cut.seed=NULL, cluster=TRUE, mar=c(2, 5, 1, 1), cex=1.5, ...)
get.seq
can be provided. blast.pdb.
blast.pdb returns a list with the first eight components below. The function plot.blast produces a plot on the active graphics device and returns a three-component list object with hits, pdb.id and gi.id; see below:
The blast.pdb function employs direct HTTP-encoded requests to the NCBI web server to run BLASTP, the protein search algorithm of the BLAST software package.
BLAST, currently the most popular pairwise sequence comparison algorithm for database searching, performs gapped local alignments via a heuristic strategy: it identifies short, nearly exact matches (hits), bidirectionally extends non-overlapping hits to give ungapped extended hits, or high-scoring segment pairs (HSPs), and finally extends the highest-scoring HSP in both directions via a gapped alignment (Altschul et al., 1997).
For each pairwise alignment BLAST reports the raw score, the bitscore and an E-value that assesses the statistical significance of the raw score. Note that, unlike the raw score, E-values are normalized with respect to both the substitution matrix and the query and database lengths.
Here we also return a corrected normalized score (mlog.evalue) that in our experience is easier to handle and store than conventional E-values. In practice, this score is equivalent to minus the natural log of the E-value. Note that, unlike the raw score, this score is independent of the substitution matrix and the query and database lengths, and thus is comparable between BLASTP searches.
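The relationship between an E-value and the normalized score described above can be sketched in a few lines of base R. The E-values below are invented for illustration, and the variable names are placeholders; this is not the code blast.pdb itself runs.

```r
# Illustrative E-values only; these are not real BLAST output.
evalue <- c(1e-120, 3e-45, 2e-10, 0.004, 1)

# Minus the natural log of each E-value: larger means more significant.
mlog <- -log(evalue)

# Show the two scales side by side.
print(data.frame(evalue = evalue, mlog = round(mlog, 2)))
```

On this scale an E-value of 1 maps to 0, and in R an E-value of exactly 0 would map to Inf, which is worth keeping in mind when choosing thresholds.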
Examining plots of BLAST alignment lengths, scores, E-values and normalized scores (-log(E-value)) from the blast.pdb function can aid in the identification of sensible hit similarity thresholds. This is facilitated by the plot.blast function.
If a cutoff value is not supplied then a basic hierarchical clustering of normalized scores is performed, with initial group partitioning implemented at a hopefully sensible point in the vicinity of h=cut.seed. Inspection of the resultant plot can then be used to refine the value of cut.seed or indeed cutoff. As the cutoff value can vary depending on the desired application and indeed the properties of the system under study, it is envisaged that plot.blast will be called multiple times to aid selection of a suitable cutoff value. See the examples below for further details.
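The general idea of partitioning normalized scores by hierarchical clustering can be sketched with base R's hclust() and cutree(). This is only a rough illustration under stated assumptions: the scores are made up, seed is a hypothetical stand-in for cut.seed, and the rule for picking a candidate cutoff is an assumption, not necessarily what plot.blast does internally.

```r
# Made-up normalized scores for eleven hits (not real BLAST output).
mlog <- c(420, 415, 410, 405, 190, 185, 180, 40, 35, 30, 25)

# Hierarchical clustering on pairwise score differences (complete linkage by default).
hc <- hclust(dist(mlog))

# Partition the tree at height h; 'seed' is a hypothetical stand-in for cut.seed.
seed <- 50
groups <- cutree(hc, h = seed)

# Assumed rule: keep the group containing the best hit and
# take its lowest score as a candidate cutoff.
top <- groups[which.max(mlog)]
cutoff <- min(mlog[groups == top])

print(table(groups))
print(cutoff)
```

Changing seed changes how many groups the tree is split into, which mirrors why repeated calls with different cut.seed or cutoff values are useful when exploring a new set of hits.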
BLAST is the work of Altschul et al.: Altschul, S.F. et al. (1990) J. Mol. Biol. 215, 403--410. Full details of the BLAST algorithm, along with download and installation instructions can be obtained from: http://www.ncbi.nlm.nih.gov/BLAST/.
plot.blast, hmmer, seqaln, get.pdb
## Not run:
# pdb <- read.pdb("4q21")
# blast <- blast.pdb( pdbseq(pdb) )
#
# head(blast$hit.tbl)
# top.hits <- plot(blast)
# head(top.hits$hits)
#
# ## Use 'get.blast()' to retrieve results at a later time.
# #x <- get.blast(blast$url)
# #head(x$hit.tbl)
#
# # Examine and download 'best' hits
# top.hits <- plot.blast(blast, cutoff=188)
# head(top.hits$hits)
# #get.pdb(top.hits)
# ## End(Not run)