plot.blast: Plot a Summary of BLAST Hit Statistics.

Description

Produces a number of basic plots that should facilitate hit selection from the match statistics of a BLAST result.

Usage

## S3 method for class 'blast':
plot(x, cutoff = NULL, cut.seed=NULL, cluster=TRUE, mar=c(2, 5, 1, 1), cex=1.5, ...)

Arguments

x

BLAST results as obtained from the function blast.pdb.

cutoff

A numeric cutoff value, expressed as minus the log of the E-value, for returned hits. If NULL, the function tries to find a suitable cutoff near cut.seed, which can be used as an initial guide (see below).

cut.seed

A numeric seed cutoff value, used for initial cutoff estimation. If NULL, the seed position is set to the point of largest drop-off in normalized scores (i.e. the biggest jump in E-values).

cluster

Logical. If TRUE (and cutoff is NULL), a clustering of normalized scores is performed to partition hits into groups by similarity to the query. If FALSE, the partition point is set to the point of largest drop-off in normalized scores.

mar

A numerical vector of the form c(bottom, left, top, right) which gives the number of lines of margin to be specified on the four sides of the plot.

cex

A numerical single-element vector giving the amount by which plot labels should be magnified relative to the default.

...

Extra plotting arguments.

Value

Produces a plot on the active graphics device and returns a three-component list object:

hits

An ordered matrix detailing the subset of hits with a normalized score above the chosen cutoff. Database identifiers are listed along with their cluster group number.

pdb.id

A character vector containing the PDB database identifier of each hit above the chosen threshold.

gi.id

A character vector containing the gi database identifier of each hit above the chosen threshold.

Details

Examining plots of BLAST alignment lengths, scores, E-values and normalized scores (-log(E-value), see the blast.pdb function) can aid in the identification of sensible hit similarity thresholds.
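As a rough illustration of the normalized score and of the "largest drop-off" idea, the following base R sketch uses a made-up vector of E-values rather than real BLAST output. The variable names, the use of R's natural-log log(), and the choice of placing the seed midway between the two hits that flank the drop are assumptions for illustration; this is not the code used inside plot.blast.

```r
# Hypothetical E-values for eight hits, sorted best to worst (not real BLAST output)
evalue <- c(1e-120, 1e-118, 1e-115, 1e-60, 1e-58, 1e-10, 1e-3, 0.5)

# Normalized score: minus the (natural) log of the E-value
mlog.evalue <- -log(evalue)

# Decrease in normalized score between consecutive hits
drop <- -diff(mlog.evalue)

# Index of the hit immediately before the largest drop-off
i <- which.max(drop)

# Assumed seed: midway between the two scores that flank the drop
cut.seed <- mean(mlog.evalue[c(i, i + 1)])

# Hits whose normalized score lies above the seed cutoff
hits.above <- which(mlog.evalue > cut.seed)
print(round(mlog.evalue, 1))
print(hits.above)
```

With these toy values the largest gap separates the first three hits from the rest, so a seed placed in that gap would retain only the three closest matches.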

If a cutoff value is not supplied, a basic hierarchical clustering of normalized scores is performed, with the initial group partitioning made at a (hopefully sensible) point in the vicinity of h=cut.seed. Inspection of the resulting plot can then be used to refine the value of cut.seed or of cutoff itself. Because the appropriate cutoff depends on the desired application and on the properties of the system under study, plot.blast is expected to be called several times while settling on a suitable value. See the examples below for further details.
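The clustering step described above can be sketched with base R's hclust() and cutree(). The E-values, the tree height h = 50, and the rule of taking the lowest score in the top group as the cutoff are all hypothetical choices made for this illustration; the actual partitioning logic in plot.blast may differ.

```r
# Hypothetical E-values for eight hits (not real BLAST output)
evalue <- c(1e-120, 1e-118, 1e-115, 1e-60, 1e-58, 1e-10, 1e-3, 0.5)
score <- -log(evalue)

# Hierarchical clustering of the one-dimensional normalized scores
# (hclust default: complete linkage on Euclidean distances)
hc <- hclust(dist(score))

# Partition the tree at an assumed height, playing the role of h = cut.seed
grps <- cutree(hc, h = 50)

# Group containing the best-scoring hit
top.grp <- grps[which.max(score)]
top.hits <- which(grps == top.grp)

# Assumed cutoff: the lowest normalized score within the top group
cutoff <- min(score[top.hits])
print(grps)
print(top.hits)
```

Changing h moves the partition point along the tree, which mirrors how adjusting cut.seed between successive calls changes the set of hits retained.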

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695--2696.

Examples

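library(bio3d)

# BLAST the sequence of PDB entry 4q21 against the PDB (requires internet access)
b2 <- blast.pdb( pdbseq(read.pdb( get.pdb("4q21", URLonly=TRUE) )) )
# First pass: let plot.blast estimate a cutoff
raw.hits <- plot.blast(b2)
# Second pass: supply an explicit cutoff chosen after inspecting the plot
top.hits <- plot.blast(b2, 188)
head(top.hits$hits)

# Same workflow via the S3 generic, this time refining cut.seed instead of cutoff
blast <- blast.pdb( pdbseq(read.pdb( get.pdb("2BN3", URLonly=TRUE) )))
raw.hits <- plot(blast)
top.hits <- plot(blast, cut.seed=20)

head(top.hits$pdb.id)
# Optionally download the selected hits and split them into chains
#pdbFiles <- get.pdb(substr(top.hits$pdb.id, 1, 4), path="downloadedPDBs")
#pdbsplit(pdbFiles, path="downloadedPDBs/PDB_chains")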
