micropan (version 2.1)

readBlastSelf: Reads BLAST result files

Description

Reads the result files from a search with blastpAllAll.

Usage

readBlastSelf(blast.files, e.value = 1, verbose = TRUE)

Arguments

blast.files

A text vector of filenames.

e.value

A threshold E-value to immediately discard (very) poor BLAST alignments.

verbose

Logical, indicating if textual output should be given to monitor the progress.

Value

The function returns a table with columns Dbase, Query, Bitscore and Distance. Each row corresponds to a pair of sequences (a Dbase and a Query sequence) having at least one BLAST hit between them. All pairs not listed have distance 1.0 between them. You should normally bind the output from readBlastSelf to the output from readBlastPair and use the result as input to bDist.
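
A minimal sketch of the binding step, using two toy tables in place of real output. The sequence tags and values are made up for illustration, and base rbind is only one way to stack the tables.

```r
# Toy stand-ins for the tables returned by readBlastSelf and readBlastPair
# (hypothetical sequence tags and values, for illustration only)
self.tbl <- data.frame(Dbase    = c("GID1_seq1", "GID1_seq2"),
                       Query    = c("GID1_seq1", "GID1_seq2"),
                       Bitscore = c(250, 310),
                       Distance = c(0, 0))
pair.tbl <- data.frame(Dbase    = "GID1_seq1",
                       Query    = "GID2_seq1",
                       Bitscore = 200,
                       Distance = 0.2)

# Stack the two tables row-wise; both have the same four columns
blast.tbl <- rbind(self.tbl, pair.tbl)
print(blast.tbl)
```

The combined table, here blast.tbl, is what would then be passed on to bDist.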

Details

The filenames given as input must refer to BLAST result files produced by blastpAllAll.

With readBlastSelf you only read the self-alignment results, i.e. the results from blasting a genome against itself. With readBlastPair you read all the other files, i.e. the comparisons between different genomes. You may use all BLAST file names as input to both functions. Each selects the proper files based on their names, e.g. GID1_vs_GID1.txt is read by readBlastSelf while GID2_vs_GID1.txt is read by readBlastPair.
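
The name-based selection can be pictured with the sketch below. It is not the package's own code: is_self_file is a hypothetical helper, and it assumes genome identifiers of the form GID followed by digits, as in the file names above.

```r
# Hypothetical helper, not part of micropan: TRUE when the two genome
# identifiers in a name like "GID1_vs_GID1.txt" are the same
is_self_file <- function(filenames) {
  base <- basename(filenames)
  tags <- regmatches(base, gregexpr("GID[0-9]+", base))
  vapply(tags, function(x) length(x) >= 2 && x[1] == x[2], logical(1))
}

files <- c("GID1_vs_GID1.txt", "GID2_vs_GID1.txt", "GID2_vs_GID2.txt")
self.files <- files[is_self_file(files)]   # genome against itself
pair.files <- files[!is_self_file(files)]  # two different genomes
print(self.files)
print(pair.files)
```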

Setting a small e.value threshold will filter the alignments, and may speed up this and later processing, but you may also lose some important alignments for short sequences.
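
The trade-off can be illustrated with a toy table of hits. This is only a sketch: the Evalue column, the sequence names and the keep_hits helper are assumptions made for the example, and it assumes that alignments with an E-value above the threshold are the ones dropped.

```r
# Toy hits (hypothetical values): short sequences tend to get larger
# E-values even when the alignment is biologically relevant
hits <- data.frame(Query  = c("long_protein", "short_peptide_a", "short_peptide_b"),
                   Evalue = c(1e-50, 1e-3, 0.5))

# Hypothetical helper: keep only hits below the E-value threshold
keep_hits <- function(tbl, e.value) {
  tbl[tbl$Evalue < e.value, ]
}

n.loose  <- nrow(keep_hits(hits, e.value = 1))     # all three hits kept
n.strict <- nrow(keep_hits(hits, e.value = 1e-5))  # only the long protein kept
print(c(n.loose, n.strict))
```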

Both these functions are used by bDist. The reason we provide them separately is to allow the user to complete this file reading before calling bDist. With a huge number of files, a skilled user may use parallel processing to speed up the reading. For normal-size data sets (e.g. fewer than 100 genomes) you should probably use bDist directly.
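
One way such parallel reading might look is sketched below. The helper read_chunked is hypothetical and not part of micropan; it uses mclapply from the base parallel package and a stand-in reader so that it runs without any BLAST files. It assumes the tables read from separate chunks of files can simply be stacked.

```r
library(parallel)

# Hypothetical helper, not part of micropan: split the file names into
# chunks, read each chunk with `reader`, and stack the resulting tables
read_chunked <- function(files, reader, n.chunks = 2, cores = 1) {
  chunk.id <- rep_len(seq_len(n.chunks), length(files))
  chunks <- split(files, chunk.id)
  # mclapply forks on Unix-alikes; on Windows only cores = 1 is supported
  tbls <- mclapply(chunks, reader, mc.cores = cores)
  do.call(rbind, unname(tbls))
}

# Stand-in reader so the sketch runs without BLAST result files
toy_reader <- function(f) data.frame(File = f, stringsAsFactors = FALSE)
res <- read_chunked(c("a.txt", "b.txt", "c.txt", "d.txt"), toy_reader)
print(res)

# With micropan attached, the real readers could be plugged in instead, e.g.
# self.tbl <- read_chunked(blast.files, readBlastSelf, cores = 2)
```

Passing the reader as an argument keeps the chunking logic independent of micropan, so the same helper could wrap readBlastPair as well.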

See Also

bDist, blastpAllAll.

Examples

# NOT RUN {
# Attach the packages used below (str_c comes from stringr)
library(micropan)
library(stringr)

# Using BLAST result files in this package...
prefix <- c("GID1_vs_GID1_",
            "GID2_vs_GID1_",
            "GID3_vs_GID1_",
            "GID2_vs_GID2_",
            "GID3_vs_GID2_",
            "GID3_vs_GID3_")
bf <- file.path(path.package("micropan"), "extdata", str_c(prefix, ".txt.xz"))

# We need to uncompress them first...
blast.files <- tempfile(pattern = prefix, fileext = ".txt.xz")
ok <- file.copy(from = bf, to = blast.files)
blast.files <- unlist(lapply(blast.files, xzuncompress))

# Reading self-alignment files, then the other files
self.tbl <- readBlastSelf(blast.files)
pair.tbl <- readBlastPair(blast.files)

# ...and cleaning...
ok <- file.remove(blast.files)

# See also examples for bDist

# }
