micropan (version 1.0)

prodigalPredict: Gene predictions using Prodigal

Description

Finds coding genes in a genome, using the Prodigal software, and outputs them as a FASTA file.

Usage

prodigalPredict(genome.file, prot.file, nuc.file = NULL, closed.ends = TRUE, motif.scan = FALSE)

Arguments

genome.file
Name of a FASTA-formatted file containing all the DNA sequences of a genome (chromosomes, plasmids, contigs etc.).
prot.file
Name of the output file. Predicted protein sequences are written to this file in FASTA format.
nuc.file
If specified, the nucleotide sequence of each predicted gene is written to this file (default NULL).
closed.ends
Logical. If TRUE, genes are not allowed to run off the edges of a sequence (default TRUE).
motif.scan
Logical. If TRUE, the Shine-Dalgarno trainer is bypassed and a full motif scan is forced (default FALSE).
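
The arguments above correspond closely to options of the Prodigal command-line program. As a rough sketch only, the helper below (a hypothetical function, build_prodigal_command, not part of micropan) shows how such a command line might be assembled. The flags -i, -a, -d, -c and -n are Prodigal's own options; whether prodigalPredict builds its call exactly this way is an assumption.

```r
# Hypothetical helper, NOT micropan code: it only illustrates an assumed
# mapping from the prodigalPredict arguments to Prodigal command-line flags.
build_prodigal_command <- function(genome.file, prot.file, nuc.file = NULL,
                                   closed.ends = TRUE, motif.scan = FALSE) {
  args <- c("-i", shQuote(genome.file),       # input genome (FASTA)
            "-a", shQuote(prot.file))         # protein translations (FASTA)
  if (!is.null(nuc.file)) {
    args <- c(args, "-d", shQuote(nuc.file))  # nucleotide sequences of the genes
  }
  if (closed.ends) args <- c(args, "-c")      # closed ends: no genes running off edges
  if (motif.scan)  args <- c(args, "-n")      # bypass Shine-Dalgarno trainer
  paste(c("prodigal", args), collapse = " ")
}

# Build (but do not run) the command for the default settings
cmd <- build_prodigal_command("genome.fsa", "protein.fsa")
print(cmd)
```

The resulting string could be passed to system(), but the sketch deliberately stops short of running anything, since that requires a local Prodigal installation.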

Value

The call to Prodigal produces a FASTA-formatted file with the predicted protein sequences and, if nuc.file is specified, a corresponding file with the nucleotide sequences. See readFasta for how to read such files into R.
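
For readers who want to see what reading such an output file involves, here is a minimal base-R sketch. The function read_fasta_simple is hypothetical and stands in for readFasta purely for illustration. The headers and sequences in the demonstration file are made up and are not real Prodigal output.

```r
# Hypothetical base-R reader, for illustration only; readFasta is the
# documented way to read these files.
read_fasta_simple <- function(fasta.file) {
  lines <- readLines(fasta.file)
  lines <- lines[nzchar(lines)]               # drop empty lines
  is.header <- startsWith(lines, ">")
  record <- cumsum(is.header)                 # record index for every line
  headers <- sub("^>", "", lines[is.header])
  # Group sequence lines by record; the factor levels keep empty records too
  groups <- split(lines[!is.header],
                  factor(record[!is.header], levels = seq_along(headers)))
  seqs <- vapply(groups, paste, character(1), collapse = "")
  data.frame(Header = headers, Sequence = unname(seqs),
             stringsAsFactors = FALSE)
}

# Demonstration on a small made-up file
tmp <- tempfile(fileext = ".fsa")
writeLines(c(">seq_1 illustrative header", "MKT", "LLV*",
             ">seq_2 illustrative header", "MAA*"), tmp)
proteins <- read_fasta_simple(tmp)
print(proteins)
```

Each row of the resulting data frame holds one sequence, with multi-line sequences joined into a single string.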

Details

This function sets up a call to the software Prodigal (Hyatt et al., 2010), which is designed to find coding genes in prokaryotic genomes. It runs fast and has performed very well in comparisons with other automated gene finders. The default options used here are believed to be the best for pan-genomic analyses.
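
Because the function calls external software, it can be useful to check that Prodigal is reachable before running a larger analysis. The sketch below uses a hypothetical helper, prodigal_available, and assumes that the executable is named "prodigal" and is on the system PATH. Both are assumptions and may not hold for every installation.

```r
# Hypothetical check, not part of micropan. Assumes the executable is
# called "prodigal" and can be found on the PATH.
prodigal_available <- function(exe = "prodigal") {
  # Sys.which() returns "" when the program cannot be found
  unname(nzchar(Sys.which(exe)))
}

if (prodigal_available()) {
  message("Prodigal found at: ", Sys.which("prodigal"))
} else {
  message("Prodigal not found on the PATH; prodigalPredict is unlikely to work.")
}
```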

References

Hyatt, D., Chen, G., LoCascio, P.F., Land, M.L., Larimer, F.W., Hauser, L.J. (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11:119.

See Also

entrezDownload, readFasta.

Examples

## Not run: 
# # Using a small genome file in this package
# # We need to uncompress it first...
# extdata.path <- file.path(path.package("micropan"),"extdata")
# filenames <- "Mpneumoniae_309_genome.fsa"
# pth <- lapply( file.path( extdata.path, paste( filenames, ".xz", sep="" ) ), xzuncompress )
# 
# # Calling Prodigal, and using a similar name (_genome replaced by _protein) in output
# prodigalPredict( file.path(extdata.path,filenames), gsub("_genome","_protein",filenames) )
# 
# # ...and compressing the genome-file again...
# pth <- lapply( file.path( extdata.path, filenames ), xzcompress )
# ## End(Not run)