seqinr (version 3.1-2)

recstat: Prediction of Coding DNA Sequences.

Description

This function predicts the position of Coding DNA Sequences (CDS) using a Correspondence Analysis (CA) computed on codon composition, for each of the three reading frames of a DNA strand.

Usage

recstat(seq, sizewin = 90, shift = 30, seqname = "no name")

Arguments

seq
a nucleic acid sequence as a vector of characters
sizewin
an integer, multiple of 3, giving the length of the sliding window
shift
an integer, multiple of 3, giving the length of the steps between two windows
seqname
the name of the sequence
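
To illustrate how sizewin and shift interact, here is a minimal sketch of the window start positions for one reading frame. The helper window_starts is hypothetical and not part of seqinr; it assumes that frames are offset by 0, 1 or 2 bases and that only full-length windows are kept, which may differ from what recstat does internally.

```r
# Hypothetical helper, not part of seqinr: start positions of sliding
# windows for one reading frame (frame = 0, 1 or 2).
window_starts <- function(seqsize, sizewin = 90, shift = 30, frame = 0) {
  # Both lengths must be multiples of 3 so that windows stay in frame.
  stopifnot(sizewin %% 3 == 0, shift %% 3 == 0, frame %in% 0:2)
  first_start <- frame + 1
  last_start <- seqsize - sizewin + 1  # last position where a full window fits
  if (last_start < first_start) return(numeric(0))
  seq(from = first_start, to = last_start, by = shift)
}

starts <- window_starts(seqsize = 300, sizewin = 90, shift = 30, frame = 0)
print(starts)
```

Because shift is smaller than sizewin in the defaults, consecutive windows overlap by 60 bases.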

Value

This function returns a list containing the following components:

  • seq: a single DNA sequence as a vector of characters
  • sizewin: length of the sliding window
  • shift: length of the steps between windows
  • seqsize: length of the sequence
  • seqname: name of the sequence
  • vdep: a vector containing the positions of windows starts
  • vind: a vector containing the reading frame of each window
  • vstopd: a vector of stop codons positions in direct strand
  • vstopr: a vector of stop codons positions in reverse strand
  • vinitd: a vector of start codons positions in direct strand
  • vinitr: a vector of start codons positions in reverse strand
  • resd: a matrix containing codons frequencies for all the windows in the three frames of the direct strand
  • resr: a matrix containing codons frequencies for all the windows in the three frames of the reverse strand
  • resd.coa: list of class coa and dudi containing the result of the CA computed on the codons frequencies in the direct strand
  • resr.coa: list of class coa and dudi containing the result of the CA computed on the codons frequencies in the reverse strand

Details

The method is built on the hypothesis that the codon composition of a CDS is biased, whereas this is not the case outside these regions. To detect such bias, a CA on codon frequencies is computed for the six possible reading frames of a DNA sequence (three from the direct strand and three from the reverse strand). When there is a CDS in one of the reading frames, the CA factor scores observed in this frame (for both rows and columns) are expected to differ significantly from those in the other two.
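
The input to such an analysis is a table with one row per window and one column per codon. The following base R sketch builds that kind of table for the three frames of a toy random sequence. The helper codon_counts, the toy data and the row labels are assumptions made for illustration and are not the code used by recstat.

```r
# All 64 codons in a fixed order, used as column names.
bases <- c("a", "c", "g", "t")
grid <- expand.grid(bases, bases, bases, stringsAsFactors = FALSE)
all_codons <- sort(paste0(grid[[1]], grid[[2]], grid[[3]]))

# Hypothetical helper, not part of seqinr: codon counts in one window of x
# starting at position 'start' and spanning 'sizewin' bases.
codon_counts <- function(x, start, sizewin) {
  win <- x[start:(start + sizewin - 1)]
  # Each column of the 3-row matrix is one codon.
  codons <- apply(matrix(win, nrow = 3), 2, paste0, collapse = "")
  counts <- tabulate(match(codons, all_codons), nbins = length(all_codons))
  setNames(counts, all_codons)
}

set.seed(1)
x <- sample(bases, 300, replace = TRUE)  # toy sequence, not real data
sizewin <- 90
shift <- 30

rows <- list()
for (frame in 0:2) {
  starts <- seq(frame + 1, length(x) - sizewin + 1, by = shift)
  for (s in starts) {
    rows[[paste0("frame", frame + 1, "_pos", s)]] <- codon_counts(x, s, sizewin)
  }
}
freq <- do.call(rbind, rows)  # windows in rows, codons in columns
print(dim(freq))
```

A table of this shape is the kind of input that a correspondence analysis function such as dudi.coa from ade4 accepts. On a random sequence like this one, no frame is expected to stand out.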

References

The original paper describing recstat is: Fichant, G., Gautier, C. (1987) Statistical method for predicting protein coding regions in nucleic acid sequences. Comput. Appl. Biosci., 3, 287-295. http://bioinformatics.oxfordjournals.org/content/3/4/287.abstract

See Also

draw.recstat, test.li.recstat, test.co.recstat

Examples

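library(seqinr)
library(ade4)
# Locate the example FASTA file shipped with seqinr
ff <- system.file("sequences/ECOUNC.fsa", package = "seqinr")
# Read the file; the result is a list of sequences
seq <- read.fasta(ff)
# Run recstat on the first sequence with the default window settings
rec <- recstat(seq[[1]], seqname = getName(seq))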