featureCTD: Feature Coding by composition, transition and distribution

Description

Sequences are coded based on their composition, transition and distribution.

Usage

featureCTD(seq, class=elements("aminoacid"))

Arguments

seq

a character vector of protein, DNA, or RNA sequences.

class

a list giving the classes of biological properties. It can be produced by elements or aaClass.

Details

featureCTD returns a matrix with M+M*(M-1)/2+M*5 columns, where M is the number of classes in class: M columns for composition, M*(M-1)/2 for transition and M*5 for distribution. Each row represents the features of one sequence, coded as a numeric vector of dimension M+M*(M-1)/2+M*5. Three kinds of coding are used: composition (C), transition (T) and distribution (D). C is the number of amino acids of a particular property (such as hydrophobicity) divided by the total number of amino acids. T characterizes the percent frequency with which amino acids of a particular property are followed by amino acids of a different property. D measures the chain length within which the first, 25%, 50%, 75% and 100% of the amino acids of a particular property are located, respectively.
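The three codings can be illustrated for a single sequence with a minimal sketch. This is not the BioSeqClass implementation: the function ctd_sketch, the three-class list toy_class, the rounding with floor, and the scaling of T and D by sequence length are all assumptions made for illustration, and the values returned by featureCTD may differ in scale or ordering.

```r
# Hypothetical three-class grouping, for illustration only
# (not the output of elements() or aaClass()).
toy_class <- list(polar = c("S", "T"),
                  neutral = c("G", "A"),
                  hydrophobic = c("L", "V"))

# Sketch of C/T/D coding for one sequence of length >= 2 whose
# residues all belong to some class in `class`, with M >= 2 classes.
ctd_sketch <- function(seq, class) {
  residues <- strsplit(seq, "")[[1]]
  n <- length(residues)
  M <- length(class)

  # Map every residue to the index of its class.
  lookup <- rep(seq_len(M), lengths(class))
  names(lookup) <- unlist(class, use.names = FALSE)
  idx <- unname(lookup[residues])
  stopifnot(!anyNA(idx))

  # C: fraction of residues falling in each class (M values).
  comp <- tabulate(idx, nbins = M) / n

  # T: fraction of adjacent pairs that switch between classes i and j,
  # in either order (M*(M-1)/2 values).
  a <- idx[-n]
  b <- idx[-1]
  trans <- apply(combn(M, 2), 2, function(p) {
    sum((a == p[1] & b == p[2]) | (a == p[2] & b == p[1])) / (n - 1)
  })

  # D: relative position of the first, 25%, 50%, 75% and 100%
  # occurrence of each class (M*5 values); 0 if the class is absent.
  dist <- unlist(lapply(seq_len(M), function(k) {
    pos <- which(idx == k)
    if (length(pos) == 0) return(rep(0, 5))
    marks <- pmax(1, floor(length(pos) * c(0, 0.25, 0.5, 0.75, 1)))
    pos[marks] / n
  }))

  c(comp, trans, dist)
}

v <- ctd_sketch("SGLSTAVL", toy_class)
length(v)  # 21, i.e. M+M*(M-1)/2+M*5 with M = 3
```

With M = 3 classes, the vector has 3 composition values, 3 transition values and 15 distribution values, matching the column count given above.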

Examples

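if(interactive()){
  library(BioSeqClass)
  library(Biostrings)

  # Example FASTA file shipped with BioSeqClass
  file = file.path(path.package("BioSeqClass"), "example", "acetylation_K.fasta")
  tmp = readAAStringSet(file)

  # Convert the AAStringSet to a plain character vector
  proteinSeq = as.character(tmp)

  # Code the sequences with the classes from elements("aminoacid")
  CTD1 = featureCTD(proteinSeq, class=elements("aminoacid") )
  # Code the sequences with the classes from aaClass("aaV")
  CTD2 = featureCTD(proteinSeq, class=aaClass("aaV") )
}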
