
BioSeqClass (version 1.30.0)

featurePSSM: Feature Coding

Description

A set of functions for extracting features from biological sequences and coding those features as numeric vectors.

Usage

featurePSSM(seq, start.pos, stop.pos, psiblast.path, database.path)

Arguments

seq
a character vector containing the protein, DNA, or RNA sequences.
start.pos
an integer vector denoting the start position of the fragment window.
stop.pos
an integer vector denoting the stop position of the fragment window.
psiblast.path
a string giving the path to the blastpgp program. blastpgp is used to run PSI-BLAST and obtain the Position-Specific Scoring Matrix.
database.path
a string giving the path to a formatted reference database. A database can be formatted with the "formatdb" program.
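
Because featurePSSM depends on an external program, it can help to confirm that blastpgp is reachable before calling it. The following is a minimal sketch using base R only; blast_tool_available is a hypothetical helper introduced here for illustration and is not part of BioSeqClass.

```r
# Hypothetical helper (not part of BioSeqClass): returns TRUE when the
# given program name or path can be located on this machine.
blast_tool_available <- function(program) {
  # Sys.which() returns "" when the program is not found on the PATH;
  # file.exists() covers the case where a full path was supplied.
  nzchar(unname(Sys.which(program))) || file.exists(program)
}

if (!blast_tool_available("blastpgp")) {
  message("blastpgp was not found; set psiblast.path to its full path.")
}
```

The check only establishes that the program can be found. It does not verify that the database named by database.path has been formatted.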

Details

featurePSSM returns a matrix with 20*N+N columns. Each row represents the features of one sequence, coded as a numeric vector of dimension 20*N+N generated by PSI-BLAST. The vector contains two kinds of features: the normalized position-specific scores of the PSSM (Position-Specific Scoring Matrix), and the Shannon entropy of the WOP (weighted observed percentages) at each position. The PSI-BLAST program and a formatted NCBI non-redundant protein database are required.
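
The two quantities described above can be illustrated with a short sketch. It assumes that N is the window length (stop.pos - start.pos + 1) and that entropy is measured in bits; the page states neither, and the normalization that featurePSSM applies internally is not shown here. Both helper functions are hypothetical and not part of BioSeqClass.

```r
# Hypothetical helper: Shannon entropy (in bits, an assumption) of the
# weighted observed percentages at one position (20 values, one per residue).
shannon_entropy <- function(wop) {
  total <- sum(wop)
  if (total == 0) return(NA_real_)  # no observations at this position
  p <- wop / total                  # convert percentages to probabilities
  p <- p[p > 0]                     # treat 0 * log(0) as 0
  -sum(p * log2(p))
}

# Hypothetical helper: expected number of feature columns, assuming
# N = stop.pos - start.pos + 1 (20 scores plus 1 entropy per position).
n_feature_columns <- function(start.pos, stop.pos) {
  N <- stop.pos - start.pos + 1
  20 * N + N
}

uniform   <- rep(5, 20)           # all 20 amino acids equally frequent
conserved <- c(100, rep(0, 19))   # a single amino acid observed

shannon_entropy(uniform)     # log2(20), about 4.32
shannon_entropy(conserved)   # 0
n_feature_columns(1, 10)     # 210
```

Under these assumptions, a fully conserved position has entropy 0 and a position with no residue preference has the maximum entropy, log2(20).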

Examples

if(interactive()){
  library(BioSeqClass)
  library(Biostrings)  # provides readAAStringSet()

  ## Read the example protein sequences shipped with the package.
  file = file.path(path.package("BioSeqClass"), "example", "acetylation_K.fasta")
  tmp = readAAStringSet(file)
  proteinSeq = as.character(tmp)

  ## Requires the "blastpgp" program and a formatted database.
  ## A database can be formatted with the "formatdb" program.
  PSSM1 = featurePSSM(proteinSeq[1:2], start.pos=rep(1,2), stop.pos=rep(10,2),
                      psiblast.path="blastpgp", database.path="./result1.fasta")
}
