annotate.protein_id: Annotate protein_id

Description

This function assigns protein identifiers to a list of tandem mass spectra that have peptide sequences assigned.

Usage

annotate.protein_id(data, file = NULL, fasta = read.fasta(file = file, 
         as.string = TRUE, seqtype = "AA"), digestPattern = "(([RK])|(^)|(^M))")

Arguments

data

a list of records containing m/z values and peptide sequences.

file

file name of a FASTA file.

fasta

a fasta object as returned by the seqinr::read.fasta(...) function.

digestPattern

a regular expression pattern that can be used by grep. The default pattern assumes a tryptic digest.
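
To make the default pattern concrete, the following minimal sketch shows what it accepts when placed in front of a peptide sequence. How annotate.protein_id combines the pattern with each peptide internally is an assumption here (the pattern is simply prepended), and the protein and peptide strings are toy values chosen for illustration.

```r
# Illustration only; this is not the package's internal code.
# Assumption: the digest pattern is prepended to the peptide sequence.
digestPattern <- "(([RK])|(^)|(^M))"
protein <- "LGGNEQVTRAAAAKGAGSSEPVTGLDAK"  # toy protein sequence

# Peptide preceded by R or K (a tryptic cleavage site)
tryptic <- grepl(paste0(digestPattern, "AAAAK"), protein)

# Peptide located at the protein N-terminus, accepted via the (^) alternative
nterm <- grepl(paste0(digestPattern, "LGGNEQVTR"), protein)

# Peptide starting mid-sequence after a G, which is not a tryptic site
nonTryptic <- grepl(paste0(digestPattern, "GNEQVTR"), protein)

c(tryptic = tryptic, nterm = nterm, nonTryptic = nonTryptic)
```

Under this assumption, a different enzyme would be modelled by replacing the [RK] character class with the residues after which that enzyme cleaves.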

Value

Returns a list object; the protein identifier of each record is stored in its proteinInformation variable.

Details

The protein sequences are read by the read.fasta function of the seqinr package. The protein identifier is written to the proteinInformation variable.

If the function is called on a multi-core architecture, it uses mclapply from the parallel package.
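
As a rough sketch of the pattern described above, the code below annotates a toy list of records in parallel with mclapply. It is not the package's implementation: the record layout, the toy sequences, the matching rule (digest pattern prepended to the peptide), and the choice of two cores are all assumptions made for illustration.

```r
library(parallel)

# Toy stand-ins (hypothetical data, not taken from the package)
fasta <- c(protA = "MKLGGNEQVTRAAAAK", protB = "GAGSSEPVTGLDAK")
records <- list(
  list(peptideSequence = "LGGNEQVTR"),
  list(peptideSequence = "ELVISLIVESK")
)
digestPattern <- "(([RK])|(^)|(^M))"

# mclapply relies on forking; on Windows only mc.cores = 1 is supported
cores <- if (.Platform$OS.type == "windows") 1L else 2L

annotated <- mclapply(records, function(x) {
  # assumed matching rule: digest pattern followed by the peptide sequence
  hit <- grepl(paste0(digestPattern, x$peptideSequence), fasta)
  # comma-separated identifiers of all matching proteins, "" if none match
  x$proteinInformation <- paste(names(fasta)[hit], collapse = ",")
  x
}, mc.cores = cores)

sapply(annotated, function(x) x$proteinInformation)
```

Each record is processed independently, which is why a parallel map over the list is a natural fit.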

It is recommended to load the FASTA file prior to running annotate.protein_id using

myFASTA <- read.fasta(file = file, as.string = TRUE, seqtype = "AA")

instead of providing the FASTA file name to the function.

Examples

# annotate.protein_id

    library(seqinr)  # provides read.fasta
    library(specL)   # provides annotate.protein_id and the peptideStd example data

    # our FASTA sequence
    irtFASTAseq <- paste(">zz|ZZ_FGCZCont0260|",
      "iRT_Protein_with_AAAAK_spacers concatenated Biognosys\n",
      "LGGNEQVTRAAAAKGAGSSEPVTGLDAKAAAAKVEATFGVDESNAKAAAAKYILAGVENS",
      "KAAAAKTPVISGGPYEYRAAAAKTPVITGAPYEYRAAAAKDGLDAASYYAPVRAAAAKAD",
      "VTPADFSEWSKAAAAKGTFIIDPGGVIRAAAAKGTFIIDPAAVIRAAAAKLFLQFGAQGS",
      "PFLK\n")

    # be realistic, do it from file
    Tfile <- file()
    cat(irtFASTAseq, file = Tfile)

    # use read.fasta from seqinr
    fasta.irtFASTAseq <- read.fasta(Tfile, as.string = TRUE, seqtype = "AA")
    close(Tfile)

    # annotate with proteinID
    # -> here we find all PSMs from the one proteinID above
    peptideStd <- specL::annotate.protein_id(peptideStd,
      fasta = fasta.irtFASTAseq)

    # show indices for all PSMs where we have a proteinInformation
    which(unlist(lapply(peptideStd,
      function(x) { nchar(x$proteinInformation) > 0 })))