specL (version 1.6.2)

annotate.protein_id: Annotate protein_id

Description

This function assigns protein identifiers to a list of tandem mass spectra that have peptide sequences assigned.

Usage

annotate.protein_id(data, file = NULL, fasta = read.fasta(file = file, 
         as.string = TRUE, seqtype = "AA"), digestPattern = "(([RK])|(^)|(^M))")

Arguments

data
list of records containing mZ and peptide sequences.
file
file name of a FASTA file.
fasta
a fasta object as returned by the seqinr::read.fasta(...) method.
digestPattern
a regex pattern that can be used by grep. The default pattern assumes a tryptic digest.
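
As a rough illustration of what the default digestPattern expresses, the base R sketch below checks whether a peptide occurs in a protein sequence at a tryptic position. The protein fragment, the peptide list, and the idea of prepending the pattern to the peptide before matching are assumptions made for this example; they are not taken from the specL source.

```r
# Default pattern from the Usage section: a peptide must follow R or K,
# or sit at the start of the protein (optionally after an initial M).
digestPattern <- "(([RK])|(^)|(^M))"

# Hypothetical protein fragment and candidate peptides, for illustration only
protein  <- "LGGNEQVTRAAAAKGAGSSEPVTGLDAK"
peptides <- c("LGGNEQVTR", "AAAAK", "GAGSSEPVTGLDAK", "GNEQVTR")

# Assumption: the pattern is prepended to the peptide before matching
isTryptic <- vapply(
  peptides,
  function(p) grepl(paste0(digestPattern, p), protein),
  logical(1)
)
print(isTryptic)
```

Here the first three peptides match because they start the protein or follow R or K, while GNEQVTR is preceded by G and is rejected.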

Value

  • Returns a list object.

Details

The protein sequences are read by the read.fasta function of the seqinr package. The protein identifier is written to the proteinInformation variable of each record.

If the function is called on a multi-core architecture it uses mclapply.
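
The following sketch shows how a per-record annotation step might be parallelised with mclapply from the parallel package. It is not the package's implementation: the record layout (a peptideSequence field), the named vector standing in for a FASTA object, and the ";" separator are all hypothetical choices made for this example.

```r
library(parallel)

# Hypothetical records and protein sequences, for illustration only
records <- list(
  list(peptideSequence = "AAAAK",   proteinInformation = ""),
  list(peptideSequence = "GNEQVTR", proteinInformation = "")
)
proteins <- c(protA = "LGGNEQVTRAAAAKGAGSSEPVTGLDAK", protB = "MKTAYIAK")
digestPattern <- "(([RK])|(^)|(^M))"

# mclapply forks only on Unix-alikes; fall back to one core on Windows
cores <- if (.Platform$OS.type == "windows") 1L else 2L

annotated <- mclapply(records, function(rec) {
  # indices of proteins that contain the peptide at a tryptic position
  idx <- grep(paste0(digestPattern, rec$peptideSequence), proteins)
  rec$proteinInformation <- paste(names(proteins)[idx], collapse = ";")
  rec
}, mc.cores = cores)
```

Records without a matching protein keep an empty proteinInformation string, which is the condition the Examples section tests with nchar().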

It is recommended to load the FASTA file prior to running annotate.protein_id using

myFASTA <- read.fasta(file = file, as.string = TRUE, seqtype = "AA")

instead of providing the FASTA file name to the function.

See Also

?read.fasta of the seqinr package. http://www.uniprot.org/help/fasta-headers

Examples

# annotate.protein_id

# load specL, which provides annotate.protein_id and the peptideStd data
library(specL)

# our FASTA sequence
irtFASTAseq <- paste(">zz|ZZ_FGCZCont0260|",
  "iRT_Protein_with_AAAAK_spacers concatenated Biognosys\n",
  "LGGNEQVTRAAAAKGAGSSEPVTGLDAKAAAAKVEATFGVDESNAKAAAAKYILAGVENS",
  "KAAAAKTPVISGGPYEYRAAAAKTPVITGAPYEYRAAAAKDGLDAASYYAPVRAAAAKAD",
  "VTPADFSEWSKAAAAKGTFIIDPGGVIRAAAAKGTFIIDPAAVIRAAAAKLFLQFGAQGS",
  "PFLK\n")

# be realistic, do it from file (here an anonymous file connection)
Tfile <- file()
cat(irtFASTAseq, file = Tfile)

# use read.fasta from seqinr
fasta.irtFASTAseq <- seqinr::read.fasta(Tfile, as.string = TRUE, seqtype = "AA")
close(Tfile)

# annotate with proteinID
# -> here we find all PSMs from the one proteinID above
peptideStd <- specL::annotate.protein_id(peptideStd,
  fasta = fasta.irtFASTAseq)

# show indices for all PSMs where we have a proteinInformation
which(unlist(lapply(peptideStd,
  function(x) { nchar(x$proteinInformation) > 0 })))
