
Biostrings (version 2.40.2)

pid: Percent Sequence Identity

Description

Calculates the percent sequence identity for a pairwise sequence alignment.

Usage

pid(x, type="PID1")

Arguments

x
a PairwiseAlignments object.

type
the type of percent sequence identity to compute: one of "PID1", "PID2", "PID3", or "PID4". See Details for more information.

Value

A numeric vector containing the specified sequence identity measures.
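The following sketch illustrates the vector-valued result. It assumes the Biostrings package is installed. When x holds alignments of several patterns against one subject, pid() is expected to return one value per alignment. The second pattern is an arbitrary, made-up variant of the first sequence used in the Examples section.

```r
library(Biostrings)

# Two patterns (the second is a hypothetical variant) aligned to one subject
patterns <- DNAStringSet(c("AGTATAGATGATAGAT", "AGTTTAGACGATAGAT"))
subject <- DNAString("AGTAGATAGATGGATGATAGATA")

palign <- pairwiseAlignment(patterns, subject)

# One percent identity value per alignment in 'palign'
p <- pid(palign)
p
```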

Details

Since there is no universal definition of percent sequence identity, the pid function can calculate this statistic in any of the following ways:
"PID1":
100 * (identical positions) / (aligned positions + internal gap positions)

"PID2":
100 * (identical positions) / (aligned positions)

"PID3":
100 * (identical positions) / (length of the shorter sequence)

"PID4":
100 * (identical positions) / (average length of the two sequences)
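To make the four denominators concrete, here is a minimal base-R sketch that applies each formula to hand-supplied counts. The helper pid_from_counts() and the counts themselves are hypothetical and chosen only for illustration. This is not presented as how Biostrings computes pid() internally; the real function works directly from a PairwiseAlignments object.

```r
# Hypothetical helper (not part of Biostrings): applies the four PID
# formulas from the Details section to pre-computed alignment counts.
pid_from_counts <- function(identical, aligned, internal_gaps,
                            len1, len2, type = "PID1") {
  type <- match.arg(type, c("PID1", "PID2", "PID3", "PID4"))
  denom <- switch(type,
    PID1 = aligned + internal_gaps,  # aligned + internal gap positions
    PID2 = aligned,                  # aligned positions only
    PID3 = pmin(len1, len2),         # length of the shorter sequence
    PID4 = (len1 + len2) / 2)        # average length of the two sequences
  100 * identical / denom
}

# Made-up counts for one alignment (not taken from the Examples section)
counts <- list(identical = 12, aligned = 15, internal_gaps = 5,
               len1 = 16, len2 = 24)

pids <- sapply(c("PID1", "PID2", "PID3", "PID4"),
               function(t) do.call(pid_from_counts, c(counts, type = t)))
print(pids)
```

With these counts, PID1 and PID4 both give 60, PID2 gives 80 and PID3 gives 75, so the same alignment can report identities that differ by 20 percentage points depending on the definition chosen.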

References

A. May, Percent Sequence Identity: The Need to Be Explicit, Structure 2004, 12(5):737.

G. Raghava and G. Barton, Quantification of the variation in percentage identity for protein sequence alignments, BMC Bioinformatics 2006, 7:415.

See Also

pairwiseAlignment, PairwiseAlignments-class, match-utils

Examples

  library(Biostrings)
  s1 <- DNAString("AGTATAGATGATAGAT")
  s2 <- DNAString("AGTAGATAGATGGATGATAGATA")

  palign1 <- pairwiseAlignment(s1, s2)
  palign1
  pid(palign1)

  palign2 <-
    pairwiseAlignment(s1, s2,
      substitutionMatrix =
      nucleotideSubstitutionMatrix(match = 2, mismatch = 10, baseOnly = TRUE))
  palign2
  pid(palign2, type = "PID4")
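Because the definitions differ only in their denominators, it can be instructive to evaluate all four on the same alignment. A short sketch, assuming Biostrings is installed and reusing the two sequences from the example above:

```r
library(Biostrings)

s1 <- DNAString("AGTATAGATGATAGAT")
s2 <- DNAString("AGTAGATAGATGGATGATAGATA")
palign1 <- pairwiseAlignment(s1, s2)

# Evaluate every supported definition on the same alignment
types <- c("PID1", "PID2", "PID3", "PID4")
pids <- sapply(types, function(t) pid(palign1, type = t))
round(pids, 1)
```

Whatever the alignment, PID1 cannot exceed PID2 (its denominator also counts gap positions), and PID2 >= PID3 >= PID4 (the number of aligned positions cannot exceed the length of the shorter sequence, which in turn cannot exceed the average length).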
