The indel check function analyzes the framed and translated DNA sequences in two ways in order to
allow users to make an informed decision about whether or not a DNA sequence contains a frameshift error.
This test is designed to detect insertion or deletion errors resulting from technical errors in DNA sequencing,
but can in some instances identify biological contaminants (e.g. if the contaminant sequence uses a different
genetic code than the target, or if the contaminants are things such as pseudogenes that possess sequences that
are highly divergent from animal COI-5P sequences).
The two tests performed are: (1) a query for stop codons in the amino acid sequence and (2) an evaluation of the
log likelihood value resulting from the comparison of the framed coi5p amino acid sequence against the COI-5P
amino acid PHMM. The default likelihood threshold for flagging a sequence as likely erroneous is -358.88. Sequences with
likelihood values lower than this will receive an indel_likely value of TRUE. The threshold of -358.88 was experimentally
determined to be the optimal likelihood threshold for separating full-length sequences with and without errors when
the censored translation option is used. Sequences will have higher likelihood values when a specific genetic code is used.
Sequences will have lower likelihood values when they are not complete barcode sequences (i.e. <500bp in length). For these
reasons the likelihood threshold is not a specific value but a parameter that can be altered based on the type of translation
and length of the sequences. Below are experimentally determined suggested values for different size and translation table
combinations.
Short barcode sequences, known genetic code: indel_threshold = -354.44
Short barcode sequences, unknown genetic code: indel_threshold = -440.24
Full length barcode sequences, known genetic code: indel_threshold = -246.20
Full length barcode sequences, unknown genetic code: indel_threshold = -358.88
Source: Nugent et al. 2019 (doi: https://doi.org/10.1101/2019.12.12.865014).
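
The decision logic described above can be sketched in a few lines. This is an illustrative Python sketch, not the package's implementation: the names suggest_indel_threshold and flag_sequence are hypothetical, the use of "*" to mark a stop codon in the translated sequence is an assumption, and the 500 bp cutoff is taken from the "<500bp" description of incomplete barcodes. The threshold values are the suggested ones listed above.

```python
# Suggested thresholds from the list above, keyed by
# (is_full_length, genetic_code_known).
SUGGESTED_THRESHOLDS = {
    (False, True): -354.44,   # short barcode, known genetic code
    (False, False): -440.24,  # short barcode, unknown genetic code
    (True, True): -246.20,    # full length barcode, known genetic code
    (True, False): -358.88,   # full length barcode, unknown genetic code
}


def suggest_indel_threshold(seq_length_bp, genetic_code_known, full_length_min_bp=500):
    """Pick a suggested threshold (hypothetical helper).

    Sequences shorter than full_length_min_bp are treated as short barcodes.
    """
    is_full_length = seq_length_bp >= full_length_min_bp
    return SUGGESTED_THRESHOLDS[(is_full_length, genetic_code_known)]


def flag_sequence(aa_sequence, log_likelihood, indel_threshold=-358.88):
    """Apply the two tests to one translated sequence (hypothetical helper).

    Assumes stop codons appear as '*' in aa_sequence and that log_likelihood
    was already computed against the amino acid PHMM.
    """
    return {
        # Test 1: does the amino acid sequence contain a stop codon?
        "stop_codons": "*" in aa_sequence,
        # Test 2: is the likelihood lower than the chosen threshold?
        "indel_likely": log_likelihood < indel_threshold,
    }


# Example: a 300 bp fragment translated with a known genetic code.
threshold = suggest_indel_threshold(300, genetic_code_known=True)
result = flag_sequence("MTLYFIFG", log_likelihood=-360.0, indel_threshold=threshold)
```

In this example the threshold is -354.44, so a likelihood of -360.0 falls below it and the sequence is flagged as likely containing an indel, even though no stop codon is present. The two results are reported separately so that the user can weigh both pieces of evidence.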