Function sdists.trace complements the distance computation between sequences by sdists, so see the details of the arguments method, weight, and exclude there. Note, however, the following differences:
1) You can supply only two sequences, either as vectors of numeric symbol codes, as factors, or as strings, i.e. scalar vectors of type character.
2) You can supply a weight matrix whose rownames and colnames represent the symbol sets of the first and second sequence, respectively. For instance, this allows you to align a sequence with the profile of a multiple alignment.
3) If method = "ow", the space symbol "" is included in the factor levels, so that you can conveniently replace NA in the aligned sequences.
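To illustrate difference 2, here is a minimal Python sketch (not the cba implementation) of a generalized edit distance whose replace/match costs are looked up in a weight table indexed by (symbol of the first sequence, symbol of the second). Because rows and columns are indexed separately, the two symbol sets may differ, as when aligning a sequence against a profile. The function name weighted_distance, the dict representation, and the scalar insert/delete costs are assumptions made for illustration only.

```python
def weighted_distance(x, y, sub, ins=1.0, dele=1.0):
    """Minimal cost of transforming x into y (hypothetical sketch).

    sub  -- dict mapping (a, b) to the cost of aligning symbol a of x with
            symbol b of y; "rows" are x's symbols, "columns" are y's symbols,
            so the two alphabets need not be the same.
    ins  -- assumed cost of inserting a symbol of y
    dele -- assumed cost of deleting a symbol of x
    """
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # first column: delete all of x[:i]
        d[i][0] = d[i - 1][0] + dele
    for j in range(1, m + 1):          # first row: insert all of y[:j]
        d[0][j] = d[0][j - 1] + ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + dele,                          # delete x[i-1]
                d[i][j - 1] + ins,                           # insert y[j-1]
                d[i - 1][j - 1] + sub[(x[i - 1], y[j - 1])], # replace/match
            )
    return d[n][m]


# Lower-case symbols for x, upper-case "profile" symbols for y.
sub = {("a", "A"): 0, ("a", "B"): 2, ("b", "A"): 2, ("b", "B"): 0}
print(weighted_distance("ab", "AB", sub))  # 0
print(weighted_distance("ab", "BA", sub))  # 2 (one delete plus one insert)
```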
A transcript uses the character codes I, D, R, and M for insert, delete, replace, and match operations, which transform the first sequence into the second. Thus, conceptually, to obtain the second sequence a symbol has to be inserted into the first sequence, deleted from the first, replaced in the first, or matched in both. In the aligned sequences, however, you will see NA wherever an insert or delete takes place, indicating a space: an insert shows NA in the first aligned sequence, a delete shows NA in the second.
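The following Python sketch shows how one transcript and the corresponding aligned sequences can be recovered by tracing back through an edit-distance table, with None standing in for NA. It is a conceptual illustration, not cba's code; the function name transcript and the operation weights (insert 1, delete 1, match 0, replace 2) are assumptions chosen for the example. Ties are broken in the order replace/match, delete, insert, so only one of possibly several optimal transcripts is returned.

```python
def transcript(x, y, w_ins=1, w_del=1, w_match=0, w_rep=2):
    """Return (distance, transcript, aligned_x, aligned_y) for one optimal path.

    Codes: I insert, D delete, R replace, M match; None marks a space (NA).
    The weights are illustrative assumptions, not cba's defaults.
    """
    n, m = len(x), len(y)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * w_del
    for j in range(1, m + 1):
        d[0][j] = j * w_ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = w_match if x[i - 1] == y[j - 1] else w_rep
            d[i][j] = min(d[i - 1][j] + w_del,
                          d[i][j - 1] + w_ins,
                          d[i - 1][j - 1] + diag)
    # Trace back one optimal path from (n, m) to (0, 0): at most n + m steps.
    ops, ax, ay = [], [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = x[i - 1] == y[j - 1]
            diag = w_match if same else w_rep
            if d[i][j] == d[i - 1][j - 1] + diag:
                ops.append("M" if same else "R")
                ax.append(x[i - 1])
                ay.append(y[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and d[i][j] == d[i - 1][j] + w_del:
            ops.append("D")            # symbol of x faces a space in y
            ax.append(x[i - 1])
            ay.append(None)
            i -= 1
        else:
            ops.append("I")            # symbol of y faces a space in x
            ax.append(None)
            ay.append(y[j - 1])
            j -= 1
    return d[n][m], "".join(reversed(ops)), ax[::-1], ay[::-1]


print(transcript("ab", "ba"))
# (2, 'IMD', [None, 'a', 'b'], ['b', 'a', None])
```

Note how the insert puts None into the first aligned sequence and the delete puts None into the second.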
In the case of a local alignment, different symbols are used for the prefix and/or suffix of the alignment: i, d, and ? for insert, delete, and replace-or-match operations. Note, however, that their sole purpose is to obtain a common representation of the two sequences. Finally, only alignments of maximal length are reported.
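Since the lower-case prefix/suffix codes only serve to lay the two sequences out side by side, they can be expanded exactly like their upper-case counterparts. The hypothetical helper below (not part of cba) builds the aligned pair from a transcript string, treating I/i as a space in the first sequence, D/d as a space in the second, and R, M, and ? as a position occupied in both; None again stands in for NA.

```python
def align_from_transcript(x, y, t):
    """Expand transcript t into two aligned lists (hypothetical helper).

    I, i -> space (None) in x, next symbol of y
    D, d -> next symbol of x, space (None) in y
    R, M, ? -> next symbol of x aligned with next symbol of y
    """
    ax, ay = [], []
    i = j = 0
    for op in t:
        if op in "Ii":
            ax.append(None)
            ay.append(y[j])
            j += 1
        elif op in "Dd":
            ax.append(x[i])
            ay.append(None)
            i += 1
        elif op in "RM?":
            ax.append(x[i])
            ay.append(y[j])
            i += 1
            j += 1
        else:
            raise ValueError(f"unknown transcript code: {op!r}")
    if i != len(x) or j != len(y):
        raise ValueError("transcript does not consume both sequences")
    return ax, ay


# Local alignment "MM" of "ab", with unaligned prefix/suffix symbols.
print(align_from_transcript("cabd", "eabf", "?MM?"))
# (['c', 'a', 'b', 'd'], ['e', 'a', 'b', 'f'])
```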
The time complexity of finding a transcript (tracing back one path through the table of prefix distances) is \(O(n+m)\) for two sequences of lengths n and m, and \(O(n*m)\) for the local alignment problem. Note, however, that the runtime for generating all transcripts can be \(O((n*m)^3)\) in the worst case.
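Why generating all transcripts is much more expensive than finding one can be seen by counting the co-optimal paths through the distance table. The Python sketch below is an illustration under assumed operation weights, not cba's algorithm: a second dynamic program adds up, for each cell, the path counts of every predecessor cell that attains the optimum.

```python
def count_optimal(x, y, w_ins=1, w_del=1, w_match=0, w_rep=2):
    """Count the distinct optimal transcripts of x into y (illustrative sketch)."""
    n, m = len(x), len(y)
    d = [[0] * (m + 1) for _ in range(n + 1)]   # prefix distances
    c = [[0] * (m + 1) for _ in range(n + 1)]   # number of optimal paths
    c[0][0] = 1
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cands = []                           # (cost via predecessor, count)
            if i > 0:
                cands.append((d[i - 1][j] + w_del, c[i - 1][j]))
            if j > 0:
                cands.append((d[i][j - 1] + w_ins, c[i][j - 1]))
            if i > 0 and j > 0:
                diag = w_match if x[i - 1] == y[j - 1] else w_rep
                cands.append((d[i - 1][j - 1] + diag, c[i - 1][j - 1]))
            best = min(cost for cost, _ in cands)
            d[i][j] = best
            c[i][j] = sum(cnt for cost, cnt in cands if cost == best)
    return c[n][m]


print(count_optimal("ab", "ba"))            # 2: IMD and DMI
print(count_optimal("ab", "ba", w_rep=1))   # 3: RR, IMD and DMI
```

Enumerating the paths themselves, rather than just counting them, must visit every one of them, and their number can grow very quickly with n and m.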
If partial = TRUE, sdists.trace computes an approximate substring match of x (the pattern) in y; this is available for method = "ow" only. In this case it returns the subset of paths that require the maximum number of match operations and of initial and final insert operations.
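A common way to compute an approximate substring match is Sellers' variant of the edit-distance recurrence: initial and final inserts, i.e. the symbols of y before and after the matched region, cost nothing. The Python sketch below uses this standard technique to show the idea; it is not necessarily how sdists.trace implements partial matching, and the function name substring_match and the operation weights are assumptions.

```python
def substring_match(x, y, w_ins=1, w_del=1, w_match=0, w_rep=2):
    """Approximate match of pattern x inside y (Sellers-style sketch).

    Initial and final inserts are free. Returns (distance, end), where the
    best-matching region of y ends just before index end (first such end).
    """
    n, m = len(x), len(y)
    prev = [0] * (m + 1)                 # row 0: free initial inserts
    for i in range(1, n + 1):
        cur = [prev[0] + w_del] + [0] * m
        for j in range(1, m + 1):
            diag = w_match if x[i - 1] == y[j - 1] else w_rep
            cur[j] = min(prev[j] + w_del,        # delete x[i-1]
                         cur[j - 1] + w_ins,     # insert y[j-1]
                         prev[j - 1] + diag)     # replace or match
        prev = cur
    best = min(prev)                     # free final inserts: any end column
    return best, prev.index(best)


print(substring_match("abc", "xxabcyy"))  # (0, 5): exact match y[2:5]
print(substring_match("abd", "xxabcyy"))  # (1, 4): match "ab", delete "d"
```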