MKmisc (version 1.9)

stringDist: Function to compute distances between strings

Description

Computes the Hamming or the Levenshtein (edit) distance between two strings.

Usage

stringDist(x, y, method = "levenshtein", mismatch = 1, gap = 1)

Value

stringDist returns an object of S3 class "stringDist", which inherits from class "dist"; cf. dist.

Arguments

x

character vector, first string

y

character vector, second string

method

character, name of the distance method. This must be "levenshtein" or "hamming". Default is the classical Levenshtein distance.

mismatch

numeric, distance value for a mismatch between symbols

gap

numeric, distance value for inserting a gap

Author

Matthias Kohl <Matthias.Kohl@stamats.de>

Details

The function computes the Hamming and the Levenshtein (edit) distance of two given strings (sequences).

In case of the Hamming distance the two strings must have the same length.
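The equal-length requirement follows from how the Hamming distance is defined: it compares the strings position by position and counts the differences. A minimal base R sketch of that idea is shown here; `hamming_sketch` is a hypothetical helper written for illustration, not the MKmisc implementation, and scaling the count by `mismatch` is an assumption about how that argument is applied.

```r
# Illustrative sketch only: hamming_sketch is a hypothetical helper,
# not part of MKmisc.
hamming_sketch <- function(x, y, mismatch = 1) {
  a <- strsplit(x, "")[[1]]
  b <- strsplit(y, "")[[1]]
  # Position-wise comparison is only defined for equal lengths
  if (length(a) != length(b)) {
    stop("Hamming distance requires strings of equal length")
  }
  # Assumption: every differing position contributes the mismatch cost
  sum(a != b) * mismatch
}

hamming_sketch("GACGGATTATG", "GATCGGAATAG")  # 7 of 11 positions differ
```

Strings of different lengths stop with an error in this sketch, which mirrors the restriction stated above.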

In case of the Levenshtein (edit) distance a scoring matrix and a trace-back matrix are computed and saved as attributes "ScoringMatrix" and "TraceBackMatrix". The entries of the trace-back matrix denote insertion of a gap in string y (d: deletion), match (m), mismatch (mm), and insertion of a gap in string x (i: insertion).
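To make the role of the scoring matrix concrete, the following base R sketch fills such a matrix with the standard dynamic-programming recurrence for the edit distance. `levenshtein_sketch` is a hypothetical helper and not the MKmisc code; the matrix orientation (x along the rows, y along the columns) is an assumption, and the trace-back matrix is left out.

```r
# Illustrative sketch only: levenshtein_sketch is a hypothetical helper,
# not part of MKmisc. Orientation (x = rows, y = columns) is assumed.
levenshtein_sketch <- function(x, y, mismatch = 1, gap = 1) {
  a <- strsplit(x, "")[[1]]
  b <- strsplit(y, "")[[1]]
  n <- length(a)
  m <- length(b)
  # One extra row and column represent the empty prefix
  S <- matrix(0, nrow = n + 1, ncol = m + 1)
  S[, 1] <- (0:n) * gap
  S[1, ] <- (0:m) * gap
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      sub_cost <- if (a[i] == b[j]) 0 else mismatch
      S[i + 1, j + 1] <- min(
        S[i, j] + sub_cost,  # match or mismatch
        S[i, j + 1] + gap,   # gap in y
        S[i + 1, j] + gap    # gap in x
      )
    }
  }
  # The bottom-right cell holds the distance between the full strings
  list(distance = S[n + 1, m + 1], scoring = S)
}

res <- levenshtein_sketch("kitten", "sitting")
res$distance  # 3
```

With unit costs the result can be cross-checked against `adist` from the utils package, which also computes the Levenshtein distance.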

References

R. Merkl and S. Waack (2009). Bioinformatik Interaktiv. Wiley.

See Also

dist, stringSim

Examples

x <- "GACGGATTATG"
y <- "GATCGGAATAG"
## Levenshtein distance
d <- stringDist(x, y)
d
attr(d, "ScoringMatrix")
attr(d, "TraceBackMatrix")

## Hamming distance (both strings have the same length)
stringDist(x, y, method = "hamming")
