tcR (version 2.2.4)

find.similar.sequences: Find similar sequences.

Description

Returns a matrix M with two columns. Each row [i, j] of M marks a pair for which the distance between the sequences pattern(i) and data(j) is less than or equal to .max.errors. This function uppercases .data and removes all strings that contain anything other than the letters A-Z.
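The clean-up step described above can be sketched in base R. This is only an illustration of the stated behaviour (uppercase, then drop strings containing anything other than A-Z); clean_sequences is a hypothetical helper, not a tcR function, and tcR's own implementation may differ.

```r
# Hypothetical helper, not part of tcR: mimic the clean-up described above.
clean_sequences <- function(seqs) {
  seqs <- toupper(seqs)              # uppercase every string
  seqs[grepl("^[A-Z]+$", seqs)]      # keep only strings made entirely of A-Z
}

cleaned <- clean_sequences(c("cassl", "CAS*L", "CASSF", "CA~SL", ""))
print(cleaned)                       # "CASSL" "CASSF"
```

Strings containing "*" or "~" are dropped by this filter as well, since those characters are not letters.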

Usage

find.similar.sequences(.data, .patterns = c(), .method = c('exact', 'hamm', 'lev'),
                       .max.errors = 1, .verbose = T, .clear = F)

exact.match(.data, .patterns = c(), .verbose = T)

hamming.match(.data, .patterns = c(), .max.errors = 1, .verbose = T)

levenshtein.match(.data, .patterns = c(), .max.errors = 1, .verbose = T)

Arguments

.data

Vector of strings.

.patterns

Character vector of sequences for which neighbours will be searched.

.method

Which method to use: 'exact' for exact matching, 'hamm' for Hamming distance, 'lev' for Levenshtein distance.

.max.errors

Maximum Hamming or Levenshtein distance between strings. Not used with the 'exact' method.

.verbose

Whether the function should print progress. DON'T USE IT.

.clear

If T, remove all sequences containing the character "*" or "~".
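To make the three .method settings and the role of .max.errors concrete, here is a minimal base-R sketch. The helpers hamming_dist, lev_dist and is_match are hypothetical names introduced for illustration and are not part of tcR. Treating strings of unequal length as a non-match under Hamming distance is an assumption of this sketch.

```r
# Hypothetical helpers for illustration; tcR's own implementation may differ.

# Hamming distance: number of mismatched positions. Defined here only for
# strings of equal length (NA otherwise; an assumption of this sketch).
hamming_dist <- function(a, b) {
  if (nchar(a) != nchar(b)) return(NA_integer_)
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Levenshtein distance: minimal number of insertions, deletions and
# substitutions; utils::adist computes it in base R.
lev_dist <- function(a, b) as.integer(utils::adist(a, b)[1, 1])

# Decide whether two sequences match under a method and an error budget.
is_match <- function(a, b, method = c("exact", "hamm", "lev"), max_errors = 1) {
  method <- match.arg(method)
  if (method == "exact") return(a == b)   # max_errors is ignored here
  d <- if (method == "hamm") hamming_dist(a, b) else lev_dist(a, b)
  !is.na(d) && d <= max_errors
}

is_match("CASSL", "CASSL",  "exact")  # TRUE
is_match("CASSL", "CASSF",  "hamm")   # TRUE: one substitution
is_match("CASSL", "CASSLF", "hamm")   # FALSE: lengths differ
is_match("CASSL", "CASSLF", "lev")    # TRUE: one insertion
```

The last two calls show the practical difference between the methods: Levenshtein distance tolerates insertions and deletions, while Hamming distance counts substitutions only.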

Value

Matrix with two columns; each row [i, j] is a pair of indices for which dist(data(i), data(j)) <= .max.errors.
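The shape of such a result can be illustrated in base R without tcR. The sketch below uses utils::adist for Levenshtein distance and reports each unordered pair once. Whether tcR reports each pair once or in both orders is not stated here, so that choice is an assumption, and the sample sequences are made up.

```r
# Illustration only: build a two-column index matrix of close pairs in base R.
data_seqs  <- c("CASSL", "CASSF", "CAWWW", "CASSLF")
max_errors <- 1

d <- utils::adist(data_seqs)                 # all pairwise Levenshtein distances

# Keep pairs above the diagonal so that each pair appears once.
pairs <- which(d <= max_errors & upper.tri(d), arr.ind = TRUE)
pairs <- pairs[order(pairs[, 1], pairs[, 2]), , drop = FALSE]
dimnames(pairs) <- NULL                      # plain two-column matrix [i, j]
print(pairs)
```

For these four sequences the matrix has three rows, (1, 2), (1, 4) and (2, 4): "CAWWW" is more than one edit away from every other sequence and so appears in no pair.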