
sjmisc (version 1.0.2)

str_pos: Find partial matching and close distance elements in strings

Description

This function finds the indices of elements in a character vector that partially match, or are similar to, a search term. It can be used to find exact or slightly mistyped elements in a string vector.

Usage

str_pos(searchString, findTerm, maxdist = 2, part.dist.match = 0,
  showProgressBar = FALSE)

Arguments

searchString
a character vector with string elements
findTerm
the string that should be matched against the elements of searchString.
maxdist
the maximum distance between two string elements at which they are still treated as similar or equal.
part.dist.match
activates similarity matching (close distance strings) for parts (substrings) of the searchString. The following values are accepted:
  • 0 for no partial distance matching
  • 1 for one-step matching, which means that only substrings of the same length as findTerm are extracted from searchString (see Details)
showProgressBar
If TRUE, a progress bar is displayed while the distance matrix is computed. Default is FALSE, hence the bar is hidden.
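
To make the role of maxdist more concrete, here is a minimal sketch of distance-based filtering. It uses adist() from base R (generalized Levenshtein distance) as a stand-in; this page does not say which distance measure str_pos uses internally, so treat the measure and the variable names below as assumptions.

```r
# Illustration only: adist() stands in for whatever distance str_pos() uses.
words   <- c("Hello", "Helo", "Hole", "Apple")
term    <- "Hello"
maxdist <- 2

# Edit distance of every element to the search term, as a plain vector
d <- as.vector(adist(words, term))

# Elements whose distance does not exceed maxdist count as "similar"
similar <- which(d <= maxdist)
words[similar]
```

Raising maxdist in this sketch admits more distant strings, while lowering it to 0 keeps exact matches only.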

Value

  • A numeric vector with the index positions of the elements in searchString that partially match or are similar to findTerm. Returns -1 if no match was found.
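
Because -1 is also a valid (negative) index in R, using the result directly for subsetting can give surprising output when nothing matched. The sketch below shows one way to guard against that; pos is a hypothetical value standing in for the result of a str_pos call, not actual function output.

```r
string <- c("Hello", "Helo", "Hole")

# Hypothetical result, standing in for a str_pos() call that found no match
pos <- -1

# Pitfall: a negative index drops that element instead of selecting nothing
string[pos]

# Guard: treat the -1 sentinel as "no match" before subsetting
matches <- if (identical(as.numeric(pos), -1)) character(0) else string[pos]
length(matches)
```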

Details

For part.dist.match = 1, substrings with the same number of characters as findTerm are extracted from searchString, starting at its first character and moving on until the end of searchString is reached. Each substring is matched against findTerm, and substrings within a distance of maxdist are considered "matching". If part.dist.match = 2, the range of the extracted substring is increased by 2, i.e. the extracted substring is two characters longer.
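
The substring approach described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: window_dist is a hypothetical helper, adist() stands in for the distance measure, and the input is lower-cased by hand.

```r
# Hypothetical helper, not part of sjmisc: smallest edit distance between
# `term` and any window of nchar(term) + extra characters taken from `text`.
window_dist <- function(text, term, extra = 0) {
  width <- nchar(term) + extra
  if (nchar(text) <= width) {
    # Text is no longer than one window: compare it as a whole
    return(as.vector(adist(text, term)))
  }
  starts  <- seq_len(nchar(text) - width + 1)
  windows <- substring(text, starts, starts + width - 1)
  min(adist(windows, term))
}

text <- "we are sex pistols!"
term <- "postils"

# Comparing the whole string gives a large distance ...
whole <- as.vector(adist(text, term))

# ... while the window "pistols" differs from "postils" by two substitutions
best <- window_dist(text, term)
```

With maxdist = 2, the whole-string comparison fails while the windowed comparison succeeds, which mirrors the "Sex Pistols" example in the Examples section. Passing extra = 2 widens each window by two characters, analogous to what is described for part.dist.match = 2.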

See Also

group_str

Examples

string <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic")
str_pos(string, "hel")   # partial match
str_pos(string, "stem")  # partial match
str_pos(string, "R")     # no match
str_pos(string, "saste") # similarity to "System"

# finds two indices, because partial matching now
# also applies to "Systemic"
str_pos(string,
        "sytsme",
        part.dist.match = 1)

# finds nothing
str_pos("We are Sex Pistols!", "postils")
# finds partial matching of similarity
str_pos("We are Sex Pistols!", "postils", part.dist.match = 1)
