sjmisc (version 1.0.2)

group_str: Group near elements of string vectors

Description

This function groups elements of a string vector (character or string variable) according to the elements' distance ('similarity'). The more similar two string elements are, the more likely they are to be combined into a group.
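To make the notion of distance concrete, the following is a minimal base-R sketch that computes pairwise Levenshtein distances with utils::adist() and lists the pairs that fall within a threshold of 2. It only illustrates the idea; it is not how group_str() is implemented (the function relies on the stringdist package), and the variable names d, near and pairs are chosen here for illustration.

```r
# Illustration only: pairwise edit distances in base R, not group_str() itself.
strings <- c("Hello", "Helo", "Hole", "Apple", "Ape")

# adist() returns a matrix of generalized Levenshtein distances
d <- utils::adist(strings)
dimnames(d) <- list(strings, strings)

d["Hello", "Helo"]  # 1: one deleted "l"
d["Apple", "Ape"]   # 2: two deleted characters
d["Hello", "Hole"]  # 3: too far apart for a threshold of 2

# Pairs whose distance is at most 2 (upper triangle only, to skip duplicates)
near <- d <= 2 & upper.tri(d)
pairs <- which(near, arr.ind = TRUE)
data.frame(a = strings[pairs[, "row"]],
           b = strings[pairs[, "col"]],
           dist = d[pairs])
```

Pairs at or below the threshold are the candidates a distance-based grouping would merge; how group_str() resolves chains such as "Hello", "Helo" and "Hole" is determined by its own logic and by the strict argument.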

Usage

group_str(strings, maxdist = 2, method = "lv", strict = FALSE,
  trim.whitespace = TRUE, remove.empty = TRUE, showProgressBar = FALSE)

Arguments

strings
a character vector with string elements
maxdist
the maximum distance up to which two string elements are treated as similar or equal.
method
Method for distance calculation. The default is "lv" (Levenshtein distance). See the stringdist package for details.
strict
if TRUE, value matching is stricter. See examples for details.
trim.whitespace
if TRUE (default), leading and trailing white spaces will be removed from string values.
remove.empty
if TRUE (default), empty string values will be removed from the character vector strings.
showProgressBar
If TRUE, a progress bar is displayed while the distance matrix is computed. Default is FALSE, hence the bar is hidden.
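The effect of trim.whitespace and remove.empty can be approximated in base R as shown below. This is a hedged sketch of the kind of cleaning these arguments describe, assuming trimming happens before empty values are dropped; the actual order and details inside group_str() may differ, and the vectors raw and cleaned are made up for illustration.

```r
# Illustration only: approximates the preprocessing described by
# trim.whitespace and remove.empty; not taken from group_str() itself.
raw <- c("  Hello", "Helo ", "", "Apple")

# trim.whitespace = TRUE: strip leading and trailing white space
cleaned <- trimws(raw)

# remove.empty = TRUE: drop empty strings
cleaned <- cleaned[nzchar(cleaned)]

cleaned
```

Because empty values are dropped, the cleaned vector can be shorter than the input, which is worth keeping in mind when combining the result with the original data.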

Value

  • A character vector where similar string elements (values) are recoded into a new, single value.

See Also

str_pos

Examples

library(sjmisc)  # provides group_str()
library(sjPlot)  # provides sjt.frq()
oldstring <- c("Hello", "Helo", "Hole", "Apple",
               "Ape", "New", "Old", "System", "Systemic")
newstring <- group_str(oldstring)
sjt.frq(data.frame(oldstring, newstring),
        removeStringVectors = FALSE,
        autoGroupStrings = FALSE)

newstring <- group_str(oldstring, strict = TRUE)
sjt.frq(data.frame(oldstring, newstring),
        removeStringVectors = FALSE,
        autoGroupStrings = FALSE)
