re2 (version 0.1.3)

re2-package: re2: R interface to Google's RE2 (C++) regular-expression library

Description

Regular expression matching can be done in two ways: using recursive backtracking or using finite automata-based techniques.

Perl, PCRE, Python, Ruby, Java, and many other languages and libraries rely on recursive backtracking for their regular expression implementations. The problem with this approach is that performance can degrade very quickly: for some patterns and inputs, matching time grows exponentially with the length of the input. In contrast, re2 uses finite automata-based techniques for regular expression matching, guaranteeing execution time linear in the input length and a fixed stack footprint. See the links to Russ Cox's excellent articles in the References section.

re2 supports Perl-style regular expressions (with extensions like \d, \w, \s, ...) and provides most of the functionality of PCRE -- eschewing only backreferences and look-around assertions.

Arguments

Primary re2 functions

re2 supports three types of operations on a character vector: matching (substring extraction), detection, and replacement.

Matching and substring extraction are provided by re2_match and re2_match_all. Matching regexp "(foo)|(bar)baz" on "barbazbla" will return submatches '.0' = "barbaz", '.1' = NA, and '.2' = "bar". '.0' is the entire matching text, '.1' is the first group, and so on. Here '.1' is NA because the first group, (foo), did not take part in the match. Groups can also be named.
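The "(foo)|(bar)baz" example can be tried with a minimal sketch like the one below. It assumes that re2_match takes the input string first and the pattern second; the requireNamespace guard and the base R cross-check of the full match are additions for illustration and are not part of re2.

```r
text <- "barbazbla"
pattern <- "(foo)|(bar)baz"

# Base R cross-check: regexpr() + regmatches() return only the full match,
# which corresponds to the '.0' submatch described in the text.
full_match <- regmatches(text, regexpr(pattern, text))
print(full_match)  # "barbaz"

# Assumed call shape: re2_match(string, pattern).
# Runs only when the re2 package is installed.
if (requireNamespace("re2", quietly = TRUE)) {
  print(re2::re2_match(text, pattern))
}
```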

re2_detect tests whether a pattern is present in each string, like grepl in base R.
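As a rough illustration of detection, the sketch below runs grepl on three made-up inputs and, if re2 is installed, calls re2_detect on the same data. The call shape re2_detect(string, pattern) is an assumption; note that it puts the string first, the reverse of grepl(pattern, x).

```r
x <- c("barbazbla", "foobar", "foxy brown")
pattern <- "(foo)|(bar)baz"

# Base R reference: one logical value per input element.
base_hits <- grepl(pattern, x)
print(base_hits)  # TRUE TRUE FALSE

# Assumed call shape: re2_detect(string, pattern).
# Runs only when the re2 package is installed.
if (requireNamespace("re2", quietly = TRUE)) {
  print(re2::re2_detect(x, pattern))
}
```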

re2_replace and re2_replace_all substitute the matched substring with a replacement string. Replacing the first occurrence of pattern "b+" with replacement string "d" in the text "yabba dabba doo" results in "yada dabba doo". Replacing globally results in "yada dada doo". re2_extract_replace works like re2_replace, except that non-matching text is ignored (not returned).
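The "yabba dabba doo" example can be reproduced with the sketch below. The base R calls sub and gsub serve as a reference for first-occurrence and global replacement; the re2 calls assume the argument order (string, pattern, replacement) and run only if the package is installed.

```r
text <- "yabba dabba doo"

# Base R reference: sub() replaces the first match, gsub() replaces all.
first_only <- sub("b+", "d", text)
everywhere <- gsub("b+", "d", text)
print(first_only)  # "yada dabba doo"
print(everywhere)  # "yada dada doo"

# Assumed call shapes: re2_replace(string, pattern, replacement) and
# re2_replace_all(string, pattern, replacement).
if (requireNamespace("re2", quietly = TRUE)) {
  print(re2::re2_replace(text, "b+", "d"))
  print(re2::re2_replace_all(text, "b+", "d"))
}
```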

In all of the above functions, regexp patterns can be pre-compiled and reused. This greatly improves performance when the same regular-expression pattern is used repeatedly. See re2_regexp.
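A minimal sketch of pattern reuse follows. It assumes that re2_regexp accepts a pattern string and that the resulting object can be passed wherever a pattern string is accepted; the sample inputs and the base R expectation are illustrative only.

```r
texts <- c("yabba dabba doo", "abba", "no match")

# Base R reference for the global replacement, used for comparison.
expected <- gsub("b+", "d", texts)
print(expected)  # "yada dada doo" "ada" "no match"

# Assumed usage: compile once with re2_regexp(pattern), then reuse the
# compiled object across several calls. Runs only if re2 is installed.
if (requireNamespace("re2", quietly = TRUE)) {
  re <- re2::re2_regexp("b+")
  print(re2::re2_detect(texts, re))
  print(re2::re2_replace_all(texts, re, "d"))
}
```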

List of re2 functions:

  • re2_match

  • re2_match_all

  • re2_split

  • re2_detect

  • re2_which

  • re2_subset

  • re2_locate

  • re2_locate_all

  • re2_count

  • re2_replace

  • re2_replace_all

  • re2_extract_replace

  • re2_regexp

  • re2_get_options

Author

Girish Palya <girishji@gmail.com>

References

See Also