Regular expression matching can be implemented in two ways: with recursive backtracking or with finite automata-based techniques.
Perl, PCRE, Python, Ruby, Java, and many other languages rely on recursive backtracking for their regular expression implementations. The problem with this approach is that performance can degrade very quickly: for some patterns and inputs, matching time grows exponentially with the length of the input. In contrast, re2 uses finite automata-based techniques for regular expression matching, guaranteeing execution time linear in the length of the input and a fixed stack footprint. See the links to Russ Cox's articles in the references section.
re2 supports Perl-style regular expressions (with extensions like \d, \w, \s, ...) and provides most of the functionality of PCRE, omitting only backreferences and look-around assertions.
re2 supports three types of operations on a character vector: matching (substring extraction), detection, and replacement.
Matching and substring extraction are provided by re2_match and re2_match_all. Matching the regexp "(foo)|(bar)baz" on "barbazbla" returns the submatches '.0' = "barbaz", '.1' = NA, and '.2' = "bar". '.0' is the entire matching text, '.1' is the first group, and so on. Groups can also be named.
re2_detect tests whether a pattern is present in a string, like grepl in base R.
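A short sketch of detection next to its base R counterpart. It assumes the call shape re2_detect(string, pattern), with one logical value returned per element of the input; the sample vector is made up for illustration.

```r
library(re2)

x <- c("barbazbla", "foobar", "not-found")
pattern <- "(foo)|(bar)baz"

# Assumed call shape: string first, pattern second.
detected <- re2_detect(x, pattern)

# Base R equivalent for comparison; note that grepl takes the pattern first.
expected <- grepl(pattern, x)

detected
expected
```

The argument order is the main practical difference to watch for when moving code from grepl to re2_detect.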
re2_replace and re2_replace_all substitute the matched substring with a replacement string. Replacing the first occurrence of the pattern "b+" with the replacement string "d" in the text "yabba dabba doo" results in "yada dabba doo". Replacing globally results in "yada dada doo". re2_extract_replace functions like re2_replace, except that non-matching text is ignored (not returned).
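The replacement example above can be written out as a sketch. It assumes the argument order (string, pattern, rewrite) for all three functions; the variable names are illustrative.

```r
library(re2)

txt <- "yabba dabba doo"

# Replace only the first run of b's.
first_only <- re2_replace(txt, "b+", "d")       # "yada dabba doo"

# Replace every run of b's.
everywhere <- re2_replace_all(txt, "b+", "d")   # "yada dada doo"

# Like re2_replace, but the text around the match is not returned.
extracted <- re2_extract_replace(txt, "b+", "d")

first_only
everywhere
extracted
```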
In all of the above functions, regexp patterns can be pre-compiled and reused. This greatly improves performance when the same regular expression pattern is used repeatedly. See re2_regexp.
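A sketch of compiling a pattern once and reusing it. It assumes that re2_regexp takes the pattern string as its first argument and that the resulting object can be passed wherever a pattern string is accepted, as the paragraph above indicates; the sample data is made up.

```r
library(re2)

# Compile the pattern once, outside any loop or repeated call.
re <- re2_regexp("b+")

texts <- c("yabba dabba doo", "abba", "none")

# Reuse the compiled object in place of a pattern string.
has_b <- re2_detect(texts, re)
no_b  <- re2_replace_all(texts, re, "d")

has_b
no_b
```

Compiling up front moves the cost of parsing the pattern out of each call, which matters most when the same pattern is applied many times.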
List of re2 functions:
re2_match
re2_match_all
re2_split
re2_detect
re2_which
re2_subset
re2_locate
re2_locate_all
re2_count
re2_replace
re2_replace_all
re2_extract_replace
re2_regexp
re2_get_options
Girish Palya <girishji@gmail.com>
Regular Expression Matching Can Be Simple And Fast https://swtch.com/~rsc/regexp/regexp1.html
Regular Expression Matching: the Virtual Machine Approach https://swtch.com/~rsc/regexp/regexp2.html
Regular Expression Matching in the Wild https://swtch.com/~rsc/regexp/regexp3.html
RE2 Syntax https://github.com/google/re2/wiki/Syntax
RE2 C++ source https://github.com/google/re2
R source of RE2 https://github.com/girishji/re2
Useful links: