re2 (version 0.1.3)

re2_match: Extract matched groups from a string

Description

Vectorized over string and pattern. Matches a string against a regular expression and extracts the matched substrings. re2_match extracts the first matched substring, and re2_match_all extracts all matches.

Matching the regexp "(foo)|(bar)baz" against "barbazbla" returns the submatches '.0' = "barbaz", '.1' = NA, and '.2' = "bar". '.0' is the entire matching text, '.1' is the first group, and so on. Groups can also be named.
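As a minimal sketch of the submatch layout described above (assuming the re2 package is installed and attached with library(re2)), the result can be indexed by column position. The variable names are illustrative only.

```r
library(re2)  # assumption: the re2 package is installed

# Match the pattern from the description against "barbazbla".
m <- re2_match("barbazbla", "(foo)|(bar)baz")

# Index by column position: the first column is the entire match,
# followed by one column per capture group.
whole  <- m[1, 1]  # entire matching text ('.0')
first  <- m[1, 2]  # first group, (foo), which did not take part in the match
second <- m[1, 3]  # second group, (bar)

print(m)
```

Indexing by position rather than by column name keeps the sketch independent of how the columns are labeled, for example when groups are named.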

Usage

re2_match(string, pattern, simplify = TRUE)

re2_match_all(string, pattern)

Value

re2_match returns a character matrix. The first column is the entire matching text, followed by one column for each capture group. If simplify is FALSE, it returns a list of named character vectors instead.

re2_match_all returns a list of character matrices.
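A short sketch of inspecting the re2_match_all return value, reusing a call from the Examples section and assuming the re2 package is attached. The variable names are illustrative; the printed contents are not reproduced here.

```r
library(re2)  # assumption: the re2 package is installed

# A single input string is passed, so the list is expected to hold one element.
all_matches <- re2_match_all("ruby:1234 68 red:92 blue:", "(\\w+):(\\d+)")

# Each list element is a character matrix for the corresponding input string.
first_input <- all_matches[[1]]

print(first_input)
print(dim(first_input))  # matrix dimensions for the first input string
```

Because the result is a list rather than a single matrix, each input string's matches are reached with [[ ]] before any row or column indexing.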

Arguments

string

A character vector, or an object which can be coerced to one.

pattern

Character string containing a regular expression, or a pre-compiled regular expression (or a vector of character strings and pre-compiled regular expressions).
See re2_regexp for available options.
See re2_syntax for regular expression syntax.

simplify

If TRUE, the default, returns a character matrix. If FALSE, returns a list. Not applicable to re2_match_all.

See Also

re2_regexp for regular expression options, re2_syntax for regular expression syntax.

Examples

## Substring extraction
strings <- c("barbazbla", "foobar")
# NOTE: the original group name was lost in extraction; "name" is a placeholder
pattern <- "(foo)|(?P<name>bar)baz"

re2_match(strings, pattern)
result <- re2_match(strings, pattern)
is.matrix(result)

re2_match(strings, pattern, simplify = FALSE)
result <- re2_match(strings, pattern, simplify = FALSE)
is.list(result)

## Compile regexp
re <- re2_regexp("(foo)|(BaR)baz", case_sensitive = FALSE)
re2_match(strings, re)

strings <- c(
  "Home: 743 733 5365", "373-733-5753 ", "foobar",
  "733.335.3457 and Work: 573-433-7577 "
)
re <- re2_regexp("([0-9]{3})[- .]([0-9]{3})[- .]([0-9]{4})")
re2_match(strings, re)

## Vectorized over patterns
re2_match(strings, c(re, "53 $", "^foo", re))

## Match all occurrences, not just the first
re2_match_all(strings, re)
re2_match_all("ruby:1234 68 red:92 blue:", "(\\w+):(\\d+)")

## Vectorized over patterns (matching all occurrences)
re2_match_all(strings, c(re, "53 $", "^foo", re))