regexpr2 and gregexpr2 locate, respectively, the first occurrence and all
occurrences (i.e., globally) of a pattern.
regexec2 and gregexec2 can additionally
pinpoint the matches to parenthesised subexpressions (regex capture groups).
regexpr2(x, pattern, ..., ignore_case = FALSE, fixed = FALSE)
gregexpr2(x, pattern, ..., ignore_case = FALSE, fixed = FALSE)
regexec2(x, pattern, ..., ignore_case = FALSE, fixed = FALSE)
gregexec2(x, pattern, ..., ignore_case = FALSE, fixed = FALSE)
regexpr(
  pattern,
  x = text,
  ...,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE,
  text
)
gregexpr(
  pattern,
  x = text,
  ...,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE,
  text
)
regexec(
  pattern,
  x = text,
  ...,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE,
  text
)
gregexec(
  pattern,
  x = text,
  ...,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE,
  text
)
regexpr2 and [DEPRECATED] regexpr return an integer vector
which gives the start positions of the first substrings matching a pattern.
The match.length attribute gives the corresponding
match lengths. If there is no match, the two values are set to -1.
gregexpr2 and [DEPRECATED] gregexpr yield
a list whose elements are integer vectors with match.length
attributes, giving the positions of all the matches.
For consistency with regexpr2, a no-match is denoted with
a single -1, hence the output is guaranteed to consist of non-empty integer
vectors.
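A minimal sketch of the output layout described above. It deliberately calls the base R functions (pattern first), so that it runs without stringx installed; it assumes that regexpr2 and gregexpr2 follow the same start-position and match.length conventions, as the text states.

```r
x <- c("acacaca", "gaca", "actgggca")

# first match only: one start position per input string
m <- base::regexpr("aca", x)
first_start  <- as.integer(m)             # -1 marks a no-match
first_length <- attr(m, "match.length")   # -1 marks a no-match as well

# all (non-overlapping) matches: one integer vector per input string
g <- base::gregexpr("aca", x)
all_starts <- lapply(g, as.integer)       # a lone -1 where nothing matched

print(first_start)
print(first_length)
print(all_starts)
```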
regexec2 and [DEPRECATED] regexec return
a list of integer vectors giving the positions of the first matches
and the locations of matches to the consecutive parenthesised subexpressions
(which can only be recognised if fixed=FALSE).
Each vector is equipped with the match.length attribute.
gregexec2 and [DEPRECATED] gregexec generate
a list of matrices, where each column corresponds to a separate match;
the first row is the start index of the match, the second row gives the
position of the first captured group, and so forth.
Their match.length attributes are matrices of corresponding sizes.
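The capture-group layout can be sketched in the same way. The example below uses base regexec and gregexec (the latter requires R 4.1.0 or later) rather than the stringx versions, so it only illustrates the shape of the result, not the fixes listed for regexec2 and gregexec2.

```r
x <- "acacaca"

# first match plus its two capture groups
m <- base::regexec("(a)(ca)", x)
starts  <- as.integer(m[[1]])                        # whole match, group 1, group 2
lengths <- as.integer(attr(m[[1]], "match.length"))

# all matches: one column per match, one row per (match, group 1, group 2)
g <- base::gregexec("(a)(ca)", x)
start_matrix  <- g[[1]]
length_matrix <- attr(g[[1]], "match.length")

print(starts)
print(lengths)
print(start_matrix)
print(dim(length_matrix))
```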
These functions preserve the attributes of the longest inputs (unless they are dropped due to coercion). Missing values in the inputs are propagated consistently.
x: character vector whose elements are to be examined
pattern: character vector of nonempty search patterns
...: further arguments to stri_locate,
e.g., omit_empty, locale, dotall
ignore_case, ignore.case: single logical value; indicates whether matching should be case-insensitive
fixed: single logical value;
FALSE for matching with regular expressions
    (see about_search_regex);
TRUE for fixed pattern matching
    (about_search_fixed);
NA for the Unicode collation algorithm
    (about_search_coll)
perl, useBytes: not used (with a warning if attempting to do so) [DEPRECATED]
text: alias to the x argument [DEPRECATED]
Replacements for base gregexpr (and others)
implemented with stri_locate.
there are inconsistencies between the argument order and naming
    in grepl, strsplit,
    and startsWith (amongst others); e.g.,
    where the needle can precede the haystack, the use of the forward
    pipe operator, |>, is less convenient
    [fixed here]
base R implementation is not portable as it is based on
    the system PCRE or TRE library
    (e.g., some Unicode classes may not be available, or their matching
    can depend on the current LC_CTYPE category)
    [fixed here]
not suitable for natural language processing
    [fixed here -- use fixed=NA]
two different regular expression libraries are used
    (and historically, ERE was used in place of TRE)
    [here, only the ICU Java-like regular expression engine
    is available, hence the perl argument has no meaning]
not vectorised w.r.t. pattern
    [fixed here]
ignore.case=TRUE cannot be used with fixed=TRUE
    [fixed here]
no attributes are preserved [fixed here; see Value]
in regexec, match.length attribute is unnamed
    even if the capture groups are (but gregexec sets dimnames
    of both start positions and lengths)
    [fixed here]
regexec and gregexec with fixed other than
    FALSE make little sense.
    [this argument is [DEPRECATED] in regexec2
    and gregexec2]
gregexec does not always yield a list of matrices
    [fixed here]
a no-match to a conditional capture group is assigned length 0 [fixed here]
no-matches result in a single -1, even if capture groups are defined in the pattern [fixed here]
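Two of the base R limitations listed above can be observed directly. The sketch below calls the base functions explicitly and records the warnings they raise; warning_of is a hypothetical helper introduced only for this illustration.

```r
# hypothetical helper: return the message of the first warning raised by expr,
# or NA if the expression completes without a warning
warning_of <- function(expr) {
  tryCatch({ expr; NA_character_ }, warning = function(w) conditionMessage(w))
}

# pattern of length > 1: base regexpr warns and uses only the first element
w_pattern <- warning_of(base::regexpr(c("a", "b"), "ab"))
r_pattern <- suppressWarnings(base::regexpr(c("a", "b"), "ab"))

# ignore.case = TRUE together with fixed = TRUE: base regexpr warns and
# ignores the case-insensitivity request
w_fixed <- warning_of(base::regexpr("A", "a", fixed = TRUE, ignore.case = TRUE))
r_fixed <- suppressWarnings(
  base::regexpr("A", "a", fixed = TRUE, ignore.case = TRUE)
)

print(w_pattern)
print(w_fixed)
```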
These functions are fully vectorised with respect to both x and
pattern.
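For comparison, pairing each string with its own pattern in base R takes an explicit loop. The following is a rough base R emulation of elementwise matching, not how stringx implements it.

```r
x        <- c("acacaca", "gaca")
patterns <- c("aca", "g")

# match the i-th string against the i-th pattern
pos <- mapply(
  function(s, p) as.integer(base::regexpr(p, s)),
  x, patterns,
  USE.NAMES = FALSE
)
print(pos)
```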
Use substrl and gsubstrl
to extract or replace the identified chunks.
Also, consider using regextr2 and
gregextr2 directly instead.
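To show how a start position and a match length translate into a substring, here is a base R sketch that uses substring on the output of base regexpr. It assumes only the conventions described above (-1 for a no-match) and stands in for what substrl is meant to do with such objects.

```r
x <- c("acacaca", "gaca", "actgggca")

m   <- base::regexpr("aca", x)
len <- attr(m, "match.length")
hit <- m > 0                              # FALSE where there was no match

# a match covers the characters start, ..., start + length - 1
out <- rep(NA_character_, length(x))
out[hit] <- substring(x[hit], m[hit], m[hit] + len[hit] - 1L)
print(out)
```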
The official online manual of stringx at https://stringx.gagolewski.com/
Related function(s): paste, nchar,
    strsplit, gsub2,
    grepl2, gregextr2, gsubstrl
x <- c(aca1="acacaca", aca2="gaca", noaca="actgggca", na=NA)
regexpr2(x, "(A)[ACTG]\\1", ignore_case=TRUE)
regexpr2(x, "aca") >= 0  # like grepl2
gregexpr2(x, "aca", fixed=TRUE, overlap=TRUE)
# two named capture groups (the names x and y are assumed; the original names were lost):
regexec2(x, "(?<x>a)(?<y>cac?)")
gregexec2(x, "(?<x>a)(?<y>cac?)")
# extraction:
gsubstrl(x, gregexpr2(x, "(A)[ACTG]\\1", ignore_case=TRUE))
gregextr2(x, "(A)[ACTG]\\1", ignore_case=TRUE)  # equivalent