gregexpr: Extended `gregexpr` with substring retrieval

Description

An extension of the function base::gregexpr enabling retrieval of the matching substrings.

Usage

gregexpr(
  pattern,
  text,
  ignore.case = FALSE,
  perl = FALSE,
  fixed = FALSE,
  useBytes = FALSE,
  extract = FALSE
)

Value

Returns either what base::gregexpr would return (extract = FALSE) or a list of substrings matching the pattern (extract = TRUE). In the latter case there is one list element for each string in text, and each list element contains a character vector of all matching substrings in the corresponding entry of text.
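The two return shapes can be illustrated with base R alone. The sketch below is an approximation, not this package's code: it assumes that the extract = TRUE result corresponds to what base::regmatches produces from the base::gregexpr match data, and the example strings are made up for illustration.

```r
# Illustration using base R only; the strings are hypothetical test data.
sequences <- c("ACATGTCATGTCC", "CTTGTATGCTG", "CCCC")

# Shape 1: what base::gregexpr returns. One list element per input string,
# holding start positions plus a "match.length" attribute (-1 means no match).
m <- base::gregexpr("ATG", sequences)
starts <- lapply(m, as.integer)

# Shape 2: a list of character vectors with the matched substrings.
# ASSUMPTION: this mirrors what extract = TRUE is described as returning.
hits <- regmatches(sequences, m)

print(starts)
print(hits)
```

In this sketch the third string has no match, so its position vector is -1 and its entry in the substring list is an empty character vector.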

Arguments

pattern: Character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. Coerced by as.character to a character string if possible. If a character vector of length 2 or more is supplied, the first element is used with a warning. Missing values are not allowed.
text: A character vector where matches are sought, or an object which can be coerced by as.character to a character vector.
ignore.case: Logical. If FALSE, the pattern matching is case sensitive; if TRUE, case is ignored during matching.
perl: Logical. Should Perl-compatible regexps be used?
fixed: Logical. If TRUE, pattern is a string to be matched as is. Overrides all conflicting arguments.
useBytes: Logical. If TRUE the matching is done byte-by-byte rather than character-by-character. See grep for details.
extract: Logical indicating if matching substrings should be extracted and returned.
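Since these matching arguments are shared with base::gregexpr, their effect can be sketched with base R. The input strings below are invented for illustration, and regmatches is used only to display the matched substrings.

```r
# Hypothetical input containing a literal dot.
x <- "CA.GTAAG"

# As a regular expression, "." matches any single character.
regex_hits <- regmatches(x, base::gregexpr("A.G", x))[[1]]

# With fixed = TRUE the pattern is matched as is, so "." is a literal dot.
fixed_hits <- regmatches(x, base::gregexpr("A.G", x, fixed = TRUE))[[1]]

# ignore.case = TRUE lets a lower-case pattern match upper-case text.
y <- "ACATG"
case_hits <- regmatches(y, base::gregexpr("atg", y, ignore.case = TRUE))[[1]]

print(regex_hits)
print(fixed_hits)
print(case_hits)
```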

Author

Lars Snipen and Kristian Liland.

Details

Extended version of base::gregexpr that enables the return of the substrings matching the pattern. The last argument, extract, is the only difference from base::gregexpr. The default behaviour is identical to base::gregexpr, but setting extract = TRUE means the matching substrings are returned instead of the match positions.
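A wrapper with this behaviour could be sketched as follows. This is not the package's actual source: the function name gregexpr_extract is hypothetical, and using base::regmatches for the extraction step is an assumption about one reasonable way to do it.

```r
# Hypothetical wrapper; NOT the package implementation.
gregexpr_extract <- function(pattern, text, ignore.case = FALSE, perl = FALSE,
                             fixed = FALSE, useBytes = FALSE, extract = FALSE) {
  text <- as.character(text)
  # Delegate the matching itself to base::gregexpr.
  m <- base::gregexpr(pattern, text, ignore.case = ignore.case, perl = perl,
                      fixed = fixed, useBytes = useBytes)
  # ASSUMPTION: extraction is done with base::regmatches.
  if (extract) regmatches(text, m) else m
}

sequences <- c("ACATGTCATGTCC", "CTTGTATGCTG")
positions <- gregexpr_extract("ATG", sequences)                 # same as base::gregexpr
substrings <- gregexpr_extract("ATG", sequences, extract = TRUE) # list of character vectors
print(substrings)
```

Keeping extract = FALSE as the default means existing calls written for base::gregexpr keep working unchanged under such a wrapper.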

Examples

sequences <- c("ACATGTCATGTCC", "CTTGTATGCTG")
gregexpr("ATG", sequences, extract = TRUE)