ore_search: Search for matches to a regular expression

Description

Search a character vector, or the content of a file or connection, for one or more matches to an Oniguruma-compatible regular expression. Printing and indexing methods are available for the results. ore_match is an alias for ore_search.

Usage

ore_search(regex, text, all = FALSE, start = 1L, simplify = TRUE,
  incremental = !all)
is_orematch(x)
# S3 method for class 'orematch'
x[j, k, ...]
# S3 method for class 'orematches'
x[i, j, k, ...]
# S3 method for orematch
print(x, lines = getOption("ore.lines", 0L),
  context = getOption("ore.context", 30L), width = getOption("width", 80L),
  ...)
# S3 method for orematches
print(x, lines = getOption("ore.lines", 0L), simplify = TRUE, ...)

Value

For ore_search, an "orematch" object, or a list of the same, each with the following elements:

text: A copy of the text element for the current match, if it was a character vector; otherwise a single string with the content retrieved from the file or connection. If the source was a binary file (from ore_file(..., binary=TRUE)) then this element will be NULL.
nMatches: The number of matches found.
offsets: The offsets (in characters) of each match.
byteOffsets: The offsets (in bytes) of each match.
lengths: The lengths (in characters) of each match.
byteLengths: The lengths (in bytes) of each match.
matches: The matched substrings.
groups: Equivalent metadata for each parenthesised subgroup in regex, in a series of matrices. If named groups are present in the regex then dimnames will be set appropriately.

For is_orematch, a logical vector indicating whether the specified object has class "orematch". For extraction with one index, a vector of matched substrings. For extraction with two indices, a vector or matrix of substrings corresponding to captured groups.
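
As a rough illustration of the elements listed above, the following sketch inspects the result of the same search used in the Examples section. It assumes the ore package is installed, and that groups is a list whose matches element is a matrix with one row per match and one column per group; that layout is inferred from the description above rather than guaranteed.

```r
library(ore)

# Pairs of consecutive word characters, as in the Examples section
m <- ore_search("(\\w)(\\w)", "This is a test", all = TRUE)

is_orematch(m)    # check that the result has class "orematch"
m$nMatches        # number of matches found
m$matches         # the matched substrings
m$offsets         # where each match starts, in characters
m$lengths         # length of each match, in characters

# Assumed layout: one row per match, one column per parenthesised group
m$groups$matches
```

Comparing offsets with byteOffsets (and lengths with byteLengths) is mainly useful for text containing multibyte characters, where the two can differ.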

Arguments

regex: A single character string or object of class "ore". In the former case, this will first be passed through ore.
text: A vector of strings to match against, or a connection, or the result of a call to ore_file to search in a file. When searching a file, match offsets will be relative to the file's encoding.
all: If TRUE, then all matches within each element of text will be found. Otherwise, the search will stop at the first match.
start: An optional vector of offsets (in characters) at which to start searching. Will be recycled to the length of text.
simplify: If TRUE and text is of length 1, a single object of class "orematch" will be returned. In all other cases, a list of such objects, with class "orematches", will be returned. When printing "orematches" objects, this controls whether or not to omit nonmatching elements from the output.
incremental: If TRUE and the text argument points to a file, the file is read in increasingly large blocks. This can reduce search time in large files.
x: An R object.
j: For indexing, the match number.
k: For indexing, the group number.
...: For print.orematches, additional arguments to be passed through to print.orematch.
i: For indexing into an "orematches" object only, the string number.
lines: The maximum number of lines to print. The default is zero, meaning no limit. For "orematches" objects this is split evenly between the elements printed.
context: The number of characters of context to include either side of each match.
width: The number of characters in each line of printed output.
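
The interaction between text, simplify and start can be sketched as follows. This is a minimal illustration that assumes the ore package is installed; the input strings and variable names are made up for the example.

```r
library(ore)

# Hypothetical input: the last element contains no lowercase letters
texts <- c("one two", "three", "12345")

# With more than one string, the result is a list of class "orematches"
res <- ore_search("[a-z]+", texts, all = TRUE)
class(res)

# simplify = FALSE gives the list form even for a single string
single <- ore_search("[a-z]+", "one two", simplify = FALSE)
class(single)

# start skips the beginning of the string, so the first match found
# here is the second word rather than the first
later <- ore_search("[a-z]+", "one two", start = 4L)
later$matches
```

Keeping simplify = FALSE can be convenient in code that must handle inputs of any length, since the return type then does not depend on the length of text.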

Examples

# Pick out pairs of consecutive word characters
match <- ore_search("(\\w)(\\w)", "This is a test", all=TRUE)

# Find the second matched substring ("is", from "This")
match[2]

# Find the content of the second group in the second match ("s")
match[2,2]
