Search a character vector, or the content of a file or connection, for one
or more matches to an Oniguruma-compatible regular expression. Printing and
indexing methods are available for the results. ore_match is an alias for
ore_search.
ore_search(regex, text, all = FALSE, start = 1L, simplify = TRUE,
  incremental = !all)

is_orematch(x)

# S3 method for orematch
[(x, j, k, ...)

# S3 method for orematches
[(x, i, j, k, ...)

# S3 method for orematch
print(x, lines = getOption("ore.lines", 0L),
  context = getOption("ore.context", 30L), width = getOption("width", 80L),
  ...)

# S3 method for orematches
print(x, lines = getOption("ore.lines", 0L), simplify = TRUE, ...)
For ore_search, an "orematch" object, or a list of the same, each with
elements

text: A copy of the text element for the current match, if it was a
character vector; otherwise a single string with the content retrieved from
the file or connection. If the source was a binary file (from
ore_file(..., binary=TRUE)) then this element will be NULL.
nMatches: The number of matches found.
offsets: The offsets (in characters) of each match.
byteOffsets: The offsets (in bytes) of each match.
lengths: The lengths (in characters) of each match.
byteLengths: The lengths (in bytes) of each match.
matches: The matched substrings.
groups: Equivalent metadata for each parenthesised subgroup in regex, in a
series of matrices. If named groups are present in the regex then dimnames
will be set appropriately.
For is_orematch, a logical vector indicating whether the specified object
has class "orematch". For extraction with one index, a vector of matched
substrings. For extraction with two indices, a vector or matrix of
substrings corresponding to captured groups.
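As a rough illustration of the return value described above, the sketch below runs a search and looks at a few elements of the result. The element names used here (nMatches, matches, offsets) are assumptions based on the ore package's reference documentation, so it is worth checking names(m) against the installed version before relying on them.

```r
library(ore)

# Find every pair of consecutive word characters
m <- ore_search("(\\w)(\\w)", "This is a test", all = TRUE)

# A single string with simplify = TRUE gives one "orematch" object
is_orematch(m)

# List the elements that are actually present in this version
names(m)

# Element names below are assumed (see the note above)
m$nMatches   # number of matches found
m$matches    # the matched substrings
m$offsets    # character offsets of each match
```

Accessing elements directly is mainly useful for offsets and lengths; for the substrings themselves the indexing methods are usually more convenient.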
regex: A single character string or object of class "ore". In the former
case, this will first be passed through ore.
text: A vector of strings to match against, or a connection, or the result
of a call to ore_file to search in a file. In the last case, match offsets
will be relative to the file's encoding.
all: If TRUE, then all matches within each element of text will be found.
Otherwise, the search will stop at the first match.
start: An optional vector of offsets (in characters) at which to start
searching. Will be recycled to the length of text.
simplify: If TRUE, an object of class "orematch" will be returned if text
is of length 1. Otherwise, a list of such objects, with class "orematches",
will always be returned. When printing "orematches" objects, this controls
whether or not to omit nonmatching elements from the output.
incremental: If TRUE and the text argument points to a file, the file is
read in increasingly large blocks. This can reduce search time in large
files.
x: An R object.
j: For indexing, the match number.
k: For indexing, the group number.
...: For print.orematches, additional arguments to be passed through to
print.orematch.
i: For indexing into an "orematches" object only, the string number.
lines: The maximum number of lines to print. The default is zero, meaning
no limit. For "orematches" objects this is split evenly between the
elements printed.
context: The number of characters of context to include either side of each
match.
width: The number of characters in each line of printed output.
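The following sketch shows how the all, start and simplify arguments change the result, and how a file can be searched via ore_file. The sample text, the pattern and the temporary file are made up for illustration; matches() is the accessor from the same package.

```r
library(ore)

text <- "a1 b22 c333"

# Default (all = FALSE): stop at the first match
first <- ore_search("\\d+", text)
matches(first)

# all = TRUE: collect every match in the string
every <- ore_search("\\d+", text, all = TRUE)
matches(every)

# start = 3L: begin searching at the third character,
# which skips the "1" at position 2
later <- ore_search("\\d+", text, start = 3L)
matches(later)

# simplify = FALSE: return an "orematches" list even
# though text has length 1
listed <- ore_search("\\d+", text, simplify = FALSE)
class(listed)

# Searching a file: write a small temporary file (hypothetical
# content) and pass it through ore_file()
tmp <- tempfile(fileext = ".txt")
writeLines(c("alpha", "beta", "gamma"), tmp)
in_file <- ore_search("gamma", ore_file(tmp))
is_orematch(in_file)
unlink(tmp)
```

Because incremental defaults to !all, the file search above reads the file in blocks; passing all = TRUE would switch that off unless incremental is set explicitly.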
ore for creating regex objects; matches and groups for an alternative to
indexing for extracting matching substrings.
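A short sketch of the matches and groups accessors mentioned above, shown next to the equivalent indexing calls. The pattern and input string are the ones used in the examples on this page; the shape of the groups() result (one row per match, one column per group) is an assumption based on the two-index extraction described earlier.

```r
library(ore)

m <- ore_search("(\\w)(\\w)", "This is a test", all = TRUE)

# All matched substrings at once, instead of m[1], m[2], ...
matches(m)

# Captured groups for every match; assumed to be a matrix with
# one row per match and one column per group
groups(m)

# The same element that m[2, 2] extracts
groups(m)[2, 2]
```

The accessors are handy when the whole set of substrings is needed, while indexing suits picking out a single match or group.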
# Pick out pairs of consecutive word characters
match <- ore_search("(\\w)(\\w)", "This is a test", all=TRUE)
# Find the second matched substring ("is", from "This")
match[2]
# Find the content of the second group in the second match ("s")
match[2,2]