
corpus (version 0.8.0)

text_locate: Searching for terms in text.

Description

Look for instances of one or more terms in a set of texts.

Usage

text_count(x, terms, filter = text_filter(x))

text_detect(x, terms, filter = text_filter(x))

text_locate(x, terms, filter = text_filter(x))

text_subset(x, terms, filter = text_filter(x))

Arguments

x

a text or character vector.

terms

a character vector of search terms.

filter

a text filter defining the token boundaries.
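Because the filter defines how text is split into tokens, changing it can change what counts as a match. The following is a minimal sketch, not taken from the package documentation: it assumes the filter's map_case property controls case folding and is on by default, so that turning it off makes the search case-sensitive. The example texts are illustrative only.

```r
library(corpus)

text <- c("Rose is a rose is a rose is a rose.",
          "Snow White and Rose Red")

# default filter: case folding is assumed to be on, so "Rose" can match "rose"
folded <- text_count(text, "rose")

# assumption: map_case = FALSE turns case folding off for this search
exact <- text_count(text, "rose", filter = text_filter(map_case = FALSE))

print(folded)
print(exact)
```

If the assumption holds, the case-sensitive counts should never exceed the case-folded ones.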

Value

text_count and text_detect return a numeric vector and a logical vector, respectively, with length equal to the number of input texts and names equal to the text names.

text_locate returns a data frame with one row for each search result and columns named ‘text’, ‘term’, ‘before’, ‘instance’, and ‘after’. The ‘text’ column gives the name of the text containing the instance, and ‘term’ gives the matching term; ‘before’ and ‘after’ are text vectors giving the text before and after the instance. The ‘instance’ column gives the token or tokens matching the search term.

text_subset returns the subset of texts that contain the given search terms. The resulting text object has its text_filter set to the passed-in filter argument.
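To make the return values described above concrete, here is a short sketch that calls each function on the same input and inspects the shape of the result. The named input vector is an illustrative assumption, and the exact printed output may vary between package versions.

```r
library(corpus)

text <- c(first  = "Rose is a rose is a rose is a rose.",
          second = "A rose by any other name would smell as sweet.")

counts <- text_count(text, "rose")   # numeric vector, one entry per text
found  <- text_detect(text, "rose")  # logical vector, one entry per text
hits   <- text_locate(text, "rose")  # data frame, one row per search result
kept   <- text_subset(text, "rose")  # the texts that contain the term

str(counts)
str(found)
names(hits)    # column names of the search results
length(kept)   # number of texts retained
```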

Details

text_count counts the number of search term instances in each element of the text vector.

text_detect indicates whether each text contains at least one of the search terms.

text_locate finds all instances of the search terms in the input text.

text_subset returns the texts that contain the search terms.
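Read together, these descriptions suggest some simple relationships between the four functions. The sketch below checks them on a small example. The relationships are inferred from the descriptions above rather than from the package's implementation, and the third text is added only so that one input has no match.

```r
library(corpus)

text <- c("Rose is a rose is a rose is a rose.",
          "A rose by any other name would smell as sweet.",
          "Call me Ishmael.")
terms <- c("rose", "sweet")

counts <- text_count(text, terms)
found  <- text_detect(text, terms)
hits   <- text_locate(text, terms)
kept   <- text_subset(text, terms)

# a text should be detected exactly when its instance count is positive
all(found == (counts > 0))

# text_locate should return one row per counted instance
nrow(hits) == sum(counts)

# text_subset should keep exactly the detected texts
length(kept) == sum(found)
```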

See Also

term_counts, term_frame.

Examples

# NOT RUN {
    text <- c("Rose is a rose is a rose is a rose.",
              "A rose by any other name would smell as sweet.",
              "Snow White and Rose Red")

    text_count(text, "rose")
    text_detect(text, "rose")
    text_locate(text, "rose")
    text_subset(text, "a rose")

    # search for multiple terms
    text_locate(text, c("rose", "rose red", "snow white"))
# }
