
quanteda (version 0.9.9-3)

kwic: locate keywords-in-context

Description

For a text or a collection of texts (a character vector, a quanteda corpus, or a tokens object), return every occurrence of a user-supplied keyword in its immediate context, identifying the source text and the word index position of the keyword within that text. (This is a token position, not a line number, since the text may or may not be segmented using end-of-line delimiters.)
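The lookup described above can be sketched in a few lines of base R. This is a minimal illustration, not quanteda's implementation. The function name kwic_sketch is hypothetical. It assumes whitespace tokenization, case-insensitive exact matching after punctuation is stripped, and that a multi-word keyword matches a run of consecutive tokens. The sketch returns the same columns described under Value.

```r
# Minimal keyword-in-context sketch in base R; NOT quanteda's implementation.
# Assumptions (hypothetical): tokens are whitespace-separated, matching is
# exact and case-insensitive after stripping punctuation, and a multi-word
# keyword is matched as a run of consecutive tokens.
kwic_sketch <- function(texts, phrase, window = 5) {
  stopifnot(is.character(texts), nzchar(trimws(phrase)))
  if (is.null(names(texts))) names(texts) <- paste0("text", seq_along(texts))
  normalize <- function(x) tolower(gsub("[[:punct:]]", "", x))
  pattern <- normalize(strsplit(trimws(phrase), "\\s+")[[1]])
  n <- length(pattern)
  rows <- list()
  for (doc in names(texts)) {
    toks <- strsplit(trimws(texts[[doc]]), "\\s+")[[1]]
    norm <- normalize(toks)
    if (length(toks) < n) next
    for (i in seq_len(length(toks) - n + 1)) {
      j <- i + n - 1
      if (all(norm[i:j] == pattern)) {
        pre  <- toks[seq_len(i - 1)]                 # tokens before the match
        post <- toks[j + seq_len(length(toks) - j)]  # tokens after the match
        rows[[length(rows) + 1]] <- data.frame(
          docname     = doc,
          position    = i,                           # token index, not a line number
          contextPre  = paste(tail(pre, window), collapse = " "),
          keyword     = paste(toks[i:j], collapse = " "),  # original case and punctuation
          contextPost = paste(head(post, window), collapse = " "),
          stringsAsFactors = FALSE
        )
      }
    }
  }
  if (length(rows) == 0) {
    return(data.frame(docname = character(), position = integer(),
                      contextPre = character(), keyword = character(),
                      contextPost = character(), stringsAsFactors = FALSE))
  }
  do.call(rbind, rows)
}

txt <- c("This is a test",
         "This is it.",
         "What is in a train?",
         "Is it a question?",
         "Sometimes you don't know if this is it.",
         "Is it a bird or a plane or is it a train?")
res <- kwic_sketch(txt, "is it", window = 5)
res
```

Because matching is done on normalized tokens while the output keeps the original ones, "it." still matches "it" and is reported as "is it." in the keyword column.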

Usage

kwic(x, keywords, window = 5, valuetype = c("glob", "regex", "fixed"), case_insensitive = TRUE, ..., new = TRUE)
is.kwic(x)
as.kwic(x)

Arguments

x
a character, corpus, or tokens object
keywords
a keyword pattern, or a phrase consisting of multiple keyword patterns, possibly including punctuation. If keywords is a phrase, it will be tokenized using the options passed through the ... argument.
window
the number of context words to display on each side of the keyword.
valuetype
how to interpret keyword expressions: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching. See valuetype for details.
case_insensitive
if TRUE, match without regard to case
...
additional arguments passed to tokens, for applicable object types
new
logical; if TRUE use the newer kwic, if FALSE then call kwic_old. Once the full testing of the newer kwic method is complete and the transition declared successful, we will delete this option and delete kwic_old.
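The three valuetype modes and the case_insensitive switch can be illustrated with base R alone. This sketch only mimics the matching semantics described above and is not quanteda's internal code. The helper match_tokens is hypothetical. The glob mode relies on utils::glob2rx, which turns a wildcard into an anchored regular expression. The regex mode is unanchored, so "secur" also matches "insecurity". The fixed mode requires the whole token to match.

```r
# Illustration of the three valuetype modes with base R only; this mimics the
# matching semantics described above and is NOT quanteda's internal code.
# match_tokens is a hypothetical helper.
match_tokens <- function(tokens, pattern,
                         valuetype = c("glob", "regex", "fixed"),
                         case_insensitive = TRUE) {
  valuetype <- match.arg(valuetype)
  switch(valuetype,
    # glob: translate the wildcard into an anchored regular expression
    glob  = grepl(utils::glob2rx(pattern), tokens, ignore.case = case_insensitive),
    # regex: the pattern may match anywhere inside a token
    regex = grepl(pattern, tokens, ignore.case = case_insensitive),
    # fixed: the whole token must equal the pattern
    fixed = if (case_insensitive) {
      tolower(tokens) == tolower(pattern)
    } else {
      tokens == pattern
    }
  )
}

toks <- c("secure", "Security", "insecurity", "secured", "SECURE")
match_tokens(toks, "secure*", valuetype = "glob")     # secure, secured, SECURE
match_tokens(toks, "secur", valuetype = "regex")      # all five tokens
match_tokens(toks, "security", valuetype = "fixed")   # Security only
match_tokens(toks, "secure*", valuetype = "glob", case_insensitive = FALSE)
```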

Value

A data.frame of class kwic, with columns for the document name (docname), the token index position (position), the context before the keyword (contextPre), the keyword in its original form (keyword, preserving case and attached punctuation), and the context after the keyword (contextPost).
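Because the result is a data.frame, ordinary base R tools work on it. The sketch below builds a small data.frame by hand with the documented column names. The values are toy data for illustration, not real quanteda output. It then counts matches per document and rebuilds each concordance line with the keyword marked in brackets.

```r
# Toy kwic-style result, hand-built with the documented column names.
# The values are illustrative only; they are NOT real quanteda output.
kw <- data.frame(
  docname     = c("text2", "text4", "text6", "text6"),
  position    = c(2L, 1L, 1L, 9L),
  contextPre  = c("This", "", "", "bird or a plane or"),
  keyword     = c("is it.", "Is it", "Is it", "is it"),
  contextPost = c("", "a question?", "a bird or a plane", "a train?"),
  stringsAsFactors = FALSE
)

# Number of matches per document
hits <- table(kw$docname)
hits

# Rebuild each concordance line, marking the keyword with [ ];
# trimws() drops the stray space left by an empty context
lines <- trimws(paste(kw$contextPre, paste0("[", kw$keyword, "]"), kw$contextPost))
lines
```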

Details

as.kwic is a temporary function to convert a "kwic2" object to a standard "kwic" object.

Examples

head(kwic(data_char_inaugural, "secure*", window = 3, valuetype = "glob"))
head(kwic(data_char_inaugural, "secur", window = 3, valuetype = "regex"))
head(kwic(data_char_inaugural, "security", window = 3, valuetype = "fixed"))

kwic(data_corpus_inaugural, "war against")
kwic(data_corpus_inaugural, "war against", valuetype = "regex")

mykwic <- kwic(data_corpus_inaugural, "provident*")
is.kwic(mykwic)
is.kwic("Not a kwic")
# as.kwic examples
txt <- c("This is a test",
         "This is it.",
         "What is in a train?",
         "Is it a question?",
         "Sometimes you don't know if this is it.",
         "Is it a bird or a plane or is it a train?")

toks <- tokens(txt)
(kwOld <- kwic(toks, "is it", new = FALSE))
(kwNew <- kwic(toks, "is it", new = TRUE))
## Not run: 
# # this breaks - need to harmonize print methods
# as.kwic(kwNew)
## End(Not run)
