tosca (version 0.3-4)

filterWord: Subcorpus With Word Filter

Description

Generates a subcorpus by restricting it to texts containing specific filter words.

Usage

filterWord(...)

# S3 method for default
filterWord(text, search, ignore.case = FALSE, out = c("text", "bin", "count"), ...)

# S3 method for textmeta
filterWord(object, search, ignore.case = FALSE, out = c("text", "bin", "count"), filtermeta = TRUE, ...)

Value

A textmeta object if object is specified; otherwise only the filtered text. When a textmeta object is returned, its meta data are by default (filtermeta = TRUE) restricted to the texts that remain in the corpus.

Arguments

...

Not used.

text

Not necessary if object is specified; otherwise it should be object$text: a list of article texts.

search

List of data frames. The list elements are combined with 'or'; the rows within a data frame are combined with 'and'. Each data frame must have the following three variables: pattern, a character string containing the search term; word, a logical value indicating whether a word search (TRUE) or a character search (FALSE) is wanted; and count, an integer giving the minimum number of times the pattern must be found in the text. Alternatively, word can be a character string containing one of the keywords pattern for character search, word for word search, and left and right for truncated search. If search is only a character vector, its elements are linked by 'or' and a character search with count=1 is used.

ignore.case

Logical: If TRUE, lower and upper case are not distinguished when matching.

out

Type of output: "text" returns the filtered corpus, "bin" a logical vector for all texts, "count" the number of matches.

object

A textmeta object

filtermeta

Logical: Should the meta component be filtered, too?
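
The 'and'/'or' logic of the search argument can be illustrated with a small base R sketch. This is not tosca's implementation: count_matches and matches_search are hypothetical helpers written only for this illustration, they support only a logical word column (not the left/right truncated search), and their tokenisation rule (splitting on non-alphanumeric characters) is an assumption.

```r
# Hypothetical helper (not part of tosca): count matches of one pattern in one text.
count_matches <- function(text, pattern, word = FALSE, ignore.case = FALSE) {
  if (ignore.case) {
    text <- tolower(text)
    pattern <- tolower(pattern)
  }
  if (word) {
    # Word search: split on non-alphanumeric characters and count exact tokens.
    tokens <- strsplit(text, "[^[:alnum:]]+")[[1]]
    sum(tokens == pattern)
  } else {
    # Character search: count occurrences of the pattern as a fixed substring.
    hits <- gregexpr(pattern, text, fixed = TRUE)[[1]]
    sum(hits > 0)
  }
}

# Hypothetical helper: rows of a data frame are linked by 'and',
# list elements are linked by 'or'.
matches_search <- function(text, search, ignore.case = FALSE) {
  any(vapply(search, function(df) {
    all(vapply(seq_len(nrow(df)), function(i) {
      n <- count_matches(text, as.character(df$pattern[i]), df$word[i], ignore.case)
      n >= df$count[i]
    }, logical(1)))
  }, logical(1)))
}

texts <- list(
  A = "Give a Man a Fish, and You Feed Him for a Day. Teach a Man To Fish, and You Feed Him for a Lifetime",
  B = "So Long, and Thanks for All the Fish",
  C = "A very able manipulative mathematician, Fisher enjoys a real mastery in evaluating complicated multiple integrals."
)

# ("fish" AND "day" as words) OR ("integrals" as a word)
search <- list(
  data.frame(pattern = c("fish", "day"), word = TRUE, count = 1),
  data.frame(pattern = "integrals", word = TRUE, count = 1)
)

hit <- vapply(texts, matches_search, logical(1), search = search, ignore.case = TRUE)
print(hit)
```

In this sketch, text B contains "fish" but not "day", so the first data frame fails for it, and it does not contain "integrals", so the second fails too; only A and C are selected.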

Examples

texts <- list(A="Give a Man a Fish, and You Feed Him for a Day.
Teach a Man To Fish, and You Feed Him for a Lifetime",
B="So Long, and Thanks for All the Fish",
C="A very able manipulative mathematician, Fisher enjoys a real mastery
in evaluating complicated multiple integrals.")

# search for pattern "fish"
filterWord(text=texts, search="fish", ignore.case=TRUE)

# search for word "fish"
filterWord(text=texts, search=data.frame(pattern="fish", word="word", count=1),
ignore.case=TRUE)

# pattern must appear at least two times
filterWord(text=texts, search=data.frame(pattern="fish", word="pattern", count=2),
ignore.case=TRUE)

# search for "fish" AND "day"
filterWord(text=texts, search=data.frame(pattern=c("fish", "day"), word="word", count=1),
ignore.case=TRUE)

# search for "Thanks" OR "integrals"
filterWord(text=texts, search=list(data.frame(pattern="Thanks", word="word", count=1),
data.frame(pattern="integrals", word="word", count=1)))
