qdap (version 0.2.2)

termco: Search For and Count Terms


termco - Search a transcript by any number of grouping variables for categories (themes) of grouped root terms. While there are other functions in the termco family (e.g., termco.d), termco is a more powerful and flexible wrapper intended for general use.

termco.d - Search a transcript by any number of grouping variables for root terms.

term.match - Search a transcript for words that exactly match term(s).

termco2mat - Convert a termco dataframe to a matrix for use with visualization functions (e.g., heatmap.2).


termco(text.var, grouping.var = NULL, match.list,
    short.term = TRUE, ignore.case = TRUE, elim.old = TRUE,
    percent = TRUE, digits = 2, apostrophe.remove = FALSE,
    char.keep = NULL, digit.remove = NULL,
    zero.replace = 0, ...)

termco.d(text.var, grouping.var = NULL, match.string,
    short.term = FALSE, ignore.case = TRUE,
    zero.replace = 0, percent = TRUE, digits = 2,
    apostrophe.remove = FALSE, char.keep = NULL,
    digit.remove = TRUE, ...)

term.match(text.var, terms, return.list = TRUE,
    apostrophe.remove = FALSE)

termco2mat(dataframe, drop.wc = TRUE, short.term = TRUE,
    rm.zerocol = FALSE, no.quote = TRUE, transform = TRUE,
    trim.terms = TRUE)


text.var - The text variable.
grouping.var - The grouping variables. Default NULL generates one word list for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.
match.list - A list of named character vectors.
short.term - logical. If TRUE column names are trimmed versions of the match list, otherwise the terms are wrapped with 'term(phrase)'.
ignore.case - logical. If TRUE case is ignored.
elim.old - logical. If TRUE eliminates the columns that are combined together by the named match.list.
percent - logical. If TRUE output is given as a percent. If FALSE the output is a proportion.
digits - Integer; number of decimal places to round to when printing.
apostrophe.remove - logical. If TRUE removes apostrophes from the text before examining.
char.keep - A character vector of symbol characters (i.e., punctuation) that strip should keep. The default is to strip everything except apostrophes. termco attempts to auto detect characters to keep ba
digit.remove - logical. If TRUE strips digits from the text before counting. termco attempts to auto detect if digits should be retained based on the elements in match.list.
zero.replace - Value to replace 0 values with.
... - Other arguments supplied to strip.
match.string - A vector of terms to search for. When using inside of term.match the term(s) must be words or partial words but do not have to be when using termco.d (i.e., they can be phrases
terms - The terms to search for in the text.var. Similar to match.list but these terms must be words or partial words rather than multiple words and symbols.
return.list - logical. If TRUE returns the output for multiple terms as a list by term rather than a vector.
dataframe - A termco (or termco.d) dataframe or object.
drop.wc - logical. If TRUE the word count column will be dropped.
rm.zerocol - logical. If TRUE any column containing all zeros will be removed from the matrix.
no.quote - logical. If TRUE the matrix will be printed without quotes if it's character.
transform - logical. If TRUE the matrix will be transformed.
trim.terms - logical. If TRUE trims the column header/names to ensure there is not a problem with spacing when using in other R functions.
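
The idea of a match.list as named themes can be illustrated without qdap. The following is a minimal base R sketch, not termco's actual implementation: count_term() is a hypothetical helper, the sample texts are made up, padding the text with spaces is an assumption of this sketch, and it assumes a theme's count is simply the sum of the counts of its terms.

```r
# Hypothetical helper (not part of qdap): count literal occurrences of
# `term` in each element of `texts`.
count_term <- function(texts, term) {
    lengths(regmatches(texts, gregexpr(term, texts, fixed = TRUE)))
}

# Made-up sample data for illustration only.
texts <- c("the cat saw a dog", "an owl and the moon")

# A match.list-style object: a list of named character vectors.
ml <- list(
    articles = c(" a ", " an ", "the "),
    conj     = c(" and ")
)

# Lower-case and pad with spaces so that " a "-style terms can match words
# at the start or end of a string (an assumption made for this sketch).
padded <- paste0(" ", tolower(texts), " ")

# One column per theme: sum the counts of every term in that theme.
theme_counts <- sapply(ml, function(terms) {
    rowSums(sapply(terms, function(tm) count_term(padded, tm)))
})

theme_counts
```

Each row of theme_counts corresponds to one text and each column to one named theme, which is roughly the shape of the raw counts that termco reports per grouping variable.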


termco & termco.d - both return a list, of class "termco.d", of data frames and information regarding word counts:
  • raw - raw word counts by grouping variable
  • prop - proportional word counts by grouping variable; proportional to each individual's word use
  • rnp - a character combination data frame of raw and proportional
  • zero_replace - value to replace zeros with; mostly internal use
  • percent - the value of percent used for plotting purposes
  • digits - integer value of number of digits to display; mostly internal use

term.match - returns a list or vector of possible words that match term(s).

termco2mat - returns a matrix of term counts.


Percentages are calculated as the ratio of counts of match.list elements to word counts. Word counts do not include symbols or digits. As a result, searching for symbols, digits, or small segments of full words (e.g., "to") can produce percentages that total more than 100%.
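
The arithmetic behind that note can be shown with a small base R sketch. This is an illustration under stated assumptions, not qdap's code: count_fixed() is a hypothetical helper, the text is made up, and words are counted by splitting on spaces.

```r
# count_fixed() is a hypothetical helper for this sketch, not a qdap function.
# It counts literal (non-regex) occurrences of `term` in `text`.
count_fixed <- function(text, term) {
    hits <- gregexpr(term, text, fixed = TRUE)
    sum(lengths(regmatches(text, hits)))
}

txt <- "to do to go"
word_count <- length(strsplit(txt, " +")[[1]])  # 4 words

# "o" is a small segment of every word, and also occurs inside "to".
terms <- c("to", "o")
raw <- vapply(terms, function(tm) count_fixed(txt, tm), integer(1))
pct <- 100 * raw / word_count

raw       # to = 2, o = 4
pct       # to = 50, o = 100
sum(pct)  # 150
```

Because the short term "o" is counted once per word and again overlaps with the matches for "to", the match counts add up to more than the number of words, so the percentages sum to 150.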

#termco examples:

term <- c("the ", "she", " wh")
with(raj.act.1,  termco(dialogue, person, term))
# General form for match.list as themes
# ml <- list(
#     cat1 = c(),
#     cat2 = c(),
#     catn = c()
# )

ml <- list(
    cat1 = c(" the ", " a ", " an "),
    cat2 = c(" I'" ),
    the = c("the", " the ", " the", "the")
)

(dat <- with(raj.act.1,  termco(dialogue, person, ml)))
dat$rnp  #useful for presenting in tables
dat$raw  #prop and raw are useful for performing calculations
datb <- with(raj.act.1, termco(dialogue, person, ml,
    short.term = FALSE, elim.old=FALSE))
ltruncdf(datb, 20, 6)

(dat2 <- data.frame(dialogue=c("@bryan is bryan good @br",
    "indeed", "@ brian"), person=qcv(A, B, A)))

ml2 <- list(wrds=c("bryan", "indeed"), "@", bryan=c("bryan", "@ br", "@br"))

with(dat2, termco(dialogue, person, match.list=ml2))

with(dat2, termco(dialogue, person, match.list=ml2, percent = FALSE))

DATA$state[1] <- "12 4 rgfr  r0ffrg0"  #temporarily add digits to the text
termco(DATA$state, DATA$person, '0', digit.remove=FALSE)
DATA <- qdap::DATA  #restore the original data set

#Using with term.match and exclude
exclude(term.match(DATA$state, qcv(th), FALSE), "truth")
termco(DATA$state, DATA$person, exclude(term.match(DATA$state, qcv(th),
    FALSE), "truth"))
MTCH.LST <- exclude(term.match(DATA$state, qcv(th, i)), qcv(truth, stinks))
termco(DATA$state, DATA$person, MTCH.LST)

syns <- synonyms("doubt")
termco(DATA$state, DATA$person, unlist(syns[1]))
synonyms("doubt", FALSE)
termco(DATA$state, DATA$person, list(doubt = synonyms("doubt", FALSE)))
termco(DATA$state, DATA$person, syns)

#termco.d examples:
termco.d(DATA$state, DATA$person, c(" the", " i'"))
termco.d(DATA$state, DATA$person, c(" the", " i'"), ignore.case=FALSE)
termco.d(DATA$state, DATA$person, c(" the ", " i'"))

# termco2mat example:
MTCH.LST <- exclude(term.match(DATA$state, qcv(a, i)), qcv(is, it, am, shall))
termco_obj <- termco(DATA$state, DATA$person, MTCH.LST)
termco2mat(termco_obj)
plot(termco_obj, label = TRUE)
plot(termco_obj, label = TRUE, text.color = "red")
plot(termco_obj, label = TRUE, text.color="red", lab.digits=3)

