qdap (version 0.2.5)

sentSplit: Sentence Splitting

Description

sentSplit - Splits turns of talk into individual sentences (provided proper punctuation is used). This procedure is usually done as part of the data read-in and cleaning process.

sentCombine - Combines consecutive sentences that share the same grouping variable.

TOT - Converts the tot column from sentSplit to a turn of talk index (no sentence sub-index). Generally for internal use.

Usage

sentSplit(dataframe, text.var,
    endmarks = c("?", ".", "!", "|"),
    incomplete.sub = TRUE, rm.bracket = TRUE,
    stem.col = FALSE, text.place = "right", ...)

sentCombine(text.var, grouping.var = NULL,
    as.list = FALSE)

TOT(tot)

Arguments

dataframe
A dataframe that contains the person and text variable.
text.var
The text variable.
endmarks
A character vector of endmarks to split turns of talk into sentences.
incomplete.sub
logical. If TRUE detects incomplete sentences and replaces with "|".
rm.bracket
logical. If TRUE removes brackets from the text.
stem.col
logical. If TRUE stems the text as a new column.
text.place
A character string giving placement location of the text column. This must be one of the strings "original", "right" or "left".
...
Additional options passed to stem2df.
grouping.var
The grouping variables. Default NULL generates one word list for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.
tot
A tot column from a sentSplit output.
as.list
logical. If TRUE returns the output as a list. If FALSE the output is returned as a dataframe.
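
To make the role of endmarks concrete, here is a minimal base R sketch of splitting turns of talk at endmarks and numbering the resulting sentences. The helper split_turns is hypothetical and is not qdap's implementation; it ignores incomplete.sub, rm.bracket, and stemming, and it assumes that sentences within a turn are separated by whitespace.

```r
# Hypothetical helper for illustration only; not part of qdap.
split_turns <- function(text, endmarks = c("?", ".", "!", "|")) {
  # Build a regex character class such as [?.!|] from the endmarks
  cls <- paste0("[", paste(endmarks, collapse = ""), "]")
  # Split at whitespace that directly follows an endmark
  # (the lookbehind requires perl = TRUE)
  pattern <- paste0("(?<=", cls, ")\\s+")
  pieces <- strsplit(text, pattern, perl = TRUE)
  n <- lengths(pieces)
  data.frame(
    # Index of the form turn.sentence, e.g. "1.2" = turn 1, sentence 2
    tot = paste(rep(seq_along(text), n), sequence(n), sep = "."),
    text = unlist(pieces),
    stringsAsFactors = FALSE
  )
}

out <- split_turns(c("Hi there. How are you?", "Fine!"))
print(out)
```

The first turn yields two rows and the second turn yields one, so the index column carries both the turn and the position of the sentence within it.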

Value

  • sentSplit - returns a dataframe with turns of talk broken apart into sentences. Optionally, a stemmed version of the text variable may be returned as well.
  • sentCombine - returns the continuous sentences by grouping.var pasted together, as a list of vectors if as.list = TRUE or as a dataframe if as.list = FALSE.
  • TOT - returns a numeric vector of the turns of talk without sentence sub-indexing (e.g. 3.2 becomes 3).
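
The return values of sentCombine and TOT can be approximated in base R as follows. Both helpers, combine_runs and tot_to_turn, are hypothetical names introduced for illustration; they are a sketch of the described behavior, not qdap's code, and combine_runs assumes a single grouping vector.

```r
# Hypothetical helpers for illustration only; not part of qdap.

# Paste together consecutive sentences that share the same group value
combine_runs <- function(text, group) {
  runs <- rle(as.character(group))
  # Label each sentence with the number of the run it belongs to
  run_id <- rep(seq_along(runs$values), runs$lengths)
  combined <- tapply(text, run_id, paste, collapse = " ")
  data.frame(
    group = runs$values,
    text = as.vector(combined),
    stringsAsFactors = FALSE
  )
}

# Reduce a turn.sentence index such as "3.2" to the turn number 3
tot_to_turn <- function(tot) {
  # Drop everything from the first period onward, then convert to numeric
  as.numeric(sub("\\..*$", "", as.character(tot)))
}

sentences <- c("Hi.", "How are you?", "Fine!", "And you?", "Good.")
speaker <- c("A", "A", "B", "B", "A")
combined <- combine_runs(sentences, speaker)
print(combined)

turns <- tot_to_turn(c("1.1", "1.2", "2.1", "3.10"))
print(turns)
```

Speaker A appears twice in the combined result because only adjacent sentences are merged; a later run by the same speaker starts a new row.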

See Also

bracketX, incomplete.replace, stem2df, TOT

Examples

library(qdap)  # load the package that provides the functions used below

#sentSplit EXAMPLE:
sentSplit(DATA, "state")
sentSplit(DATA, "state", stem.col = TRUE)
sentSplit(DATA, "state", text.place = "left")
sentSplit(DATA, "state", text.place = "original")
sentSplit(raj, "dialogue")[1:20, ]

#sentCombine EXAMPLE:
dat <- sentSplit(DATA, "state")
sentCombine(dat$state, dat$person)
truncdf(sentCombine(dat$state, dat$sex), 50)

#TOT EXAMPLE:
dat <- sentSplit(DATA, "state")
TOT(dat$tot)