RNewsflow (version 1.2.4)

term.day.dist: Calculate statistics for term occurrence across days

Description

Calculate statistics for term occurrence across days

Usage

term.day.dist(dtm, meta = NULL, date.var = "date")

Arguments

dtm

A quanteda dfm. Alternatively, a DocumentTermMatrix from the tm package can be used, but then the meta parameter must be specified manually.

meta

If dtm is a quanteda dfm and meta is NULL (the default), docvars(dtm) is used to obtain the meta data. Otherwise, the user has to provide the meta data.frame, whose rows must match the rows of the dtm (i.e. each row is a document).

date.var

The name of the meta column specifying the document date. The default is "date". The values should be of type POSIXlt or POSIXct.
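
When the meta data is supplied manually, the date column may first need converting to a date-time class. The following is a minimal base R sketch, assuming a hypothetical meta data.frame with dates stored as character strings; the column names, values and time zone are illustrative and not taken from the package.

```r
# Hypothetical meta data: one row per document, in the same order as the dtm rows
meta <- data.frame(
  doc_id = c("d1", "d2", "d3"),
  date = c("2013-06-01 09:30:00", "2013-06-01 17:00:00", "2013-06-02 08:15:00"),
  stringsAsFactors = FALSE
)

# Convert the character timestamps to POSIXct, the type date.var expects
meta$date <- as.POSIXct(meta$date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

class(meta$date)

# With a tm DocumentTermMatrix named dtm (not created here), the call would be:
# tdd <- term.day.dist(dtm, meta = meta, date.var = "date")
```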

Value

A data.frame with statistics for each term.

  • freq: The number of times a term occurred

  • doc.freq: The number of documents in which a term occurred

  • days.n: The number of days on which a term occurred

  • days.pct: The percentage of days on which a term occurred

  • days.entropy: The entropy of the distribution of term frequency across days

  • days.entropy.norm: The normalized days.entropy, where 1 is a discrete uniform distribution
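
To illustrate what the day-level statistics express, here is a minimal base R sketch for a single term. It assumes Shannon entropy with the natural logarithm and normalization by the maximum entropy, log(number of days). The package may use a different log base, normalization or scale for days.pct, and the counts below are hypothetical.

```r
# Hypothetical counts of one term on four days (not taken from the package data)
day_counts <- c(4, 4, 0, 8)

# Proportion of the term's occurrences falling on each day
p <- day_counts / sum(day_counts)

# Shannon entropy (natural log); days with zero occurrences contribute nothing
p_pos <- p[p > 0]
days_entropy <- -sum(p_pos * log(p_pos))

# Assumed normalization: divide by the maximum possible entropy,
# which is reached when the term is spread uniformly over all days
days_entropy_norm <- days_entropy / log(length(day_counts))

# Analogues of days.n and days.pct (here expressed as a proportion)
days_n <- sum(day_counts > 0)
days_pct <- days_n / length(day_counts)

print(c(entropy = days_entropy, entropy_norm = days_entropy_norm,
        days_n = days_n, days_pct = days_pct))
```

Under these assumptions, a term that occurs equally often on every day gets a normalized entropy of 1, while a term that occurs on a single day gets 0.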

Examples

# NOT RUN {
tdd = term.day.dist(rnewsflow_dfm, date.var='date')
head(tdd)
tail(tdd)
# }