ctr_agg: Set up control for aggregation into sentiment measures

Description

Sets up control object for aggregation of document-level textual sentiment into textual sentiment measures (indices).

Usage

ctr_agg(howWithin = "proportional", howDocs = "equal_weight",
  howTime = "equal_weight", do.ignoreZeros = TRUE, by = "day", lag = 1L,
  fill = "zero", alphasExp = seq(0.1, 0.5, by = 0.1), ordersAlm = 1:3,
  do.inverseAlm = TRUE, do.normalizeAlm = TRUE, weights = NULL,
  dfm = NULL)

Arguments

howWithin

a single character vector defining how aggregation within documents will be performed. If length(howWithin) > 1, only the first element is used. For the currently available options, see get_hows()$words.

howDocs

a single character vector defining how aggregation across documents per date will be performed. If length(howDocs) > 1, only the first element is used. For the currently available options, see get_hows()$docs.

howTime

a character vector defining how aggregation across dates will be performed. More than one choice is possible. For currently available options on how aggregation can occur, see get_hows()$time.

do.ignoreZeros

a logical indicating whether zero sentiment values should be ignored when determining the document weights while aggregating across documents. By default do.ignoreZeros = TRUE, such that documents with an exact score of zero are considered irrelevant.

by

a single character vector, either "day", "week", "month" or "year", to indicate at what level the dates should be aggregated. Dates are displayed as the first day of the period, if applicable (e.g. "2017-03-01" for March 2017).

lag

a single integer vector giving the time lag to use for aggregation across time. By default equal to 1L, meaning no aggregation across time.

fill

a single character vector, one of c("zero", "latest", "none"), to control how sentiment values are added for dates missing from the continuum of dates considered. This impacts the aggregation across time: the fill_measures function is applied before aggregating, except if fill = "none". By default equal to "zero", which sets the scores (and thus also the weights) of the added dates to zero in the time aggregation.

alphasExp

a numeric vector of all exponential smoothing factors to calculate weights for, used if "exponential" %in% howTime. Values should be between 0 and 1 (both excluded).

ordersAlm

a numeric vector of all Almon polynomial orders to calculate weights for, used if "almon" %in% howTime.

do.inverseAlm

a logical indicating whether the inverse of every Almon polynomial should be added as well, used if "almon" %in% howTime.

do.normalizeAlm

a logical indicating if every Almon polynomial weights column should sum to one, used if "almon" %in% howTime.

weights

an optional user-defined weighting scheme, always used if provided as a data.frame with the number of rows equal to the desired lag. The automatic Almon polynomials are created sequentially; to use only specific ones of these time weighting series, call almons, select the required columns, combine them into a data.frame and supply it under this argument (see examples).

dfm

optional; see compute_sentiment.
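
As an illustration of the weights argument described above, the following base R sketch builds a one-column data.frame whose number of rows equals the lag. The helper make_own_weights and the column name myLinear are hypothetical and not part of the package; the linear shape and the normalization to a sum of one are only assumptions chosen for the example.

```r
# Hypothetical helper (not part of the package): returns a one-column
# data.frame of linearly increasing weights that sum to one.
make_own_weights <- function(lag) {
  stopifnot(is.numeric(lag), length(lag) == 1, lag >= 1)
  raw <- seq_len(lag)                    # 1, 2, ..., lag
  data.frame(myLinear = raw / sum(raw))  # normalize so the column sums to one
}

lag <- 5L
w <- make_own_weights(lag)

# The number of rows has to match the lag supplied to ctr_agg.
stopifnot(nrow(w) == lag)
```

Such a data.frame could then be supplied as weights, together with "own" in howTime and the same lag, in the way the examples below do.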

Value

A list encapsulating the control parameters.

Details

For currently available options on how aggregation can occur (via the howWithin, howDocs and howTime arguments), call get_hows.
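
A minimal sketch of how the available options might be inspected, assuming the sentometrics package is installed (the code does nothing otherwise). The element names words, docs and time are the ones referred to in the argument descriptions above.

```r
# Assumes the sentometrics package is installed; returns NULL otherwise.
hows <- if (requireNamespace("sentometrics", quietly = TRUE)) {
  sentometrics::get_hows()
} else {
  NULL
}

if (!is.null(hows)) {
  print(hows$words)  # options for howWithin
  print(hows$docs)   # options for howDocs
  print(hows$time)   # options for howTime
}
```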

Examples

# NOT RUN {
# simple control function
ctr1 <- ctr_agg(howTime = "linear", by = "year", lag = 3)

# more elaborate control function (particular attention to time weighting schemes)
ctr2 <- ctr_agg(howWithin = "tf-idf",
                howDocs = "proportional",
                howTime = c("equal_weight", "linear", "almon", "exponential", "own"),
                do.ignoreZeros = TRUE,
                by = "day",
                lag = 20,
                ordersAlm = 1:3,
                do.inverseAlm = TRUE,
                do.normalizeAlm = TRUE,
                alphasExp = c(0.20, 0.50, 0.70, 0.95),
                weights = data.frame(myWeights = runif(20)))

# set up control function with one linear and two chosen Almon weighting schemes
a <- almons(n = 70, orders = 1:3, do.inverse = TRUE, do.normalize = TRUE)
ctr3 <- ctr_agg(howTime = c("linear", "own"), by = "year", lag = 70,
                weights = data.frame(a1 = a[, 1], a2 = a[, 3]))

# }