sentometrics (version 0.2)

ctr_model: Set up control for sentiment measures-based regression modelling

Description

Sets up the control object for linear or nonlinear regression of a response variable on a large panel of textual sentiment measures (and potentially other variables). See sento_model for details on the estimation and calibration procedure.

Usage

ctr_model(model = c("gaussian", "binomial", "multinomial"), type = c("BIC",
  "AIC", "Cp", "cv"), intercept = TRUE, do.iter = FALSE, h = 0,
  alphas = seq(0, 1, by = 0.2), nSample = NULL, trainWindow = NULL,
  testWindow = NULL, oos = 0, start = 1, do.progress = TRUE,
  do.parallel = FALSE)

Arguments

model

a character vector with one of the following: "gaussian" (linear regression), "binomial" (binomial logistic regression), or "multinomial" (multinomial logistic regression).

type

a character vector indicating which model calibration approach to use. Supports "BIC", "AIC" and "Cp" (Mallows's Cp) as information criteria adapted to sparse regression (cf. "On the 'degrees of freedom' of the LASSO"; Zou, Hastie and Tibshirani, 2007), and "cv" (cross-validation based on the train function from the caret package). The adapted information criteria are currently only available for linear regression.

intercept

a logical; if TRUE (the default), an intercept is fitted.

do.iter

a logical, TRUE induces an iterative estimation of models at the given nSample size and performs the associated one-step ahead out-of-sample prediction exercise through time.

h

an integer value that shifts the time series to have the desired prediction setup; h = 0 means no change to the input data (nowcasting assuming data is aligned properly), h > 0 shifts the dependent variable by h periods (i.e. rows) further in time (forecasting), h < 0 shifts the independent variables by h periods.
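
The effect of h can be illustrated with a minimal base R sketch. The helper align_for_horizon below is hypothetical and not part of sentometrics; it shows one reading of the description above, in which rows that lose their counterpart after the shift are dropped. The handling of h < 0 in particular is an assumption.

```r
# Hypothetical helper (not a sentometrics function): pair the response y with
# the regressors x for a horizon h, dropping rows left without a counterpart.
align_for_horizon <- function(y, x, h = 0) {
  n <- length(y)
  stopifnot(nrow(x) == n, abs(h) < n)
  if (h > 0) {
    # forecasting: y observed at row t + h is explained by x at row t
    y <- y[(h + 1):n]
    x <- x[1:(n - h), , drop = FALSE]
  } else if (h < 0) {
    # assumed reading: x observed at row t + |h| is paired with y at row t
    k <- abs(h)
    y <- y[1:(n - k)]
    x <- x[(k + 1):n, , drop = FALSE]
  }
  list(y = y, x = x)
}

y <- 1:10
x <- matrix(101:110, ncol = 1)
aligned <- align_for_horizon(y, x, h = 2)
cbind(y = aligned$y, x = aligned$x[, 1])
```

With h = 0 the helper returns the inputs unchanged, which corresponds to the nowcasting case.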

alphas

a numeric vector of the different alphas to test for during calibration, between 0 and 1. A value of 0 pertains to Ridge regression, a value of 1 to LASSO regression; values in between are pure elastic net. The lambda values tested for are chosen by the glmnet function or set to 10^seq(2, -2, length.out = 100) in case of cross-validation.
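
As a rough illustration, the default alphas and the lambda grid quoted above can be built in base R. Combining them with expand.grid is an assumption about how an exhaustive search over both parameters could be laid out, not a statement about the package internals.

```r
# default mixing values: 0 corresponds to Ridge, 1 to LASSO
alphas <- seq(0, 1, by = 0.2)

# lambda grid quoted above for the cross-validation case
lambdas <- 10^seq(2, -2, length.out = 100)

# assumed layout: every (alpha, lambda) pair of a full grid search
grid <- expand.grid(alpha = alphas, lambda = lambdas)
nrow(grid)
range(lambdas)
```

A finer alphas vector multiplies the number of candidate models accordingly, so the grid size is worth checking before starting a long calibration.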

nSample

a positive integer as the size of the sample for model estimation at every iteration (ignored if do.iter = FALSE).

trainWindow

a positive integer as the size of the training sample in cross-validation (ignored if type != "cv").

testWindow

a positive integer as the size of the test sample in cross-validation (ignored if type != "cv").

oos

a non-negative integer to indicate the number of periods to skip from the end of the cross-validation training sample (out-of-sample) up to the test sample (ignored if type != "cv").
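
How trainWindow, testWindow and oos relate can be sketched with row indices. The helper cv_split is hypothetical and only lays out the first training/test split as described above; how the package rolls the split forward through caret is not shown here.

```r
# Hypothetical helper (not a sentometrics function): row indices of the first
# training sample and of the test sample that follows after skipping oos rows.
cv_split <- function(n, trainWindow, testWindow, oos = 0) {
  stopifnot(trainWindow + oos + testWindow <= n)
  train <- seq_len(trainWindow)
  test <- trainWindow + oos + seq_len(testWindow)
  list(train = train, test = test)
}

split0 <- cv_split(n = 300, trainWindow = 250, testWindow = 4, oos = 0)
split2 <- cv_split(n = 300, trainWindow = 250, testWindow = 4, oos = 2)
split0$test
split2$test
```

With oos = 2, the two rows directly after the training sample are left out, so the test sample starts two periods later.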

start

a positive integer to indicate at which point the iteration has to start (ignored if do.iter = FALSE). For example, given 100 possible iterations, start = 70 leads to model estimations only for the last 31 samples (iterations 70 to 100).
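
The arithmetic of the example can be checked in a few lines. The rolling estimation windows at the end are an assumption for illustration (iteration i is taken to use rows i to i + nSample - 1); nIter and nSample are illustrative values.

```r
nIter <- 100    # assumed number of possible iterations
start <- 70
nSample <- 50   # illustrative estimation sample size

# iterations that are actually run when starting at 'start'
runs <- seq(start, nIter)
length(runs)

# assumed layout of the estimation sample for each iteration that is run
windows <- lapply(runs, function(i) i:(i + nSample - 1))
range(windows[[1]])
```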

do.progress

a logical, if TRUE progress statements are displayed during model calibration.

do.parallel

a logical, if TRUE the %dopar% construct from the foreach package is applied for iterative model estimation. A proper parallel backend needs to be set up to make it work. No progress statements are displayed whatsoever when TRUE. For cross-validation models, parallelization can also be carried out for single-run models, whenever a parallel backend is set up. See the examples in sento_model.

Value

A list encapsulating the control parameters.

See Also

sento_model

Examples

# NOT RUN {
library("sentometrics")

# information criterion based model control objects
ctrIC1 <- ctr_model(model = "gaussian", type = "BIC", do.iter = FALSE, h = 0,
                    alphas = seq(0, 1, by = 0.10))
ctrIC2 <- ctr_model(model = "gaussian", type = "AIC", do.iter = TRUE, h = 0, nSample = 100)

# cross-validation based model control objects
ctrCV1 <- ctr_model(model = "gaussian", type = "cv", do.iter = FALSE, h = 0, trainWindow = 250,
                    testWindow = 4, oos = 0, do.progress = TRUE)
ctrCV2 <- ctr_model(model = "binomial", type = "cv", h = 0, trainWindow = 250,
                    testWindow = 4, oos = 0, do.progress = TRUE)
ctrCV3 <- ctr_model(model = "multinomial", type = "cv", h = 0, trainWindow = 250,
                    testWindow = 4, oos = 0, do.progress = TRUE)
ctrCV4 <- ctr_model(model = "gaussian", type = "cv", do.iter = TRUE, h = 0, trainWindow = 45,
                    testWindow = 4, oos = 0, nSample = 70, do.progress = TRUE)

# }
