gum: Generalised Univariate Model

Description

The function constructs a Generalised Univariate Model, estimating the transition matrix F, the measurement vector w, the persistence vector g and the initial states.

Usage

gum(data, orders = c(1, 1), lags = c(1, frequency(data)),
  type = c("additive", "multiplicative"), formula = NULL,
  regressors = c("use", "select", "adapt", "integrate"),
  initial = c("backcasting", "optimal", "complete"), persistence = NULL,
  transition = NULL, measurement = rep(1, sum(orders)),
  loss = c("likelihood", "MSE", "MAE", "HAM", "MSEh", "TMSE", "GTMSE",
  "MSCE"), h = 0, holdout = FALSE, bounds = c("admissible", "none"),
  silent = TRUE, model = NULL, ...)
auto.gum(data, orders = 3, lags = frequency(data), type = c("additive",
  "multiplicative", "select"), formula = NULL, regressors = c("use",
  "select", "adapt", "integrate"), initial = c("backcasting", "optimal",
  "complete"), ic = c("AICc", "AIC", "BIC", "BICc"), loss = c("likelihood",
  "MSE", "MAE", "HAM", "MSEh", "TMSE", "GTMSE", "MSCE"), h = 0,
  holdout = FALSE, bounds = c("admissible", "none"), silent = TRUE, ...)
gum_old(data, orders = c(1, 1), lags = c(1, frequency(y)),
  type = c("additive", "multiplicative"), persistence = NULL,
  transition = NULL, measurement = rep(1, sum(orders)),
  initial = c("optimal", "backcasting"), loss = c("likelihood", "MSE",
  "MAE", "HAM", "MSEh", "TMSE", "GTMSE", "MSCE"), h = 10, holdout = FALSE,
  bounds = c("restricted", "admissible", "none"), silent = c("all",
  "graph", "legend", "output", "none"), ...)
ges(...)

Value

An object of class "adam" is returned, with elements similar to those of the object returned by the adam function.

Arguments

data

Vector containing the data to be forecast. If a matrix (or data.frame / data.table) is provided, the first column is used as the response variable, while the remaining columns are used as explanatory variables. In the latter case, formula can be used to define the relation between the response and the explanatory variables.

orders

Order of the model, specified as a vector of the numbers of states with different lags. For example, orders=c(1,1) means that there are two states: one of the first lag type and one of the second lag type. In the case of auto.gum(), this parameter is the maximum order to check.

lags

Defines lags for the corresponding orders. If, for example, orders=c(1,1) and lags=c(1,12), then the model will have two states: the first with lag 1 and the second with lag 12. The length of lags must match the length of orders. In the case of auto.gum(), this is the maximum lag to check, which should usually be the maximum frequency of the data.
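The mapping from orders and lags to one lag per state can be sketched in base R. This assumes, as the orders=c(1,1), lags=c(1,12) example suggests, that each of the orders[i] states in group i takes the lag lags[i]; expandLags() is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: expand orders/lags into one lag per state,
# assuming every state in group i simply inherits lags[i].
expandLags <- function(orders, lags) {
  stopifnot(length(orders) == length(lags))
  rep(lags, times = orders)
}

# orders=c(2,1), lags=c(1,4): two states with lag 1 and one state with lag 4
expandLags(c(2, 1), c(1, 4))  # 1 1 4

# The total number of states, which is also the length of the default
# measurement vector rep(1, sum(orders))
sum(c(2, 1))  # 3
```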

type

Type of model. Can either be "additive" or "multiplicative". The latter means that the GUM is fitted on log-transformed data. In case of auto.gum(), can also be "select", implying automatic selection of the type.

formula

Formula to use in the case of explanatory variables. If NULL, all the variables are used as is. It can also include trend, which adds a global trend. Only needed if data is a matrix or if trend is used.

regressors

Defines what to do with the provided explanatory variables: "use" means that all of them are used, "select" means that they are selected using ic, and "adapt" triggers the mechanism of time-varying parameters for the explanatory variables.

initial

Can be either a character string or a vector of initial states. If it is a character string, it can be "optimal", meaning that the initial states are optimised; "backcasting", meaning that the initial states are produced using a backcasting procedure (while the initials for the explanatory variables are still estimated); or "complete", meaning that backcasting is used for all states.

persistence

Persistence vector $g$, containing smoothing parameters. If NULL, then estimated.

transition

Transition matrix $F$. It can be provided as a vector, in which case the matrix is formed by matrix(transition,nc,nc), where nc is the number of components in the state vector. If NULL, then estimated.
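Because R's matrix() fills its values column by column, a transition vector is read down the columns of $F$. The base-R illustration below uses an arbitrary 2 x 2 example; the values are illustrative only.

```r
# Illustrative transition vector for nc = 2 components
nc <- 2
transition <- c(1, 0.5, 0, 1)

# matrix() fills column by column: the first nc values form the first column
Fmat <- matrix(transition, nc, nc)
Fmat[2, 1]  # 0.5, the second element of transition
Fmat[1, 2]  # 0, the third element of transition

# To supply a matrix written row by row, flatten it column-wise with c()
M <- rbind(c(1.0, 0.0),
           c(0.5, 1.0))
identical(matrix(c(M), nc, nc), M)  # TRUE
```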

measurement

Measurement vector $w$. If NULL, then estimated.

loss

The type of loss function used in the optimisation. loss can be:

likelihood - the model is estimated via maximisation of the likelihood of the distribution specified in distribution;
MSE - Mean Squared Error;
MAE - Mean Absolute Error;
HAM - Half Absolute Moment;
LASSO - use LASSO to shrink the parameters of the model;
RIDGE - use RIDGE to shrink the parameters of the model;
TMSE - Trace Mean Squared Error;
GTMSE - Geometric Trace Mean Squared Error;
MSEh - optimisation using only the h-steps ahead error;
MSCE - Mean Squared Cumulative Error.
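Several of the losses above follow standard definitions, which can be sketched in base R for a vector e of one-step residuals and a matrix E of multistep errors (rows are forecast origins, columns are horizons 1 to h). This illustrates the formulas as commonly defined (GTMSE as the sum of the logarithms of the per-horizon MSEs, MSCE as the mean squared sum of errors over the horizon); it is not the package's internal code, and the error values are made up.

```r
# Illustrative one-step residuals
e <- c(1, -2, 0.5, -0.5)
# Illustrative multistep errors: rows are forecast origins, columns are horizons
E <- cbind(h1 = c(1, -1), h2 = c(2, 0))

lossMSE <- function(e) mean(e^2)
lossMAE <- function(e) mean(abs(e))
lossHAM <- function(e) mean(sqrt(abs(e)))

# Mean squared error at each horizon j = 1..h
msePerHorizon <- function(E) unname(colMeans(E^2))
lossMSEh  <- function(E) msePerHorizon(E)[ncol(E)]   # h-step error only
lossTMSE  <- function(E) sum(msePerHorizon(E))       # sum over horizons
lossGTMSE <- function(E) sum(log(msePerHorizon(E)))  # sum of log MSEs
lossMSCE  <- function(E) mean(rowSums(E)^2)          # cumulative error per origin

lossMSE(e)    # 1.375
lossMAE(e)    # 1
lossMSEh(E)   # 2
lossTMSE(E)   # 3
lossGTMSE(E)  # log(2)
lossMSCE(E)   # 5
```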

In the case of LASSO / RIDGE, the variables are not normalised prior to the estimation, but the parameters are divided by the mean values of the explanatory variables.

Note that model selection and combination work properly only for the default loss="likelihood".

Furthermore, just for fun, the absolute and half analogues of the multistep estimators are available: MAEh, TMAE, GTMAE, MACE, HAMh, THAM, GTHAM, CHAM.

Last but not least, the user can provide their own function here as well, making sure that it accepts the parameters actual, fitted and B. Here is an example:

lossFunction <- function(actual, fitted, B) return(mean(abs(actual-fitted)))

loss=lossFunction
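As a further illustration of the required signature, the sketch below defines a hypothetical penalised loss and evaluates it by hand. It assumes that B holds the vector of parameters being optimised, which this page does not spell out; the name penalisedLoss and the weight lambda are illustrative choices, and penalising every element of B (including any initial states) may not be desirable in practice.

```r
# Hypothetical custom loss: MSE plus a ridge-type penalty on B.
# Assumes B is the vector of parameters being optimised; lambda is arbitrary.
lambda <- 0.1
penalisedLoss <- function(actual, fitted, B) {
  mean((actual - fitted)^2) + lambda * sum(B^2)
}

# Checking the function by hand before passing it as loss=penalisedLoss
penalisedLoss(actual = c(10, 12, 11), fitted = c(11, 11, 11), B = c(0.3, 0.4))
# (1 + 1 + 0) / 3 + 0.1 * (0.09 + 0.16), i.e. about 0.6917
```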

h

The forecast horizon. Mainly needed for the multistep loss functions.

holdout

Logical. If TRUE, a holdout of size h is taken from the end of the data (useful for testing the model).
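The split implied by holdout=TRUE can be sketched in base R as follows. splitHoldout() is a hypothetical helper, not part of the package; it works on plain vectors, so time-series attributes such as the frequency are not preserved.

```r
# Hypothetical helper: set aside the last h observations as a holdout
splitHoldout <- function(y, h) {
  n <- length(y)
  stopifnot(h >= 0, h < n)
  list(train = y[seq_len(n - h)],       # used for estimation
       test  = y[n - h + seq_len(h)])   # last h values, used for evaluation
}

parts <- splitHoldout(1:10, h = 3)
parts$train  # 1 2 3 4 5 6 7
parts$test   # 8 9 10
```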

bounds

The type of bounds for the parameters to use in the model estimation. Can be either admissible - guaranteeing the stability of the model, or none - no restrictions (potentially dangerous).

silent

Specifies whether to report the progress of the function. If FALSE, the function prints what it does and how much it has already done; if TRUE (the default), it runs silently.

model

A previously estimated GUM model. If it is provided, the function will not estimate anything and will use all of its parameters.

...

Other non-documented parameters. See adam for details. However, in comparison with adam, there are several unique parameters passed to the optimiser:

1. algorithm0, which defines what algorithm to use in nloptr for the initial optimisation. By default, this is "NLOPT_LN_BOBYQA".
2. algorithm, which determines the second optimiser. By default, this is "NLOPT_LN_NELDERMEAD".
3. maxeval0 and maxeval, which determine the number of iterations for the two optimisers. By default, maxeval0=1000 and maxeval=40*k, where k is the number of estimated parameters.
4. xtol_rel0 and xtol_rel, which are 1e-8 and 1e-6 respectively.

There are also ftol_rel0, ftol_rel, ftol_abs0 and ftol_abs, which by default are set to the values explained in the nloptr.print.options() function.

ic

The information criterion to use in the model selection.
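The four criteria accepted by ic can be sketched from a log-likelihood, the number of estimated parameters k and the sample size n. The formulas below are the commonly used ones (BICc in one common small-sample corrected form); the package's exact computations may differ, and the candidate models are hypothetical.

```r
# Information criteria from a log-likelihood (logLik), the number of
# estimated parameters (k) and the sample size (n), as commonly defined.
infoCriteria <- function(logLik, k, n) {
  aic <- -2 * logLik + 2 * k
  bic <- -2 * logLik + log(n) * k
  c(AIC  = aic,
    AICc = aic + 2 * k * (k + 1) / (n - k - 1),  # small-sample correction
    BIC  = bic,
    BICc = -2 * logLik + log(n) * k * n / (n - k - 1))
}

# Picking the better of two hypothetical candidate models by AICc
candidates <- rbind(modelA = infoCriteria(logLik = -100, k = 2, n = 50),
                    modelB = infoCriteria(logLik = -98,  k = 5, n = 50))
best <- rownames(candidates)[which.min(candidates[, "AICc"])]
best  # "modelA": the better fit of modelB does not pay for its extra parameters
```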

Author

Ivan Svetunkov, ivan@svetunkov.com

Details

The function estimates the Single Source of Error state space model of the following type:

$$y_{t} = w_t' v_{t-l} + \epsilon_{t}$$

$$v_{t} = F v_{t-l} + g_{t} \epsilon_{t}$$

where $v_{t}$ is the state vector (defined using orders), $l$ is the vector of lags, $w_t$ is the measurement vector (which includes fixed elements and explanatory variables), $F$ is the transition matrix, $g_t$ is the persistence vector (which also includes explanatory variables, if they are provided) and $\epsilon_{t}$ is the error term.
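The recursion in the two equations above can be sketched in base R. This is a minimal simulation under stated assumptions, not the package's estimation code: errors are Gaussian, $w$ and $g$ are fixed (no explanatory variables), and element i of $v_{t-l}$ is taken to be state i from lags[i] periods back. simulateSSOE() is a hypothetical name.

```r
# Minimal simulation of the SSOE recursion
#   y_t = w' v_{t-l} + e_t,   v_t = F v_{t-l} + g e_t
# where element i of v_{t-l} is state i taken lags[i] periods back.
simulateSSOE <- function(n, Fmat, w, g, lags, v0, sd = 1) {
  k <- length(w)
  maxLag <- max(lags)
  # column j of v holds the state vector at time j - maxLag;
  # the first maxLag columns are the initial states (k x maxLag)
  v <- matrix(NA_real_, nrow = k, ncol = n + maxLag)
  v[, seq_len(maxLag)] <- v0
  e <- rnorm(n, mean = 0, sd = sd)
  y <- numeric(n)
  for (t in seq_len(n)) {
    col <- t + maxLag
    vLagged <- v[cbind(seq_len(k), col - lags)]  # lagged state vector
    y[t] <- sum(w * vLagged) + e[t]               # measurement equation
    v[, col] <- Fmat %*% vLagged + g * e[t]       # transition equation
  }
  list(y = y, states = v[, -seq_len(maxLag), drop = FALSE])
}

# One state with lag 4 and no noise: the initial pattern simply repeats
sim <- simulateSSOE(n = 8, Fmat = matrix(1), w = 1, g = 0, lags = 4,
                    v0 = c(1, 2, 3, 4), sd = 0)
sim$y  # 1 2 3 4 1 2 3 4

# Two states with lags 1 and 4, as in orders=c(1,1), lags=c(1,4)
set.seed(42)
sim2 <- simulateSSOE(n = 20, Fmat = diag(2), w = c(1, 1), g = c(0.3, 0.1),
                     lags = c(1, 4), v0 = rbind(rep(100, 4), c(-1, 1, -1, 1)))
```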

For more information about the model and its implementation, see the vignette: vignette("gum","smooth")

References

Svetunkov, I. (2023). Smooth forecasting with the smooth package in R. arXiv:2301.01790. doi:10.48550/arXiv.2301.01790.
Svetunkov, I. (2015-Inf). "smooth" package for R: series of posts about the underlying models and how to use them. https://openforecast.org/category/r-en/smooth/.

Snyder, R. D. (1985). Recursive estimation of dynamic linear models. Journal of the Royal Statistical Society, Series B (Methodological), 47(2), 272-276.
Hyndman, R. J., Koehler, A. B., Ord, J. K., and Snyder, R. D. (2008). Forecasting with exponential smoothing: the state space approach. Springer-Verlag. doi:10.1007/978-3-540-71918-2.

Examples


gum(BJsales, h=8, holdout=TRUE)

ourModel <- gum(rnorm(118,100,3), orders=c(2,1), lags=c(1,4), h=18, holdout=TRUE)

# Apply the previously estimated model to new data
gum(rnorm(118,100,3), model=ourModel, h=18)

# Produce something crazy with optimal initials (not recommended)
gum(rnorm(118,100,3), orders=c(1,1,1), lags=c(1,3,5), h=18, holdout=TRUE, initial="o")

# A simpler model estimated using the trace forecast error loss function (TMSE)
gum(rnorm(118,100,3), orders=c(1), lags=c(1), h=18, holdout=TRUE, bounds="n", loss="TMSE")


x <- rnorm(50,100,3)

# The best GUM model for the data
ourModel <- auto.gum(x, orders=2, lags=4, h=18, holdout=TRUE)

summary(ourModel)
