gum: Generalised Univariate Model

Description

The function constructs a Generalised Univariate Model, estimating the transition matrix F, the measurement vector w, the persistence vector g and the initial states.

Usage

gum(y, orders = c(1, 1), lags = c(1, frequency(y)), type = c("additive",
  "multiplicative"), initial = c("backcasting", "optimal", "two-stage",
  "complete"), persistence = NULL, transition = NULL,
  measurement = rep(1, sum(orders)), loss = c("likelihood", "MSE", "MAE",
  "HAM", "MSEh", "TMSE", "GTMSE", "MSCE", "GPL"), h = 0, holdout = FALSE,
  bounds = c("usual", "admissible", "none"), silent = TRUE, model = NULL,
  xreg = NULL, regressors = c("use", "select", "adapt", "integrate"),
  initialX = NULL, ...)
auto.gum(y, orders = 3, lags = frequency(y), type = c("additive",
  "multiplicative", "select"), initial = c("backcasting", "optimal",
  "two-stage", "complete"), ic = c("AICc", "AIC", "BIC", "BICc"),
  loss = c("likelihood", "MSE", "MAE", "HAM", "MSEh", "TMSE", "GTMSE",
  "MSCE", "GPL"), h = 0, holdout = FALSE, bounds = c("usual",
  "admissible", "none"), silent = TRUE, xreg = NULL,
  regressors = c("use", "select", "adapt", "integrate"), ...)

Value

An object of class "adam" is returned, with elements similar to those returned by the adam function.

Arguments

y

Vector or ts object containing the data to be forecast.

orders

Order of the model. Specified as a vector of the number of states with different lags. For example, orders=c(1,1) means that there are two states: one of the first lag type, the second of the second type. In the case of auto.gum(), this parameter is the maximum order to check.

lags

Defines lags for the corresponding orders. If, for example, orders=c(1,1) and lags are defined as lags=c(1,12), then the model will have two states: the first will have lag 1 and the second will have lag 12. The length of lags must correspond to the length of orders. In the case of auto.gum(), this is the maximum lag to check, which should usually be the maximum frequency of the data.

type

Type of model. Can either be "additive" or "multiplicative". The latter means that the GUM is fitted on log-transformed data. In case of auto.gum(), can also be "select", implying automatic selection of the type.

initial

Can be either a character string, a list or a vector of initial states. If it is a character string, then it can be "backcasting", meaning that the initial states of the dynamic part of the model are produced using the backcasting procedure (advised for data with high frequency), "optimal", meaning that all initial states are optimised, or "two-stage", meaning that optimisation is done after the backcasting, refining the states. In the case of "backcasting", the parameters of the explanatory variables are optimised. Alternatively, you can set initial="complete", which means that all states (including those of the explanatory variables) are initialised via backcasting.

persistence

Persistence vector $g$, containing smoothing parameters. If NULL, then estimated.

transition

Transition matrix $F$. Can be provided as a vector. Matrix will be formed using the default matrix(transition,nc,nc), where nc is the number of components in the state vector. If NULL, then estimated.
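As a small illustration of the matrix(transition,nc,nc) convention, the base R sketch below turns a vector into a matrix. The values are arbitrary, and taking nc = sum(orders) is an assumption based on the default length of measurement. Because matrix() fills column by column, the vector should list the elements of $F$ one column at a time.

```r
# Arbitrary illustrative values, not defaults used by gum()
orders <- c(2, 1)
nc <- sum(orders)  # assumed number of components in the state vector

# Each line below is one COLUMN of the resulting matrix
transition <- c(1, 0, 0,
                1, 1, 0,
                0, 0, 1)

Fmat <- matrix(transition, nc, nc)
Fmat
```

If the values are written row by row instead, matrix(transition, nc, nc, byrow=TRUE) or t() gives the intended layout before passing the result to transition.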

measurement

Measurement vector $w$. If NULL, then estimated.

loss

The type of loss function used in optimisation. loss can be: "likelihood" (assuming a Normal distribution of the error term), "MSE" (Mean Squared Error), "MAE" (Mean Absolute Error), "HAM" (Half Absolute Moment), "TMSE" (Trace Mean Squared Error), "GTMSE" (Geometric Trace Mean Squared Error), "MSEh" (optimisation using only the h-steps-ahead error) or "MSCE" (Mean Squared Cumulative Error). If loss!="MSE", then the likelihood and model selection are based on the equivalent MSE, so model selection in these cases is not optimal.

There are also available analytical approximations for multistep functions: aMSEh, aTMSE and aGTMSE. These can be useful in cases of small samples.

Finally, just for fun, the absolute and half analogues of the multistep estimators are available: MAEh, TMAE, GTMAE, MACE, HAMh, THAM, GTHAM, CHAM.
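To make the three one-step measures concrete, here is a minimal base R sketch. The function names are hypothetical and are not exported by smooth; the formulas follow the names of the measures, with HAM taken as the mean of the square roots of the absolute errors. The package's internal implementation may differ in details such as scaling.

```r
# Illustrative definitions only; e is a vector of one-step-ahead errors
lossMSE <- function(e) mean(e^2)             # Mean Squared Error
lossMAE <- function(e) mean(abs(e))          # Mean Absolute Error
lossHAM <- function(e) mean(sqrt(abs(e)))    # Half Absolute Moment

e <- c(1, -4, 9)
c(MSE = lossMSE(e), MAE = lossMAE(e), HAM = lossHAM(e))
```

The square root in HAM dampens large errors, so it reacts less to outliers than MAE, which in turn reacts less than MSE.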

h

Length of forecasting horizon.

holdout

If TRUE, holdout sample of size h is taken from the end of the data.
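The split implied by holdout=TRUE can be sketched in base R as follows. This only illustrates which observations end up in each part; the variable names are hypothetical, and the function performs the split itself, so the sketch is not needed before calling gum().

```r
y <- as.numeric(BJsales)  # built-in series used in the examples
h <- 8
n <- length(y)

inSample <- y[seq_len(n - h)]        # used for estimation
holdoutSample <- y[(n - h + 1):n]    # last h observations, kept for evaluation
```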

bounds

The type of bounds for the parameters to use in the model estimation. Can be "admissible", guaranteeing the stability of the model, "usual", restricting all the parameters to the (0, 1) region, or "none", imposing no restrictions (potentially dangerous).
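For intuition about what stability means, the sketch below applies the textbook condition for single source of error models in which every lag equals 1: all eigenvalues of the discount matrix $D = F - g w'$ must lie inside the unit circle (see Hyndman et al., 2008). The helper is_stable() is hypothetical, and this is not necessarily the check that gum() performs, especially when lags differ.

```r
# Hypothetical helper: stability check for a model where all lags equal 1
is_stable <- function(transition, measurement, persistence) {
  transition <- as.matrix(transition)
  # Discount matrix D = F - g w'
  D <- transition - matrix(persistence, ncol = 1) %*% matrix(measurement, nrow = 1)
  all(Mod(eigen(D, only.values = TRUE)$values) < 1)
}

# Local level model: stable for a smoothing parameter between 0 and 2
is_stable(1, 1, 0.5)
is_stable(1, 1, 2.5)

# Local trend model with smoothing parameters 0.3 and 0.1
is_stable(matrix(c(1, 0, 1, 1), 2, 2), c(1, 1), c(0.3, 0.1))
```

The second call shows why bounds="none" is potentially dangerous: a value of 2.5 lies outside the stability region, so the weights on past observations grow instead of decaying.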

silent

Accepts TRUE or FALSE. If FALSE, the function will print its progress and produce a plot at the end.

model

A previously estimated GUM model. If provided, the function will not estimate anything and will use all its parameters.

xreg

The vector (either numeric or time series) or the matrix (or data.frame) of exogenous variables that should be included in the model. If a matrix is provided, then its columns should contain variables and its rows observations. Note that xreg should have a number of observations equal either to the in-sample size or to the length of the whole series. If the number of observations in xreg is equal to the in-sample size, then values for the holdout sample are produced using the es function.

regressors

The variable defines what to do with the provided xreg: "use" means that all of the data should be used, while "select" means that a selection using ic should be done.

initialX

The vector of initial parameters for exogenous variables. Ignored if xreg is NULL.

...

Other non-documented parameters. See adam for details. However, there are several unique parameters passed to the optimiser in comparison with adam:

1. algorithm0, which defines what algorithm to use in nloptr for the initial optimisation. By default, this is "NLOPT_LN_BOBYQA".
2. algorithm, which determines the second optimiser. By default, this is "NLOPT_LN_NELDERMEAD".
3. maxeval0 and maxeval, which determine the number of iterations for the two optimisers. By default, maxeval0=maxeval=40*k, where k is the number of estimated parameters.
4. xtol_rel0 and xtol_rel, which are 1e-8 and 1e-6 respectively.

There are also ftol_rel0, ftol_rel, ftol_abs0 and ftol_abs, which by default are set to the values explained in the nloptr.print.options() function.
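A call that sets these optimiser parameters explicitly might look like the sketch below. The parameter names are the ones listed in this section; the numeric values are illustrative only, not recommendations, and the example assumes that the smooth package is installed.

```r
library(smooth)

# Illustrative values only: the two algorithms are the documented defaults,
# while the iteration limits are fixed at 200 instead of 40*k
fit <- gum(BJsales, h = 8, holdout = TRUE,
           algorithm0 = "NLOPT_LN_BOBYQA",
           algorithm = "NLOPT_LN_NELDERMEAD",
           maxeval0 = 200, maxeval = 200,
           xtol_rel0 = 1e-8, xtol_rel = 1e-6)
fit
```

Raising maxeval0 and maxeval can help when the optimiser stops before converging, at the cost of a longer estimation time.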

ic

The information criterion used in the model selection procedure.

Author

Ivan Svetunkov, ivan@svetunkov.com

Details

The function estimates the Single Source of Error state space model of the following type:

$$y_{t} = w_t' v_{t-l} + \epsilon_{t}$$

$$v_{t} = F v_{t-l} + g_{t} \epsilon_{t}$$

where $v_{t}$ is the state vector (defined using orders), $l$ is the vector of lags, $w_t$ is the measurement vector (which includes fixed elements and explanatory variables), $F$ is the transition matrix, $g_t$ is the persistence vector (which also includes explanatory variables, if provided) and $\epsilon_{t}$ is the error term.
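The two equations can be traced step by step with the base R sketch below, which generates data from the additive model without explanatory variables. The helper simulate_gum() and its arguments are hypothetical and are not part of smooth. It assumes that component i of $v_{t-l}$ is taken from time t - lagsAll[i], with one lag per state, and that the measurement and persistence vectors are constant over time.

```r
# Hypothetical helper: simulate y_t = w' v_{t-l} + e_t, v_t = F v_{t-l} + g e_t
simulate_gum <- function(n, transition, measurement, persistence,
                         lagsAll, initial, errors) {
  k <- length(lagsAll)
  maxLag <- max(lagsAll)
  # The first maxLag columns hold the pre-sample (initial) states
  states <- matrix(NA_real_, k, n + maxLag)
  states[, seq_len(maxLag)] <- initial
  y <- numeric(n)
  for (t in seq_len(n)) {
    j <- t + maxLag
    # Pick each state component at its own lag
    vLagged <- states[cbind(seq_len(k), j - lagsAll)]
    y[t] <- sum(measurement * vLagged) + errors[t]
    states[, j] <- as.vector(transition %*% vLagged) + persistence * errors[t]
  }
  list(y = y, states = states)
}

# Local level model: one state with lag 1
sim <- simulate_gum(n = 3, transition = matrix(1), measurement = 1,
                    persistence = 0.5, lagsAll = 1, initial = 10,
                    errors = c(1, -1, 0))
sim$y
```

With a single state, lag 1 and a unit transition, the recursion reduces to simple exponential smoothing in its error correction form, which is a convenient sanity check for the sketch.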

For some more information about the model and its implementation, see the vignette: vignette("gum","smooth")

References

Svetunkov I. (2023) Smooth forecasting with the smooth package in R. arXiv:2301.01790. doi:10.48550/arXiv.2301.01790.
Svetunkov I. (2015 - Inf) "smooth" package for R - series of posts about the underlying models and how to use them: https://openforecast.org/category/r-en/smooth/.
Snyder, R. D. (1985) Recursive Estimation of Dynamic Linear Models. Journal of the Royal Statistical Society, Series B (Methodological) 47 (2), 272-276.
Hyndman, R.J., Koehler, A.B., Ord, J.K., and Snyder, R.D. (2008) Forecasting with exponential smoothing: the state space approach, Springer-Verlag. doi:10.1007/978-3-540-71918-2.

Examples

gum(BJsales, h=8, holdout=TRUE)

ourModel <- gum(rnorm(118,100,3), orders=c(2,1), lags=c(1,4), h=18, holdout=TRUE)

# Apply the previously estimated model to new data
gum(rnorm(118,100,3), model=ourModel, h=18)

# Produce something crazy with optimal initials (not recommended)
gum(rnorm(118,100,3), orders=c(1,1,1), lags=c(1,3,5), h=18, holdout=TRUE, initial="o")

# Simpler model estimated using the trace forecast error loss function (TMSE)
gum(rnorm(118,100,3), orders=c(1), lags=c(1), h=18, holdout=TRUE, bounds="n", loss="TMSE")


x <- rnorm(50,100,3)

# The best GUM model for the data
ourModel <- auto.gum(x, orders=2, lags=4, h=18, holdout=TRUE)

summary(ourModel)
