The function constructs a Generalised Univariate Model (GUM), estimating the transition matrix F, the measurement vector w, the persistence vector g and the initial values of the states.
gum(data, orders = c(1, 1), lags = c(1, frequency(data)),
type = c("additive", "multiplicative"), formula = NULL,
regressors = c("use", "select", "adapt", "integrate"),
initial = c("backcasting", "optimal", "complete"), persistence = NULL,
transition = NULL, measurement = rep(1, sum(orders)),
loss = c("likelihood", "MSE", "MAE", "HAM", "MSEh", "TMSE", "GTMSE",
"MSCE"), h = 0, holdout = FALSE, bounds = c("admissible", "none"),
silent = TRUE, model = NULL, ...)

auto.gum(data, orders = 3, lags = frequency(data), type = c("additive",
"multiplicative", "select"), formula = NULL, regressors = c("use",
"select", "adapt", "integrate"), initial = c("backcasting", "optimal",
"complete"), ic = c("AICc", "AIC", "BIC", "BICc"), loss = c("likelihood",
"MSE", "MAE", "HAM", "MSEh", "TMSE", "GTMSE", "MSCE"), h = 0,
holdout = FALSE, bounds = c("admissible", "none"), silent = TRUE, ...)
gum_old(data, orders = c(1, 1), lags = c(1, frequency(y)),
type = c("additive", "multiplicative"), persistence = NULL,
transition = NULL, measurement = rep(1, sum(orders)),
initial = c("optimal", "backcasting"), loss = c("likelihood", "MSE",
"MAE", "HAM", "MSEh", "TMSE", "GTMSE", "MSCE"), h = 10, holdout = FALSE,
bounds = c("restricted", "admissible", "none"), silent = c("all",
"graph", "legend", "output", "none"), ...)
ges(...)
An object of class "adam" is returned, with elements similar to those returned by the adam function.
data: Vector containing the data to be forecast. If a matrix (or data.frame / data.table) is provided, then the first column is used as the response variable, while the remaining columns are used as explanatory variables. In the latter case, formula can be used to define the relation between them.
orders: Order of the model, specified as a vector of the number of states for each lag. For example, orders=c(1,1) means that there are two states: one with the first lag and one with the second lag. In case of auto.gum(), this parameter is the maximum order to check.
lags: Defines the lags for the corresponding orders. If, for example, orders=c(1,1) and lags=c(1,12), then the model will have two states: the first with lag 1 and the second with lag 12. The length of lags must match the length of orders. In case of auto.gum(), this is the maximum lag to check, which should usually be the maximum frequency of the data.
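To make the mapping between orders and lags concrete, the sketch below expands them into one lag per state, using orders=c(2,1) and lags=c(1,4) from the examples. The helper expand_lags is hypothetical (not part of smooth), and the count of initial values assumes, for illustration only, that a state with lag l is seeded with l values.

```r
# Hypothetical helper (not part of smooth): expand orders and lags
# into the lag of every individual state in the state vector.
expand_lags <- function(orders, lags) {
  stopifnot(length(orders) == length(lags))
  rep(lags, times = orders)
}

orders <- c(2, 1)
lags <- c(1, 4)
state_lags <- expand_lags(orders, lags)   # 1 1 4

# Number of states in v_t
n_states <- sum(orders)                   # 3
# Assumption: a state with lag l needs l initial values
n_initials <- sum(state_lags)             # 6
```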
type: Type of the model. Can be either "additive" or "multiplicative". The latter means that GUM is fitted on the log-transformed data. In case of auto.gum(), this can also be "select", implying automatic selection of the type.
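Fitting on the log-transformed data can be illustrated with the simplest GUM: one state with lag 1 and F = w = 1, g = alpha, i.e. a local level model. This is a sketch with a fixed, assumed smoothing parameter, not the estimation that gum() performs; local_level_fit is a hypothetical helper.

```r
# Sketch: the "multiplicative" type as an additive model fitted to log(y).
# Assumption: one state, lag 1, F = w = 1, g = alpha (local level), alpha fixed.
local_level_fit <- function(y, alpha, level0 = y[1]) {
  fitted <- numeric(length(y))
  level <- level0
  for (t in seq_along(y)) {
    fitted[t] <- level                 # one-step-ahead fit: w' v_{t-1}
    error <- y[t] - fitted[t]          # epsilon_t
    level <- level + alpha * error     # v_t = F v_{t-1} + g epsilon_t
  }
  fitted
}

set.seed(42)
y <- 100 * exp(cumsum(rnorm(60, mean = 0, sd = 0.05)))  # strictly positive series
fitted_log <- local_level_fit(log(y), alpha = 0.3)      # fitted on the log scale
fitted_mult <- exp(fitted_log)                           # back on the original scale
```

On the original scale the error acts multiplicatively, since exp(a + e) = exp(a) exp(e).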
formula: Formula to use in case of explanatory variables. If NULL, then all the variables are used as is. Can also include trend, which would add a global trend. Only needed if data is a matrix or if trend is provided.
regressors: Defines what to do with the provided explanatory variables: "use" means that all of the data should be used, "select" means that a selection using ic should be done, while "adapt" will trigger the mechanism of time varying parameters for the explanatory variables.
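The idea behind "adapt" can be sketched with a single regressor whose coefficient is updated by the same error that drives the states. The update a_t = a_{t-1} + delta e_t / x_t used below is an assumed form for illustration; the exact mechanism inside adam/gum may differ, and adaptive_coef is a hypothetical helper.

```r
# Hypothetical illustration of a time-varying ("adapt") coefficient:
#   y_t = a_{t-1} x_t + e_t,   a_t = a_{t-1} + delta * e_t / x_t
# delta = 0 keeps the coefficient fixed; delta = 1 makes it equal y_t / x_t.
adaptive_coef <- function(y, x, a0, delta) {
  stopifnot(length(y) == length(x), all(x != 0))
  a <- numeric(length(y))
  a_prev <- a0
  for (t in seq_along(y)) {
    error <- y[t] - a_prev * x[t]            # one-step-ahead error
    a_prev <- a_prev + delta * error / x[t]  # update the coefficient
    a[t] <- a_prev
  }
  a
}

set.seed(1)
x <- runif(40, 5, 10)
true_coef <- seq(2, 3, length.out = 40)       # slowly drifting coefficient
y <- true_coef * x + rnorm(40, sd = 0.5)
a_hat <- adaptive_coef(y, x, a0 = 2, delta = 0.2)
```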
initial: Can be either a character string or a vector of initial states. If it is a character string, then it can be "optimal", meaning that the initial states are optimised, "backcasting", meaning that the initials are produced using the backcasting procedure (while the initials for explanatory variables are still estimated), or "complete", meaning backcasting for all states.
persistence: Persistence vector \(g\), containing the smoothing parameters. If NULL, then it is estimated.
transition: Transition matrix \(F\). Can be provided as a vector, in which case the matrix is formed using the default matrix(transition,nc,nc), where nc is the number of components in the state vector. If NULL, then it is estimated.
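Because matrix() fills column by column, a transition vector lists the elements of F column-wise. The values below are illustrative only.

```r
# Reshape a transition vector into F as described: matrix(transition, nc, nc).
nc <- 2                                  # number of components, sum(orders)
transition <- c(1, 0, 0.5, 1)            # illustrative values, length nc^2
F_mat <- matrix(transition, nc, nc)      # filled column by column
F_mat[1, 2]                              # 0.5: first row, second column

# The same F written row by row needs byrow = TRUE:
F_rows <- matrix(c(1, 0.5, 0, 1), nc, nc, byrow = TRUE)
identical(F_mat, F_rows)                 # TRUE
```

A vector shorter than nc^2 would be recycled by matrix(), so it is safer to supply all nc^2 values explicitly.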
measurement: Measurement vector \(w\). If NULL, then it is estimated.
loss: The type of loss function used in optimisation. loss can be:
likelihood - the model is estimated via the maximisation of the likelihood of the distribution specified in distribution;
MSE - Mean Squared Error;
MAE - Mean Absolute Error;
HAM - Half Absolute Moment;
LASSO - use LASSO to shrink the parameters of the model;
RIDGE - use RIDGE to shrink the parameters of the model;
TMSE - Trace Mean Squared Error;
GTMSE - Geometric Trace Mean Squared Error;
MSEh - optimisation using only the h-steps ahead error;
MSCE - Mean Squared Cumulative Error.
In case of LASSO / RIDGE, the variables are not normalised prior to the estimation, but the parameters are divided by the mean values of the explanatory variables.
Note that model selection and combination work properly only for the default loss="likelihood".
Furthermore, just for fun, the absolute and half analogues of the multistep estimators are available: MAEh, TMAE, GTMAE, MACE, HAMh, THAM, GTHAM, CHAM.
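The multistep losses can be computed from a matrix of in-sample forecast errors, with one row per forecast origin and one column per horizon. This is a sketch following the usual formulations (for example, TMSE as the sum of the h mean squared errors); the exact scaling used inside smooth is an assumption here, and multistep_losses is a hypothetical helper.

```r
# errors: rows = forecast origins, columns = horizons 1..h
multistep_losses <- function(errors) {
  h <- ncol(errors)
  mse_by_h <- colMeans(errors^2)            # MSE for each horizon j = 1..h
  c(MSEh  = mse_by_h[[h]],                  # only the h-steps ahead error
    TMSE  = sum(mse_by_h),                  # trace: sum of the h MSEs
    GTMSE = sum(log(mse_by_h)),             # geometric trace: sum of log MSEs
    MSCE  = mean(rowSums(errors)^2))        # mean squared cumulative error
}

errors <- cbind(h1 = c(1, -1,  2, 0),
                h2 = c(1,  1, -2, 2))
multistep_losses(errors)
# MSEh = 2.5, TMSE = 4, GTMSE = log(3.75), MSCE = 2
```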
Last but not least, users can provide their own function here as well, making sure that it accepts the parameters actual, fitted and B. Here is an example:
lossFunction <- function(actual, fitted, B) return(mean(abs(actual-fitted)))
loss=lossFunction
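A custom loss can be checked on its own before it is passed to gum(). The penalised variant below is hypothetical: it assumes that B is the numeric vector of estimated parameters and gives the extra argument lambda a default, so that a call with only actual, fitted and B still works.

```r
# The user-defined loss from the description: mean absolute error.
lossFunction <- function(actual, fitted, B) return(mean(abs(actual - fitted)))

# Hypothetical variant: MSE plus a small ridge-type penalty on B
# (assumes B holds the estimated parameters).
lossPenalised <- function(actual, fitted, B, lambda = 0.1) {
  mean((actual - fitted)^2) + lambda * sum(B^2)
}

actual <- c(10, 12, 11, 13)
fitted <- c(11, 11, 11, 12)
lossFunction(actual, fitted, B = c(0.3, 0.9))    # 0.75
lossPenalised(actual, fitted, B = c(0.3, 0.9))   # 0.84
```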
h: The forecast horizon. Mainly needed for the multistep loss functions.
holdout: Logical. If TRUE, then a holdout of size h is taken from the end of the data (can be used for model testing purposes).
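The holdout split can be sketched as keeping the last h observations aside and fitting on the rest. split_holdout is a hypothetical helper; BJsales is the series used in the examples and ships with base R in the datasets package.

```r
# Sketch of a holdout split: the last h observations are set aside.
split_holdout <- function(y, h) {
  stopifnot(h >= 1, h < length(y))
  list(insample = head(y, -h), holdout = tail(y, h))
}

y <- as.numeric(BJsales)          # 150 observations
parts <- split_holdout(y, h = 8)
length(parts$insample)            # 142
length(parts$holdout)             # 8
```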
bounds: The type of bounds for the parameters to use in the model estimation. Can be either "admissible", guaranteeing the stability of the model, or "none", imposing no restrictions (potentially dangerous).
silent: Specifies whether to report the progress of the function. If FALSE, then the function will print what it does and how much it has already done.
model: A previously estimated GUM model. If provided, the function will not estimate anything and will use all of its parameters.
...: Other non-documented parameters. See adam for details. However, several parameters passed to the optimiser are unique to this function in comparison with adam:
1. algorithm0, which defines what algorithm to use in nloptr for the initial optimisation. By default, this is "NLOPT_LN_BOBYQA".
2. algorithm, which determines the second optimiser. By default, this is "NLOPT_LN_NELDERMEAD".
3. maxeval0 and maxeval, which determine the maximum number of function evaluations for the two optimisers. By default, maxeval0=1000 and maxeval=40*k, where k is the number of estimated parameters.
4. xtol_rel0 and xtol_rel, which are 1e-8 and 1e-6 respectively.
There are also ftol_rel0, ftol_rel, ftol_abs0 and ftol_abs, which by default are set to the values explained in the nloptr.print.options() function.
ic: The information criterion to use in the model selection.
Ivan Svetunkov, ivan@svetunkov.com
The function estimates the Single Source of Error state space model of the following type:
$$y_{t} = w_t' v_{t-l} + \epsilon_{t}$$
$$v_{t} = F v_{t-l} + g_{t} \epsilon_{t}$$
where \(v_{t}\) is the state vector (defined using orders), \(l\) is the vector of lags, \(w_t\) is the measurement vector (which includes fixed elements and explanatory variables), \(F\) is the transition matrix, \(g_t\) is the persistence vector (which also includes the explanatory variables, if provided) and \(\epsilon_{t}\) is the error term.
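The recursion above can be sketched in base R. The helper gum_filter is hypothetical and is not the smooth implementation: it assumes constant w and g (no explanatory variables), takes each element of \(v_{t-l}\) from its own lag, and seeds every state with a single initial value repeated over the first max(lags) periods. It returns the one-step-ahead fitted values and errors, from which a loss such as MSE can be computed.

```r
# Sketch of the GUM recursion with state-specific lags:
#   y_t = w' v_{t-l} + e_t,   v_t = F v_{t-l} + g e_t
gum_filter <- function(y, F_mat, w, g, state_lags, v0) {
  k <- length(v0)
  n <- length(y)
  m <- max(state_lags)
  V <- matrix(v0, nrow = k, ncol = n + m)   # column j holds the states at time j - m
  fitted <- errors <- numeric(n)
  for (t in seq_len(n)) {
    col_t <- t + m
    # element i of v_{t-l} comes from time t - l_i
    v_lag <- V[cbind(seq_len(k), col_t - state_lags)]
    fitted[t] <- sum(w * v_lag)                  # measurement equation
    errors[t] <- y[t] - fitted[t]
    V[, col_t] <- F_mat %*% v_lag + g * errors[t] # transition equation
  }
  list(fitted = fitted, errors = errors, states = V[, -seq_len(m), drop = FALSE])
}

set.seed(7)
y <- 100 + cumsum(rnorm(30))
# Random walk as a GUM: one state, lag 1, F = w = g = 1
rw <- gum_filter(y, F_mat = matrix(1), w = 1, g = 1, state_lags = 1, v0 = y[1])
# orders = c(1, 1), lags = c(1, 4): a level state and a lag-4 state
two <- gum_filter(y, F_mat = diag(2), w = c(1, 1), g = c(0.3, 0.1),
                  state_lags = c(1, 4), v0 = c(100, 0))
mean(two$errors^2)   # in-sample MSE for these fixed parameters
```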
For more information about the model and its implementation, see the
vignette: vignette("gum","smooth")
Svetunkov I. (2023) Smooth forecasting with the smooth package in R. arXiv:2301.01790. doi:10.48550/arXiv.2301.01790.
Svetunkov I. (2015 - Inf) "smooth" package for R - series of posts about the underlying models and how to use them: https://openforecast.org/category/r-en/smooth/.
Snyder, R. D., 1985. Recursive Estimation of Dynamic Linear Models. Journal of the Royal Statistical Society, Series B (Methodological) 47 (2), 272-276.
Hyndman, R.J., Koehler, A.B., Ord, J.K., and Snyder, R.D. (2008) Forecasting with exponential smoothing: the state space approach, Springer-Verlag. doi:10.1007/978-3-540-71918-2.
adam, es, ces, sim.es, ssarima
gum(BJsales, h=8, holdout=TRUE)
ourModel <- gum(rnorm(118,100,3), orders=c(2,1), lags=c(1,4), h=18, holdout=TRUE)
# Reuse the previously estimated model on new data
gum(rnorm(118,100,3), model=ourModel, h=18)
# Produce something crazy with optimal initials (not recommended)
gum(rnorm(118,100,3), orders=c(1,1,1), lags=c(1,3,5), h=18, holdout=TRUE, initial="optimal")
# Simpler model estimated using the trace forecast error loss function (TMSE) without parameter bounds
gum(rnorm(118,100,3), orders=c(1), lags=c(1), h=18, holdout=TRUE, bounds="none", loss="TMSE")
x <- rnorm(50,100,3)
# The best GUM model for the data
ourModel <- auto.gum(x, orders=2, lags=4, h=18, holdout=TRUE)
summary(ourModel)