A simple backtest of Semi-ARMA models via rolling forecasts can be implemented.
rollCast(
y,
p = NULL,
q = NULL,
K = 5,
method = c("norm", "boot"),
alpha = 0.95,
np.fcast = c("lin", "const"),
it = 10000,
n.start = 1000,
pb = TRUE,
cores = future::availableCores(),
argsSmoots = list(),
plot = TRUE,
argsPlot = list()
)
A list with different elements is returned. The elements are as follows.
a single numeric value; it describes, what confidence level
(\(100\)alpha
)-percent has been considered for the forecasting
intervals.
a logical vector that states whether the \(K\) true
out-of-sample observations lie outside of the forecasting intervals,
respectively; a breach is denoted by TRUE
.
a numeric vector that contains the margin of the breaches (in absolute terms) for the \(K\) out-of-sample time points; if a breach did not occur, the respective element is set to zero.
a numeric vector that contains the simulated empirical
values of the forecasting error for method = "boot"
; otherwise,
it is set to NULL
.
a numeric vector that contains the \(K\) point forecasts of the parametric part of the model.
a numeric matrix that contains the \(K\) rolling point forecasts as well as the values of the bounds of the respective forecasting intervals for the complete model; the first row contains the point forecasts, the lower bounds of the forecasting intervals are in the second row and the upper bounds can be found in the third row.
a numeric vector that contains the \(K\) obtained trend forecasts.
a positive integer; states the number of out-of-sample observations as well as the number of forecasts for the out-of-sample time points.
the obtained value of the mean average scaled error for the selected model.
a character object that states, whether the forecasting
intervals were obtained via a bootstrap (method = "boot"
) or under
the normality assumption for the innovations (method = "norm"
).
the output (usually a list) of the nonparametric
trend estimation via msmooth
.
the output (usually a list) of the parametric ARMA
estimation of the detrended series via arima
.
the number of observations (in-sample & out-of-sample observations).
the number of in-sample observations (n - n.out
).
the number of out-of-sample observations (equals K
).
a character object that states the applied forecasting
method for the nonparametric trend function; either a linear (
np.fcast = "lin"
) or a constant np.fcast = "const"
are
possible.
a numeric vector of length 2 with the
\([100(1 -\) alpha
\()/2]\)-percent and
{\(100\)\([1 - (1 -\) alpha
\()/2]\)}-percent quantiles of
the forecasting error distribution.
the obtained value of the root mean squared scaled error for the selected model.
a numeric vector that contains all true observations (in-sample & out-of-sample observations).
a numeric vector that contains all in-sample observations.
a numeric vector that contains the \(K\) out-of-sample observations.
a numeric vector that represents the equidistant time series assumed to follow a Semi-ARMA model; must be ordered from past to present.
an integer value \(\geq 0\) that defines the AR order
\(p\) of the underlying ARMA(\(p,q\)) model within X
; is set to
NULL
by default; if no value is passed to p
but one is passed
to q
, p
is set to 0
; if both p
and q
are
NULL
, optimal orders following the BIC for
\(0 \leq p,q \leq 5\) are chosen; is set to NULL
by
default; decimal numbers will be rounded off to integers.
an integer value \(\geq 0\) that defines the MA order
\(q\) of the underlying ARMA(\(p,q\)) model within X
; is set to
NULL
by default; if no value is passed to q
but one is passed
to p
, q
is set to 0
; if both p
and q
are
NULL
, optimal orders following the BIC for
\(0 \leq p,q \leq 5\) are chosen; is set to NULL
by
default; decimal numbers will be rounded off to integers.
a single, positive integer value that defines the number of
out-of-sample observations; the last K
observations in y
are
treated as the out-of-sample observations, whereas the rest of the
observations in y
are the in-sample values.
a character object; defines the method used for the calculation
of the forecasting intervals; with "norm"
the intervals are obtained
under the assumption of normally distributed innovations; with "boot"
the intervals are obtained via a bootstrap; is set to "norm"
by
default.
a numeric vector of length 1 with \(0 < \) alpha
\( < 1\); the forecasting intervals will be obtained based on the
confidence level (\(100\)alpha
)-percent; is set to
alpha = 0.95
by default, i.e., a \(95\)-percent confidence level.
a character object; defines the forecasting method used
for the nonparametric trend; for np.fcast = "lin"
the trend is
is extrapolated linearly based on the last two trend estimates; for
np.fcast = "const"
, the last trend estimate is used as a constant
estimate for future values; is set to "lin" by default.
an integer that represents the total number of iterations, i.e.,
the number of simulated series; is set to 10000
by default; only
necessary, if method = "boot"
; decimal
numbers will be rounded off to integers.
an integer that defines the 'burn-in' number
of observations for the simulated ARMA series via bootstrap; is set to
1000
by default; only necessary, if method = "boot"
;decimal
numbers will be rounded off to integers.
a logical value; for pb = TRUE
, a progress bar will be shown
in the console, if method = "boot"
.
an integer value >0 that states the number of (logical) cores to
use in the bootstrap (or NULL
); the default is the maximum number of
available cores
(via future::availableCores
); for
cores = NULL
, parallel computation is disabled.
a list that contains arguments that will be passed to
msmooth
for the estimation of the nonparametric trend
function; by default, the default values of msmooth
are used.
a logical value that controls the graphical output; for the
default (plot = TRUE
), the original series with the obtained point
forecasts as well as the forecasting intervals will be plotted; for
plot = FALSE
, no plot will be created.
a list; additional arguments for the standard plot function,
e.g., xlim
, type
, ..., can be passed to it; arguments with
respect to plotted graphs, e.g., the argument col
, only affect the
original series y
; please note that in accordance with the argument
x
(lower case) of the standard plot function, an additional numeric
vector with time points can be implemented via the argument x
(lower
case).
Yuanhua Feng (Department of Economics, Paderborn University),
Author of the Algorithms
Website: https://wiwi.uni-paderborn.de/en/dep4/feng/
Dominik Schulz (Research Assistant) (Department of Economics, Paderborn
University),
Package Creator and Maintainer
Define that an observed, equidistant time series \(y_t\), with \(t = 1, 2, ..., n\), follows $$y_t = m(x_t) + \epsilon_t,$$ where \(x_t = t/n\) is the rescaled time on the closed interval \([0,1]\) and \(m(x_t)\) is a nonparametric and deterministic trend function (see Beran and Feng, 2002, and Feng, Gries and Fritz, 2020). \(\epsilon_t\), on the other hand, is a stationary process with \(E(\epsilon_t) = 0\) and short-range dependence. For the purpose of this function, \(\epsilon_t\) is assumed to follow an autoregressive-moving-average (ARMA) model with $$\epsilon_t = \zeta_t + \beta_1 \epsilon_{t-1} + ... + \beta_p \epsilon_{t-p} + \alpha_1 \zeta_{t-1} + ... + \alpha_q \zeta_{t-q}.$$ Here, the random variables \(\zeta_t\) are identically and independently distributed (i.i.d.) with zero-mean and a constant variance and the coefficients \(\alpha_j\) and \(\beta_i\), \(i = 1, 2, ..., p\) and \(j = 1, 2, ..., q\), are real numbers. The combination of both previous formulas will be called a semiparametric ARMA (Semi-ARMA) model.
An explicit forecasting method of Semi-ARMA models is described in
modelCast
. To backtest a selected model, a slightly adjusted
procedure is used. The data is divided into in-sample and an
out-of-sample values (usually the last \(K = 5\) observations in the data
are reserved for the out-of-sample observations). A model is fitted to the
in-sample data, whereas one-step rolling point forecasts and forecasting
intervals are obtained for the out-of-sample time points. The proposed
forecasts of the trend are either a linear or a constant extrapolation of
the trend with negligible forecasting intervals, whereas the point forecasts
of the stationary rest term are obtained via the selected ARMA(\(p,q\))
model (see Fritz et al., 2020). The corresponding forecasting intervals
are calculated under the assumption that the innovations
\(\zeta_t\) are either normally distributed (see e.g. pp.
93-94 in Brockwell and Davis, 2016) or via a forward bootstrap (see Lu and
Wang, 2020). For a one-step forecast for time point \(t\), all observations
until time point \(t-1\) are assumed to be known.
The function calculates three important values for backtesting: the number of breaches, i.e. the number of true observations that lie outside of the forecasting intervals, the mean absolute scaled error (MASE, see Hyndman and Koehler, 2006) and the root mean squared scaled error (RMSSE, see Hyndman and Koehler, 2006) are obtained. For the MASE, a value \(< 1\) indicates a better average forecasting potential than a naive forecasting approach. Furthermore, it is independent from the scale of the data and can thus be used to compare forecasts of different datasets. Closely related is the RMSSE, however here, the mean of the squared forecasting errors is computed and scaled by the mean of the squared naive forecasting approach. Then the root of that value is the RMSSE. Due to the close relation, the interpretation of the RMSSE is similarly but not identically to the interpretation of the MASE. Of course, a value close to zero is preferred in both cases.
To make use of the function, a numeric vector with the values of a time
series that is assumed to follow a Semi-ARMA model needs to be passed to
the argument y
. Moreover, the arguments p
and q
represent the AR and MA orders, respectively, of the underlying ARMA
process in the parametric part of the model. If both values are set to
NULL
, an optimal order in accordance with the Bayesian Information
Criterion (BIC) will be selected. If only one of the values is NULL
,
it will be changed to zero instead. K
defines the number of the
out-of-sample observations; these will be cut off the end of y
, while
the remaining observations are treated as the in-sample observations. For the
\(K\) out-of-sample time points, rolling forecasts will be obtained.
method
describes the method to use for the computation of the
prediction intervals. Under the normality assumption for the innovations
\(\zeta_t\), intervals can be obtained via
method = "norm". However, if the assumption does not hold, a
bootstrap can be implemented as well (method = "boot"). Both
approaches are explained in more detail in normCast
and
bootCast
, respectively. With alpha
, the confidence
level of the forecasting intervals can be adjusted, as the
(\(100\)alpha
)-percent forecasting intervals will be computed. By
means of the argument np.fcast
, the forecasting method for the
nonparametric trend function can be defined. Selectable are a linear
(np.fcast = "lin"
) and a constant (np.fcast = "const"
)
extrapolation. For more information on these methods, we refer the reader to
trendCast
.
it
, n.start
, pb
and cores
are only
relevant for method = "boot"
. With it
the total number of
bootstrap iterations is defined, whereas n.start
regulates, how
many 'burn-in' observations are generated for each simulated ARMA process
in the bootstrap. Since a bootstrap may take a longer computation time,
with the argument cores
the number of cores for parallel computation
of the bootstrap iterations can be defined. Nonetheless, for
cores = NULL
, no cluster is created and therefore
the parallel computation is disabled. Note that the bootstrapped results are
fully reproducible for all cluster sizes. Moreover, for pb = TRUE
,
the progress of the bootstrap approach can be observed in the R console via
a progress bar. Additional information on these four function arguments can
be found in bootCast
.
The argument argsSmoots
is a list. In this list, different arguments
of the function msmooth
can be implemented to adjust the
estimation of the nonparametric part of the complete model. The arguments
of the smoothing function are described in msmooth
.
rollCast
allows for a quick plot of the results. If the logical
argument plot
is set to TRUE
, a graphic with default
settings is created. Nevertheless, users are allowed to implement further
arguments of the standard plot function in the list argsPlot
. For
example, the limits of the plot can be adjusted by xlim
and
ylim
. Furthermore, an argument x
can be included in
argsPlot
with the actual equidistant time points of the whole series
(in-sample & out-of-sample observations). Otherwise, simply 1:n
is
used as the in-sample time points by default.
NOTE:
Within this function, the arima
function of the
stats
package with its method "CSS-ML"
is used throughout for
the estimation of ARMA models. Furthermore, to increase the performance,
C++ code via the Rcpp
and
RcppArmadillo
packages
was implemented. Also, the future
and
future.apply
packages are
considered for parallel computation of bootstrap iterations. The progress
of the bootstrap is shown via the
progressr
package.
Beran, J., and Feng, Y. (2002). Local polynomial fitting with long-memory, short-memory and antipersistent errors. Annals of the Institute of Statistical Mathematics, 54, 291-311.
Brockwell, P. J., and Davis, R. A. (2016). Introduction to time series and forecasting, 3rd edition. Springer.
Fritz, M., Forstinger, S., Feng, Y., and Gries, T. (2020). Forecasting economic growth processes for developing economies. Unpublished.
Feng, Y., Gries, T. and Fritz, M. (2020). Data-driven local polynomial for the trend and its derivatives in economic time series. Journal of Nonparametric Statistics, 32:2, 510-533.
Hyndman, R. J., and Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22:4, 679-688.
Lu, X., and Wang, L. (2020). Bootstrap prediction interval for ARMA models with unknown orders. REVSTAT–Statistical Journal, 18:3, 375-396.
lgdp <- log(smoots::gdpUS$GDP)
time <- seq(from = 1947.25, to = 2019.5, by = 0.25)
backtest <- rollCast(lgdp, K = 5,
argsPlot = list(x = time, xlim = c(2012, 2019.5), col = "forestgreen",
type = "b", pch = 20, lty = 2, main = "Example"))
backtest
Run the code above in your browser using DataLab