CVar: k-fold Cross-Validation applied to an autoregressive model

Description

CVar computes the errors obtained by applying an autoregressive modelling function to subsets of the time series y using k-fold cross-validation, as described in Bergmeir, Hyndman and Koo (2018). It also applies a Ljung-Box test to the residuals. If this test is significant (see the returned p-value), the residuals are serially correlated and the model can be considered to be underfitting the data. In that case, the cross-validated errors can underestimate the generalization error and should not be used.
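To make the residual check concrete, here is a minimal sketch of a Ljung-Box test using base R's Box.test. It does not call CVar: the two simulated series merely stand in for residuals from a well-fitting and an underfitting model, and the names white_noise, ar1, lb_noise and lb_ar1 are illustrative assumptions.

```r
# Sketch only: simulated series stand in for model residuals.
set.seed(123)
white_noise <- rnorm(200)                                  # no serial correlation
ar1 <- arima.sim(model = list(ar = 0.8), n = 200)          # strong serial correlation

# Ljung-Box test with 24 lags, matching the default LBlags value
lb_noise <- Box.test(white_noise, lag = 24, type = "Ljung-Box")
lb_ar1   <- Box.test(ar1, lag = 24, type = "Ljung-Box")

# A small p-value signals leftover autocorrelation in the series
lb_noise$p.value
lb_ar1$p.value
```

If residuals behave like the second series, the test rejects the hypothesis of no autocorrelation, which is the situation in which the cross-validated errors should not be trusted.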

Usage

CVar(
  y,
  k = 10,
  FUN = nnetar,
  cvtrace = FALSE,
  blocked = FALSE,
  LBlags = 24,
  ...
)

Value

A list containing information about the model and accuracy for each fold, plus other summary information computed across folds.

Arguments

y: Univariate time series
k: Number of folds to use for cross-validation.
FUN: Function to fit an autoregressive model. Currently, it only works with the nnetar function.
cvtrace: Provide progress information.
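blocked: If TRUE, folds are formed from contiguous blocks of observations; if FALSE (the default), observations are assigned to folds at random.
LBlags: Number of lags for the Ljung-Box test. Defaults to 24; for yearly series it can be set to 20.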
...: Other arguments are passed to FUN.
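The difference between random and blocked folds can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: assign_folds is a hypothetical helper, and the real fold construction inside CVar may differ in detail.

```r
# Hypothetical helper (not part of the forecast package): assign each of
# n observations to one of k folds.
assign_folds <- function(n, k, blocked = FALSE) {
  base <- rep(seq_len(k), length.out = n)   # fold sizes differ by at most 1
  if (blocked) sort(base) else sample(base) # contiguous blocks vs random order
}

set.seed(1)
n <- length(lynx)                           # lynx ships with base R (datasets)
blocked_folds <- assign_folds(n, k = 5, blocked = TRUE)
random_folds  <- assign_folds(n, k = 5, blocked = FALSE)

# Same fold sizes either way; only the arrangement in time differs
table(blocked_folds)
table(random_folds)
head(blocked_folds, 30)
head(random_folds, 30)
```

With blocked folds, each test set is a contiguous stretch of the series; with random folds, test observations are scattered throughout it.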

Author

Gabriel Caceres and Rob J Hyndman

References

Bergmeir, C., Hyndman, R.J., Koo, B. (2018) A note on the validity of cross-validation for evaluating time series prediction. Computational Statistics & Data Analysis, 120, 70-83. https://robjhyndman.com/publications/cv-time-series/.

Examples

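library(forecast)
library(ggplot2)

# 5-fold cross-validation; lambda is passed through ... to nnetar
modelcv <- CVar(lynx, k=5, lambda=0.15)
print(modelcv)
# Model and accuracy information for the first fold
print(modelcv$fold1)

# Plot the data with the cross-validated fits and residuals
autoplot(lynx, series="Data") +
  autolayer(modelcv$testfit, series="Fits") +
  autolayer(modelcv$residuals, series="Residuals")
# Check the residuals for remaining autocorrelation
ggAcf(modelcv$residuals)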
