cvts: Cross validation for time series

Description

Perform cross validation on a time series.

Usage

cvts(x, FUN = NULL, FCFUN = NULL, rolling = FALSE, windowSize = 84,
  maxHorizon = 5, horizonAverage = FALSE, xreg = NULL,
  saveModels = ifelse(length(x) > 500, FALSE, TRUE),
  saveForecasts = ifelse(length(x) > 500, FALSE, TRUE), verbose = TRUE,
  num.cores = 2L, extraPackages = NULL, ...)

Arguments

x

the input time series.

FUN

the model function used. Custom functions are allowed. See details and examples.

FCFUN

a function that produces point forecasts from the model returned by FUN (or directly from the series when FUN is NULL, as in the rwf example). This defaults to forecast. Custom functions are allowed. See details and examples.

rolling

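should a rolling procedure be used? If TRUE, the training series starts with windowSize observations and grows by one observation each iteration, so successive forecast windows of length maxHorizon overlap. If FALSE, a training window of fixed length windowSize is shifted forward by maxHorizon each iteration, so the forecast windows of size maxHorizon do not overlap.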

windowSize

length of the training window used to build each model. When rolling == FALSE, each model is fit to a time series of this length; when rolling == TRUE, the first model is fit to a series of this length, and the training series grows by one observation each iteration.

maxHorizon

maximum length of the forecast horizon to use for computing errors.

horizonAverage

should the final errors be an average over all forecast horizons up to maxHorizon instead of producing metrics for each individual horizon?

xreg

external regressors used to fit the model. Only used if FUN accepts xreg as an argument; FCFUN is then also expected to accept it (see details).

saveModels

should the individual models be saved? Set this to FALSE on long time series to save memory.

saveForecasts

should the individual forecasts from each model be saved? Set this to FALSE on long time series to save memory.

verbose

should the current progress be printed to the console?

num.cores

the number of cores to use for parallel fitting. If the underlying model that is being fit also utilizes parallelization, the number of cores it is using multiplied by `num.cores` should not exceed the number of cores available on your machine.

extraPackages

on Windows, if a custom `FUN` or `FCFUN` requires additional packages, list them here so that they are loaded on the parallel socket workers.

...

Other arguments to be passed to the model function FUN.
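To illustrate the horizonAverage argument, the sketch below assumes, hypothetically, that the cross-validation errors are held in a matrix with one row per fold and one column per forecast horizon 1:maxHorizon. The names errs and summariseErrors() are illustrative only and are not part of cvts(). Averaging down the folds gives one metric per horizon; horizonAverage = TRUE corresponds to collapsing those into a single number.

```r
# Hypothetical error matrix: 3 folds (rows) x 4 forecast horizons (columns)
errs <- matrix(c(1, 2, 3, 4,
                 2, 2, 4, 4,
                 0, 2, 2, 4),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("fold", 1:3), paste0("h", 1:4)))

# Mean absolute error per horizon, or one value averaged over all horizons
summariseErrors <- function(errs, horizonAverage = FALSE) {
  maePerHorizon <- colMeans(abs(errs))
  if (horizonAverage) mean(maePerHorizon) else maePerHorizon
}

summariseErrors(errs)                         # h1 h2 h3 h4 -> 1 2 3 4
summariseErrors(errs, horizonAverage = TRUE)  # 2.5
```

The per-horizon form shows how error grows with lead time, which the single averaged value hides.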

Details

Cross validation of time series data is more complicated than regular k-fold or leave-one-out cross validation of datasets without serial correlation, since observations $x_t$ and $x_{t+n}$ are not independent. The cvts() function overcomes this obstacle using two methods: 1) rolling cross validation, where an initial training window is used along with a forecast horizon, and the training window grows by one observation each round until the training window and the forecast horizon cover the entire series; or 2) a non-rolling approach, where a training window of fixed length is shifted forward by the forecast horizon after each iteration.
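The two fold schemes described above can be sketched with plain index vectors. The helper makeFolds() below is hypothetical and uses base R only; it follows the description in this section (a growing window for rolling = TRUE, a fixed-length window shifted by maxHorizon for rolling = FALSE) and is not taken from the cvts() source.

```r
# Hypothetical helper: training and test indices for each fold
makeFolds <- function(n, windowSize, maxHorizon, rolling = FALSE) {
  stopifnot(windowSize + maxHorizon <= n)
  if (rolling) {
    # Training window starts at 1 and grows by one observation per fold
    ends <- seq(windowSize, n - maxHorizon, by = 1)
    starts <- rep(1, length(ends))
  } else {
    # Fixed-length training window shifted forward by maxHorizon per fold
    ends <- seq(windowSize, n - maxHorizon, by = maxHorizon)
    starts <- ends - windowSize + 1
  }
  lapply(seq_along(ends), function(i) {
    list(train = starts[i]:ends[i],
         test = (ends[i] + 1):(ends[i] + maxHorizon))
  })
}

folds <- makeFolds(n = 20, windowSize = 10, maxHorizon = 3, rolling = FALSE)
length(folds)     # 3 folds
folds[[2]]$train  # 4:13
folds[[2]]$test   # 14:16
```

With rolling = TRUE the same call yields 8 folds whose test windows (11:13, 12:14, ...) overlap, while the non-rolling test windows 11:13, 14:16 and 17:19 are disjoint.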

For the rolling approach, training points are heavily recycled, both for fitting and for generating forecast errors at each of the forecast horizons 1:maxHorizon. In contrast, the models fit with the non-rolling approach share less overlap, and each actual value is compared against a forecast only once. The former approach is similar to leave-one-out cross validation, while the latter resembles k-fold cross validation. As a result, rolling cross validation requires far more iterations and takes longer to compute, but the non-rolling approach produces cross-validated errors with greater variance and general instability.
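To make the recycling argument concrete, the sketch below counts how many times each observation serves as a test point under the two schemes. It assumes the fold layout described in this section (training window advanced by one for rolling, by maxHorizon for non-rolling); the helper testUse() and the toy sizes are hypothetical, and cvts() itself is not called.

```r
# Toy sizes for illustration
n <- 20; windowSize <- 10; maxHorizon <- 3

# Last training index of each fold under each scheme
rollingEnds <- seq(windowSize, n - maxHorizon, by = 1)
fixedEnds <- seq(windowSize, n - maxHorizon, by = maxHorizon)

# Hypothetical helper: how often each time index appears in a test window
testUse <- function(ends, h, n) {
  tab <- integer(n)
  for (e in ends) {
    idx <- (e + 1):(e + h)
    tab[idx] <- tab[idx] + 1L
  }
  tab
}

length(rollingEnds)                       # 8 folds
length(fixedEnds)                         # 3 folds
max(testUse(rollingEnds, maxHorizon, n))  # 3: reused at horizons 1, 2 and 3
max(testUse(fixedEnds, maxHorizon, n))    # 1: each point is tested once
```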

The FUN and FCFUN arguments specify which functions to use for generating a model and for forecasting, respectively. Functions from the "forecast" package can be used, and user-defined functions can also be tested, but FCFUN must accept the argument h, and the object it returns must contain the point forecasts out to horizon h in its $mean element. The examples include both a custom model function and a custom forecast function.
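As a sketch of that contract, the pair below is hypothetical: meanFit() stands in for FUN and may return any object, and meanForecast() stands in for FCFUN, taking the fitted object and h and returning a list whose mean element holds h point forecasts. Only base R is used. Whether cvts() needs anything beyond $mean is not stated here, so as an assumption the sketch returns a ts whose time index continues from the training series, as forecast objects do.

```r
# Hypothetical FUN: "fits" a constant-mean model and keeps what FCFUN needs
meanFit <- function(x) {
  list(level = mean(x), tsp = tsp(as.ts(x)))
}

# Hypothetical FCFUN: accepts h and returns point forecasts in $mean
meanForecast <- function(object, h) {
  freq <- object$tsp[3]
  list(mean = ts(rep(object$level, h),
                 start = object$tsp[2] + 1 / freq,
                 frequency = freq))
}

fc <- meanForecast(meanFit(ts(c(2, 4, 6, 8), start = 2000)), h = 3)
fc$mean  # three forecasts of 5, indexed 2004 to 2006
```

Under these assumptions such a pair would be passed to cvts() as FUN = meanFit, FCFUN = meanForecast.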

For small time series (by default, length <= 500), all of the individual fitted models are included in the cvts object that is returned. This object can grow quite large, since functions such as auto.arima save fitted values, residuals, summary statistics, coefficient matrices, etc. Setting saveModels = FALSE is safe if there is no need to examine the individual models fit at each stage of cross validation, since the residuals from each fold are always saved (the forecasts themselves are kept unless saveForecasts = FALSE).
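To see why keeping every fitted model is costly, the sketch below fits stats::arima(), used here only as a lightweight stand-in for auto.arima(), on successive training windows and compares the memory taken by the saved models with that of the forecast errors they produce. The simulated series, window size and ARIMA order are illustrative assumptions; this mirrors the trade-off behind saveModels rather than reproducing cvts().

```r
set.seed(1)
x <- ts(cumsum(rnorm(120)))  # illustrative random-walk series
windowSize <- 60; h <- 6

# Non-rolling folds: training ends at 60, 66, ..., 114
ends <- seq(windowSize, length(x) - h, by = h)
models <- vector("list", length(ends))
errors <- vector("list", length(ends))
for (i in seq_along(ends)) {
  train <- window(x, end = ends[i])
  fit <- arima(train, order = c(1, 1, 0))
  models[[i]] <- fit                      # what saveModels = TRUE would keep
  fc <- predict(fit, n.ahead = h)$pred
  errors[[i]] <- x[(ends[i] + 1):(ends[i] + h)] - as.numeric(fc)
}

# The saved models take far more memory than the errors
print(object.size(models), units = "Kb")
print(object.size(errors), units = "Kb")
```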

External regressors are allowed via the xreg argument. If xreg is not NULL, both FUN and FCFUN are assumed to accept the xreg parameter. A warning is issued if FUN does not accept xreg; no warning is given if FCFUN does not use it.
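A minimal sketch of the xreg mechanics described above, under stated assumptions: acceptsXreg() and fitWithXreg() are hypothetical helpers that detect an xreg formal argument with formals() and warn when it is missing (cvts() may perform this check differently), and the regressor matrix is assumed to be sliced with the same row indices as each fold's training and test windows.

```r
# Hypothetical check: does a function declare a formal argument named xreg?
acceptsXreg <- function(f) "xreg" %in% names(formals(f))

# Hypothetical wrapper: warn and ignore xreg when FUN cannot use it
fitWithXreg <- function(FUN, x, xreg) {
  if (is.null(xreg)) return(FUN(x))
  if (!acceptsXreg(FUN)) {
    warning("FUN does not accept an xreg argument; xreg is ignored")
    return(FUN(x))
  }
  FUN(x, xreg = xreg)
}

# Regressor rows sliced with the same indices as one fold's train/test data
xreg <- cbind(trend = 1:20, promo = rep(c(0, 1), 10))
trainIdx <- 1:10
testIdx <- 11:13
xregTrain <- xreg[trainIdx, , drop = FALSE]  # would go to FUN
xregTest <- xreg[testIdx, , drop = FALSE]    # would go to FCFUN

acceptsXreg(stats::arima)  # TRUE
dim(xregTest)              # 3 2
```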

Examples


# NOT RUN {
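library(forecast)       # snaive(), ets(), stlm(), tsclean(), rwf()
library(forecastHybrid) # cvts(), hybridModel()
series <- subset(AirPassengers, end = 50)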
cvmod1 <- cvts(series, FUN = snaive,
               windowSize = 25, maxHorizon = 12)
accuracy(cvmod1)

# We can also use custom model functions for modeling/forecasting
stlmClean <- function(x){stlm(tsclean(x))}
series <- subset(austres, end = 38)
cvmodCustom <- cvts(series, FUN = stlmClean, windowSize = 26, maxHorizon = 6)
accuracy(cvmodCustom)

# Use the rwf() function from the "forecast" package.
# This function does not have a modeling function and
# instead calculates a forecast on the time series directly
series <- subset(AirPassengers, end = 26)
rwcv <- cvts(series, FCFUN = rwf, windowSize = 24, maxHorizon = 1)

# }
# NOT RUN {
cvmod2 <- cvts(USAccDeaths, FUN = ets,
               saveModels = FALSE, saveForecasts = FALSE,
               windowSize = 36, maxHorizon = 12)

# If we don't need prediction intervals and are using the nnetar model, turning off
# prediction intervals (PI = FALSE) makes forecasting much faster
cvmod3 <- cvts(AirPassengers, FUN = hybridModel,
               FCFUN = function(mod, h) forecast(mod, h = h, PI = FALSE),
               rolling = FALSE, windowSize = 48,
               maxHorizon = 12)
# }