fittestEMD: Automatic prediction with empirical mode decomposition

Description

The function automatically applies an empirical mode decomposition (EMD) to a provided univariate time series. The resulting components of the decomposed series are used as the basis for predicting the next h consecutive values of the series, using automatically fitted models. The function also evaluates the fitness and prediction accuracy of the produced models.

Usage

fittestEMD(
  timeseries,
  timeseries.test = NULL,
  h = NULL,
  num_imfs = 0,
  S_number = 4L,
  num_siftings = 50L,
  level = 0.95,
  na.action = stats::na.omit,
  model = c("ets", "arima"),
  rank.by = c("MSE", "NMSE", "MAPE", "sMAPE", "MaxError", "errors")
)

Value

A list with components:

emd: Same as the value returned by the emd function. Contains the empirical mode decomposition of timeseries.
meaningfulImfs: Character string indicating the automatically selected meaningful IMFs of the decomposition.
pred: A list with the components mean, lower and upper, containing the predictions based on the best evaluated decomposition and the lower and upper limits for prediction intervals, respectively. All components are time series.
MSE: Numeric value of the resulting MSE error of prediction. Requires timeseries.test.
NMSE: Numeric value of the resulting NMSE error of prediction. Requires timeseries.test.
MAPE: Numeric value of the resulting MAPE error of prediction. Requires timeseries.test.
sMAPE: Numeric value of the resulting sMAPE error of prediction. Requires timeseries.test.
MaxError: Numeric value of the maximal error of prediction. Requires timeseries.test.
rank.val: Data frame with the fitness or prediction accuracy criteria computed for all candidate decompositions, ranked by rank.by.
rank.by: Ranking criteria used for ranking candidate decompositions and producing rank.val.

Arguments

timeseries: A vector or univariate time series.
timeseries.test: A vector or univariate time series containing a continuation of timeseries with actual values. It is used as a testing set and as the basis for calculating prediction error measures. Ignored if NULL.
h: Number of consecutive values of the time series to be predicted. If h is NULL, the number of consecutive values to be predicted is assumed to be equal to the length of timeseries.test. Required when timeseries.test is NULL.
num_imfs: Number of Intrinsic Mode Functions (IMFs) to compute. See emd.
S_number, num_siftings: See emd.
level: Confidence level for prediction intervals. See predict.lm and predict.
na.action: A function for treating missing values in timeseries and timeseries.test. The default function is na.omit, which omits any missing values found in timeseries or timeseries.test.
model: Character string. Indicates which model is to be used for fitting and prediction of the components of the decomposed series.
rank.by: Character string. Criteria used for ranking candidate decompositions/models/predictions generated during parameter selection. See 'Details'.

Author

Rebecca Pontes Salles

Details

The function produces an empirical mode decomposition of timeseries. See the emd function. The Intrinsic Mode Functions (IMFs) and the residue series resulting from the decomposition are each used separately as the basis for model fitting and prediction. The predictions for all IMFs and the residue series are then reverse-transformed to produce the next h consecutive values of the univariate time series in timeseries. See the emd.rev function.
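
The decompose, predict and recombine flow described above can be illustrated with a minimal base R sketch. It relies only on the fact that an empirical mode decomposition is additive (the series equals the sum of its IMFs plus the residue). The components here are synthetic stand-ins rather than real emd output, and the helper naive_forecast is hypothetical, standing in for the ets or arima models fitted by the function.

```r
# Synthetic stand-ins for the components of a decomposition (not real emd output).
t <- seq(0, 1, length.out = 200)
imf1 <- sin(2 * pi * 20 * t)       # fast oscillation
imf2 <- 0.5 * sin(2 * pi * 3 * t)  # slow oscillation
residue <- 2 * t                   # monotone trend
x <- imf1 + imf2 + residue         # the "observed" series

# Predict each component separately. A naive last-value forecast
# (hypothetical helper) stands in for the fitted ets/arima models.
h <- 5
naive_forecast <- function(component, h) rep(component[length(component)], h)
component_forecasts <- lapply(list(imf1, imf2, residue), naive_forecast, h = h)

# Reverse transform: add the component predictions back together.
x_forecast <- Reduce(`+`, component_forecasts)
```

With real output, the same idea would apply to the IMFs and residue returned by emd, with emd.rev performing the recombination.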

The function automatically selects the meaningful IMFs of a decomposition. To do so, it produces models for different selections of meaningful IMFs according to the possible intervals i:num_imfs for i=1,...,(num_imfs-1), where num_imfs is the number of IMFs in a decomposition. The selection of meaningful IMFs that generates the best-ranked model fitness/predictions according to the criteria in rank.by is chosen.
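
As a small illustration of the candidate intervals i:num_imfs, the following base R sketch enumerates them for a hypothetical decomposition with five IMFs. The value of num_imfs and the variable names are assumptions made for the example; the sketch does not reproduce the function's internal code.

```r
num_imfs <- 5  # hypothetical number of IMFs in a decomposition

# One candidate selection per i = 1, ..., (num_imfs - 1)
candidates <- lapply(seq_len(num_imfs - 1), function(i) i:num_imfs)

# Label each candidate by its interval, e.g. "2:5"
names(candidates) <- vapply(
  candidates,
  function(idx) paste(range(idx), collapse = ":"),
  character(1)
)

print(names(candidates))
```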

The ranking criteria in rank.by may be set as a prediction error measure (such as MSE, NMSE, MAPE, sMAPE or MaxError), or as a fitness criterion (such as AIC, AICc, BIC or logLik). In the former case, the candidate empirical mode decompositions are used for time series prediction and the error measures are calculated by means of a cross-validation process. In the latter case, the component series of the candidate decompositions are modeled and model fitness criteria are calculated based on all observations in timeseries. In particular, the fitness criteria calculated for ranking the candidate decompositions correspond to the models produced for the IMFs.
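
For orientation, the following base R sketch computes some of the prediction error measures named above using their common textbook definitions. These definitions are assumptions and may differ in detail (for example in scaling) from the package's own implementations. NMSE is left out because its normalisation varies between conventions. The values of actual and pred are made up.

```r
actual <- c(10, 12, 14, 16)  # made-up held-out values
pred   <- c(11, 12, 13, 18)  # made-up predictions

err <- actual - pred

mse       <- mean(err^2)                                    # mean squared error
mape      <- mean(abs(err / actual))                        # mean absolute percentage error (as a fraction)
smape     <- mean(abs(err) / ((abs(actual) + abs(pred)) / 2))  # symmetric MAPE (as a fraction)
max_error <- max(abs(err))                                  # largest absolute error

print(c(MSE = mse, MAPE = mape, sMAPE = smape, MaxError = max_error))
```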

If rank.by is set as "errors" or "fitness", the candidate decompositions are ranked by all the mentioned prediction error measures or fitness criteria, respectively. The weight of the ranking criteria is equally distributed. In this case, a rank.position.sum criterion is produced for ranking the candidate decompositions. The rank.position.sum criterion is calculated as the sum of the rank positions of a decomposition (1 = 1st position = best-ranked model, 2 = 2nd position, etc.) on each calculated ranking criterion.
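
The rank.position.sum idea can be sketched in base R as follows. The scores are made up, the candidate labels are hypothetical, and only error measures (where lower is better) are used; a criterion where higher is better, such as logLik, would need its ranking reversed. This is an illustration of the calculation, not the package's code.

```r
# Made-up error measures for three hypothetical candidate selections of IMFs
scores <- data.frame(
  MSE      = c(1.5, 0.9, 1.2),
  MAPE     = c(0.08, 0.10, 0.05),
  MaxError = c(2.0, 1.8, 2.5),
  row.names = c("1:4", "2:4", "3:4")
)

# Rank position of every candidate on each criterion (1 = best, lower error is better)
positions <- sapply(scores, rank, ties.method = "min")
rownames(positions) <- rownames(scores)

# Equal weights: sum the rank positions across criteria
scores$rank.position.sum <- rowSums(positions)

# Candidates ordered from best to worst
ranked <- scores[order(scores$rank.position.sum), ]
print(ranked)
```

In this toy example the candidate "2:4" comes first because its rank positions (1, 3 and 1) sum to 5, the smallest total.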

References

Kim, D., Paek, S. H., & Oh, H. S. (2008). A Hilbert-Huang transform approach for predicting cyber-attacks. Journal of the Korean Statistical Society, 37(3), 277-283.

Examples

data(CATS)
# \donttest{
femd <- fittestEMD(CATS[,1],h=20)
# }
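
A further sketch shows how a testing set might be supplied so that the prediction error measures are returned. It assumes the TSPred package is installed and that its CATS.cont dataset holds the continuation of CATS; the call is skipped otherwise. The argument names and returned components follow the Usage and Value sections above.

```r
# Runs only if TSPred is available; the decomposition may take a while.
if (requireNamespace("TSPred", quietly = TRUE)) {
  library(TSPred)
  data(CATS, CATS.cont)

  # h is left as NULL, so it is taken from the length of timeseries.test
  femd_test <- fittestEMD(CATS[, 1],
                          timeseries.test = CATS.cont[, 1],
                          rank.by = "MSE")

  print(femd_test$meaningfulImfs)  # selected meaningful IMFs
  print(femd_test$pred$mean)       # point predictions
  print(femd_test$MSE)             # prediction error against CATS.cont
  print(femd_test$rank.val)        # ranking of candidate decompositions
}
```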
