fittestPolyR: Automatic fitting and prediction of polynomial regression

Description

The function predicts and returns the next n consecutive values of a univariate time series using the best evaluated automatically fitted polynomial regression model. It also evaluates the fitness of the produced model, using AICc, AIC, BIC and logLik criteria, and its prediction accuracy, using the MSE, NMSE, MAPE, sMAPE and maximal error accuracy measures.

Usage

fittestPolyR(
  timeseries,
  timeseries.test = NULL,
  h = NULL,
  order = NULL,
  minorder = 0,
  maxorder = 5,
  raw = FALSE,
  na.action = stats::na.omit,
  level = 0.95,
  rank.by = c("MSE", "NMSE", "MAPE", "sMAPE", "MaxError", "AIC", "AICc", "BIC",
    "logLik", "errors", "fitness")
)

Value

A list with components:

model: An object of class "lm" (see lm in the stats package) containing the best evaluated polynomial regression model.
order: The order argument provided (or automatically selected) for the best evaluated polynomial regression model.
AICc: Numeric value of the computed AICc criterion of the best evaluated model.
AIC: Numeric value of the computed AIC criterion of the best evaluated model.
BIC: Numeric value of the computed BIC criterion of the best evaluated model.
logLik: Numeric value of the computed log-likelihood of the best evaluated model.
pred: A list with the components mean, lower and upper, containing the predictions of the best evaluated model and the lower and upper limits for prediction intervals, respectively. All components are time series. See predict.lm.
MSE: Numeric value of the resulting MSE error of prediction. Requires timeseries.test.
NMSE: Numeric value of the resulting NMSE error of prediction. Requires timeseries.test.
MAPE: Numeric value of the resulting MAPE error of prediction. Requires timeseries.test.
sMAPE: Numeric value of the resulting sMAPE error of prediction. Requires timeseries.test.
MaxError: Numeric value of the maximal error of prediction. Requires timeseries.test.
rank.val: Data.frame with the coefficients and the fitness or prediction accuracy criteria computed for all candidate polynomial regression models ranked by rank.by. It has the attribute "model.calls", which is a list of objects of class "expression" containing the calls of all the candidate polynomial regression models, also ranked by rank.by.
rank.by: Ranking criteria used for ranking candidate models and producing rank.val.
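
As a rough illustration of the shape of the pred component, the sketch below builds a mean/lower/upper list from predict.lm using only the stats package. The data, the variable names (idx, y, new_idx) and the fixed order 2 are assumptions made for this example; it is not the package's internal code.

```r
# Made-up series: a noisy linear trend of length 50 (assumption for this sketch)
set.seed(42)
idx <- 1:50
y <- 2 + 0.5 * idx + rnorm(50)

# Fit a polynomial regression of order 2 on the time index
fit <- lm(y ~ poly(idx, 2))

# Predict the next h = 5 values with 95% prediction intervals
h <- 5
new_idx <- data.frame(idx = (length(idx) + 1):(length(idx) + h))
p <- predict(fit, newdata = new_idx, interval = "prediction", level = 0.95)

# predict.lm returns a matrix with columns fit, lwr and upr;
# store them as time series, mirroring the documented pred component
pred <- list(
  mean  = ts(p[, "fit"], start = length(idx) + 1),
  lower = ts(p[, "lwr"], start = length(idx) + 1),
  upper = ts(p[, "upr"], start = length(idx) + 1)
)
```

Here level plays the same role as the level argument of fittestPolyR: a larger value widens the band between lower and upper.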

Arguments

timeseries: A vector or univariate time series which contains the values used for fitting a polynomial regression model.
timeseries.test: A vector or univariate time series containing a continuation for timeseries with actual values. It is used as a testing set and base for calculation of prediction error measures. Ignored if NULL.
h: Number of consecutive values of the time series to be predicted. If h is NULL, the number of consecutive values to be predicted is assumed to be equal to the length of timeseries.test. Required when timeseries.test is NULL.
order: A numeric integer value corresponding to the order of polynomial regression to be fitted. If NULL, the order of the polynomial regression returned by the function is automatically selected within the interval minorder:maxorder. See 'Details'.
minorder: A numeric integer value corresponding to the minimum order of candidate polynomial regression to be fitted and evaluated. Ignored if order is provided. See 'Details'.
maxorder: A numeric integer value corresponding to the maximal order of candidate polynomial regression to be fitted and evaluated. Ignored if order is provided. See 'Details'.
raw: If TRUE, use raw and not orthogonal polynomials. Orthogonal polynomials help avoid correlation between variables. Default is FALSE. See poly of the stats package.
na.action: A function for treating missing values in timeseries and timeseries.test. The default function is na.omit, which omits any missing values found in timeseries or timeseries.test.
level: Confidence level for prediction intervals. See the predict.lm function in the stats package.
rank.by: Character string. Criteria used for ranking candidate models generated. See 'Details'.
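
The effect of the raw argument can be seen directly with poly from the stats package. The following sketch, which uses a made-up index x and response y, compares the correlation between polynomial terms in the raw and orthogonal bases and checks that both bases lead to the same fitted values.

```r
x <- 1:20

# Raw terms are x, x^2, x^3; orthogonal terms are an orthogonalized basis
raw_terms  <- poly(x, degree = 3, raw = TRUE)
orth_terms <- poly(x, degree = 3)

# Largest absolute correlation between two different terms of each basis
raw_cor  <- cor(raw_terms)
orth_cor <- cor(orth_terms)
max_raw_offdiag  <- max(abs(raw_cor[upper.tri(raw_cor)]))
max_orth_offdiag <- max(abs(orth_cor[upper.tri(orth_cor)]))

# Both bases span the same space, so the fitted values agree
set.seed(7)
y <- 1 + 2 * x - 0.1 * x^2 + rnorm(length(x))
fit_raw  <- lm(y ~ poly(x, 2, raw = TRUE))
fit_orth <- lm(y ~ poly(x, 2))
same_fit <- isTRUE(all.equal(unname(fitted(fit_raw)), unname(fitted(fit_orth))))
```

The raw terms are strongly correlated with one another, while the orthogonal terms are uncorrelated up to floating-point error. The choice changes the coefficients and their interpretation, not the fitted curve.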

Author

Rebecca Pontes Salles

Details

A set of candidate polynomial regression models of order order is generated with the help of the dredge function from the MuMIn package. The candidate models are ranked according to the criteria in rank.by and the best ranked model is returned by the function.

If order is NULL, it is selected automatically. Candidate polynomial regression models are generated with orders from minorder to maxorder, and the order that yields the best ranked candidate according to the criteria in rank.by is selected.
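
The idea of automatic order selection can be sketched with the stats package alone. The example below is a simplification under stated assumptions: it fits one full polynomial per order and ranks by AIC, whereas the function itself relies on MuMIn's dredge to generate candidates. The data and the helper fit_order are hypothetical.

```r
# Made-up series: a noisy quadratic trend (assumption for this sketch)
set.seed(1)
idx <- 1:60
y <- 5 + 0.8 * idx - 0.02 * idx^2 + rnorm(length(idx), sd = 1)

minorder <- 0
maxorder <- 5

# Hypothetical helper: fit one candidate per order.
# poly() needs degree >= 1, so order 0 is an intercept-only model.
fit_order <- function(k) {
  if (k == 0) lm(y ~ 1) else lm(y ~ poly(idx, k))
}

orders <- minorder:maxorder
fits <- lapply(orders, fit_order)

# Collect fitness criteria for every candidate and rank by AIC (lower is better)
ranking <- data.frame(
  order = orders,
  AIC   = sapply(fits, AIC),
  BIC   = sapply(fits, BIC)
)
ranking <- ranking[order(ranking$AIC), ]
best_order <- ranking$order[1]
```

Ranking by a prediction error measure would follow the same pattern, with the criterion column computed from forecasts against held-out values instead of from the fitted model.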

The ranking criterion in rank.by may be set as a prediction error measure (such as MSE, NMSE, MAPE, sMAPE or MaxError), or as a fitness criterion (such as AIC, AICc, BIC or logLik). In the former case, the candidate models are used for time series prediction and the error measures are calculated by means of a cross-validation process. In the latter case, the candidate models are fitted and fitness criteria are calculated based on all observations in timeseries.

If rank.by is set as "errors" or "fitness", the candidate models are ranked by all the mentioned prediction error measures or fitness criteria, respectively. The weight of the ranking criteria is equally distributed. In this case, a rank.position.sum criterion is produced for ranking the candidate models. The rank.position.sum criterion is calculated as the sum of the rank positions of a model (1 = 1st position = best ranked model, 2 = 2nd position, etc.) on each calculated ranking criterion.
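
The rank.position.sum computation described above can be worked through on a small table. The error values and model labels below are invented for illustration, and the tie-handling rule (ties.method = "min") is an assumption, since the text does not say how ties are resolved.

```r
# Invented error measures for three candidate models (lower is better)
crit <- data.frame(
  model    = c("order1", "order2", "order3"),
  MSE      = c(4.0, 2.5, 3.0),
  MAPE     = c(0.12, 0.10, 0.08),
  MaxError = c(6.1, 5.0, 5.5)
)
measures <- c("MSE", "MAPE", "MaxError")

# Rank position of each model on each measure (1 = best)
positions <- sapply(crit[measures], rank, ties.method = "min")

# Equal weights: the combined criterion is the plain sum of positions
crit$rank.position.sum <- rowSums(positions)

# Final ranking: smallest sum first
ranked <- crit[order(crit$rank.position.sum), ]
```

In this table order2 is first on MSE and MaxError and second on MAPE, giving a sum of 4, so it ranks ahead of order3 (sum 5) and order1 (sum 9).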

References

R.J. Hyndman and G. Athanasopoulos, 2013, Forecasting: principles and practice. OTexts.

R.H. Shumway and D.S. Stoffer, 2010, Time Series Analysis and Its Applications: With R Examples. 3rd ed. New York, Springer.

Examples

# load the package that provides fittestPolyR and the CATS datasets
library(TSPred)

data(CATS, CATS.cont)
# fit on the third CATS series and evaluate against its continuation
fPolyR <- fittestPolyR(CATS[, 3], CATS.cont[, 3])
# predicted values
pred <- fPolyR$pred

# plotting the time series data
plot(c(CATS[, 3], CATS.cont[, 3]), type = "o", lwd = 2,
     xlim = c(960, 1000), ylim = c(-100, 300),
     xlab = "Time", ylab = "PR")
# plotting predicted values
lines(ts(pred$mean, start = 981), lwd = 2, col = "blue")
# plotting prediction intervals
lines(ts(pred$lower, start = 981), lwd = 2, col = "lightblue")
lines(ts(pred$upper, start = 981), lwd = 2, col = "lightblue")
