In some cases, instead of building an ECM on the entire dataset, it may be preferable to build k ECM models on k subsets of the data, each subset containing (k-1)/k*nrow(data)
observations of the full dataset, and then average their coefficients. Reasons to do this include controlling for overfitting or extending the training sample. For example,
in many time series modeling exercises, the holdout test sample is often the latest few months' or years' worth of data. Ideally, these observations would be included in the training sample, since
they likely have more predictive power for the future than older observations. However, including the entire dataset in the training sample could result in overfitting, while holding out a
different time period as the test sample may be even less representative of future performance. One potential solution is to build multiple ECM models using the entire dataset,
each with a different holdout test sample, and then average them to get a final ECM. This approach is somewhat similar to the idea of random forest regression, in which
multiple regression trees are built on subsets of the data and then averaged.
This function only works with the 'lm' linear fitter.
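The averaging idea can be sketched with base R's lm(). This is only an illustration of building k models on k overlapping subsets and averaging their coefficients; the data frame, column names (y, x1, x2), and fold assignment below are hypothetical and do not reflect this package's internal implementation.

## Build k models, each trained on the (k-1)/k share of the data that
## excludes one fold, then average their coefficients.
set.seed(1)
df <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

k <- 5
folds <- rep(1:k, length.out = nrow(df))  # assign each row to one of k folds

# Fit one lm per fold and collect the coefficient vectors as columns of a matrix
coef_mat <- sapply(1:k, function(i) {
  fit <- lm(y ~ x1 + x2, data = df[folds != i, ])
  coef(fit)
})

# Average the coefficients across the k models to get the final, averaged model
avg_coef <- rowMeans(coef_mat)
avg_coef

In the actual function, each of the k fits is an ECM rather than a plain linear model, but the averaging of coefficients proceeds in the same way.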