ecm (version 7.2.0)

lmave: Build multiple lm models and average them

Description

Builds k lm models on k partitions of the data and averages their coefficients to create one model. Each partition excludes nrow(data)/k observations. See the links in the References section for further details on this methodology.

Usage

lmave(formula, data, k, method = "boot", seed = 5, weights = NULL, ...)

Value

an lm object

Arguments

formula

The formula to be passed to lm

data

The data to be used

k

The number of models or data partitions desired

method

Whether to split data by folds ("fold"), nested folds ("nestedfold"), or bootstrapping ("boot")

seed

Seed for reproducibility (only needed if method is "boot")

weights

Optional vector of weights to be passed to the fitting process

...

Additional arguments to be passed to the 'lm' function

Details

In some cases--especially in time series modeling (see the ecmave function)--rather than building one model on the entire dataset, it may be preferable to build multiple models on subsets of the data and average them. The lmave function splits the data into k partitions, each of size (k-1)/k*nrow(data), builds k models, and then averages the coefficients of these models to get a final model. This is similar to averaging multiple regression tree models in algorithms like random forest.

Unlike the 'ecm' function, this function only works with the 'lm' linear fitter.
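
The fold-based averaging described above can be illustrated with a minimal base R sketch. This is not the ecm package's implementation: the helper name fold_average_lm is hypothetical, the random assignment of rows to folds is an assumption (the package may partition rows differently), and the built-in mtcars dataset is used only to keep the example self-contained.

```r
# Hypothetical helper illustrating fold-based coefficient averaging.
# A sketch only; it is not the code used by ecm::lmave.
fold_average_lm <- function(formula, data, k, seed = 5) {
  set.seed(seed)
  # Assumption: rows are assigned at random to k folds of near-equal size
  fold_id <- sample(rep(seq_len(k), length.out = nrow(data)))
  # Fit one model per fold, each time leaving that fold out,
  # so every fit uses roughly (k-1)/k of the rows
  coef_list <- lapply(seq_len(k), function(i) {
    coef(lm(formula, data = data[fold_id != i, , drop = FALSE]))
  })
  # Average the k coefficient vectors element-wise
  avg_coef <- Reduce(`+`, coef_list) / k
  # Fit once on the full data to obtain an lm object,
  # then replace its coefficients with the averaged ones
  model <- lm(formula, data = data)
  model$coefficients <- avg_coef
  model
}

fit <- fold_average_lm(mpg ~ wt + hp, data = mtcars, k = 5)
coef(fit)
```

In this sketch only the coefficients are replaced, so the residuals, fitted values, and standard errors stored in the returned object still come from the full-data fit and should not be read as properties of the averaged model.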

References

Jung, Y. & Hu, J. (2016). "A K-fold Averaging Cross-validation Procedure". https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5019184/

Cochrane, C. (2018). "Time Series Nested Cross-Validation". https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9

See Also

lm

Examples

##Not run

#Build linear models to predict Wilshire 5000 index based on corporate profits, 
#Federal Reserve funds rate, and unemployment rate
data(Wilshire)

#Build one model on the entire dataset
modelall <- lm(Wilshire5000 ~ ., data = Wilshire[-1])

#Build a five-fold averaged linear model on the entire dataset
modelave <- lmave('Wilshire5000 ~ .', data = Wilshire[-1], k = 5) 
