
imputeTS: Time Series Missing Value Imputation

The imputeTS package specializes in (univariate) time series imputation. It offers several different imputation algorithm implementations. Beyond the imputation algorithms, the package also provides functions for plotting and printing missing data statistics of time series. Additionally, three time series datasets for imputation experiments are included.

Installation

The imputeTS package is available on CRAN. To install it, run the following in R:

 install.packages("imputeTS")

To install the latest development version from GitHub (which may be unstable), run:

library(devtools)
install_github("SteffenMoritz/imputeTS")

Usage

  • Imputation

    To impute (fill in) all missing values in a time series x, run the following command:

     na_interpolation(x)

    The output is the time series x with all NAs replaced by reasonable values.

    This is just one example of an imputation algorithm. In this case, interpolation was the algorithm of choice for calculating the NA replacements. There are several other algorithms (see the section "Imputation Algorithms"). All imputation functions follow the same naming scheme: na_ followed by an algorithm label, e.g. na_mean, na_kalman, ...

  • Plotting

    To plot missing data statistics for a time series x, run the following command:

     ggplot_na_distribution(x)

    This is also just one example of a plot. Overall, there are four different types of missing data plots (see the section "Missing Data Plots").

  • Printing

    To print statistics about the missing data in a time series x, run the following command:

     statsNA(x)

  • Datasets

    To load the 'heating' time series (with missing values) into a variable y and the 'heating' time series (without missing values) into a variable z, run:

     y <- tsHeating
     z <- tsHeatingComplete

    The package provides three datasets: the 'tsHeating', 'tsAirgap' and 'tsNH4' time series (see the section "Datasets").
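As a rough illustration of what the printing function reports, the base R sketch below computes the most basic missing data statistics for a toy series: its length, the number of NAs and their percentage. statsNA(x) prints a more detailed report (for example gap statistics); the toy series and the variable names here are made up for illustration.

```r
# Toy monthly series with three missing values (made-up numbers)
x <- ts(c(5, NA, 7, 8, NA, NA, 11, 12), start = c(2020, 1), frequency = 12)

n_total <- length(x)             # number of time points
n_na    <- sum(is.na(x))         # number of missing values
pct_na  <- 100 * n_na / n_total  # share of missing values in percent

cat("Length of time series:", n_total, "\n")
cat("Number of missing values:", n_na, "\n")
cat("Percentage of missing values:", sprintf("%.1f%%", pct_na), "\n")
```

For this series the sketch reports 8 time points, 3 missing values and 37.5% missing data.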

Imputation Algorithms

Here is a table with available algorithms to choose from:

Function            Description
na_interpolation    Missing Value Imputation by Interpolation
na_kalman           Missing Value Imputation by Kalman Smoothing
na_locf             Missing Value Imputation by Last Observation Carried Forward
na_ma               Missing Value Imputation by Weighted Moving Average
na_mean             Missing Value Imputation by Mean Value
na_random           Missing Value Imputation by Random Sample
na_remove           Remove Missing Values
na_replace          Replace Missing Values by a Defined Value
na_seadec           Seasonally Decomposed Missing Value Imputation
na_seasplit         Seasonally Splitted Missing Value Imputation
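Several of the simpler algorithms in this table are easy to state directly. The base R sketch below illustrates three of them on a plain numeric vector: last observation carried forward, mean imputation and a moving average with exponentially decreasing weights. It is a simplified illustration, not the package's code: na_locf() by default fills leading NAs by carrying the next observation backward, and na_ma() uses a semi-adaptive window that is enlarged when too few observed values fall inside it, neither of which is reproduced here. The function names locf_impute, mean_impute and ma_impute are hypothetical.

```r
x <- c(NA, 2, NA, NA, 5, 6, NA)

# Last Observation Carried Forward: each NA takes the most recent value.
# Leading NAs stay NA here, since there is nothing before them to carry forward.
locf_impute <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# Mean imputation: every NA is replaced by the mean of the observed values.
mean_impute <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Weighted moving average with exponential weights (hypothetical helper):
# an observed neighbour at distance d from the NA gets weight 1 / 2^d.
# Fixed window of k values on each side; an NA without any observed
# neighbour inside the window is left as NA.
ma_impute <- function(x, k = 4) {
  out <- x
  n <- length(x)
  for (i in which(is.na(x))) {
    idx <- max(1, i - k):min(n, i + k)
    idx <- idx[!is.na(x[idx])]  # keep observed neighbours only
    if (length(idx) == 0) next
    w <- 1 / 2^abs(idx - i)
    out[i] <- sum(w * x[idx]) / sum(w)
  }
  out
}

print(locf_impute(x))
print(mean_impute(x))
print(ma_impute(x, k = 2))
```

The weighted moving average only uses the original observations, never values it has already imputed, so the result does not depend on the order in which the NAs are visited.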

This is a rather broad overview. Most of the functions themselves offer more than one algorithm. For example, na_interpolation can be set to linear or spline interpolation (option = "linear" or option = "spline").
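To make the difference between the two interpolation variants concrete, the base R sketch below fills the same gaps once with stats::approx() (linear) and once with stats::spline() (cubic spline). It only illustrates the two techniques and is not the package's implementation; within imputeTS the corresponding calls would be na_interpolation(x, option = "linear") and na_interpolation(x, option = "spline"). The numbers in x are made up.

```r
# Made-up series (squares of the time index) with two values missing
x   <- c(1, 4, NA, 16, NA, 36)
idx <- seq_along(x)
obs <- !is.na(x)

# Linear interpolation: straight line between the neighbouring observations.
# rule = 2 makes approx() reuse the first/last observation for leading or
# trailing NAs (none occur in this example).
linear <- x
linear[!obs] <- approx(idx[obs], x[obs], xout = idx[!obs], rule = 2)$y

# Spline interpolation: a smooth cubic spline through all observed points
spline_filled <- x
spline_filled[!obs] <- spline(idx[obs], x[obs], xout = idx[!obs])$y

print(rbind(linear, spline = spline_filled))
```

Linear interpolation fills the gaps with 10 and 26, the midpoints of the neighbouring observations; the spline instead follows the curvature of all observed points.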

More detailed information about the algorithms and their options can be found in the imputeTS reference manual.

Missing Data Plots

Here is a table with available plots to choose from:

Function                   Description
ggplot_na_distribution     Visualize Distribution of Missing Values
ggplot_na_distribution2    Missing Values Summarized in Intervals
ggplot_na_gapsize          Visualize Distribution of NA Gap Sizes
ggplot_na_imputations      Visualize Imputed Values
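ggplot_na_gapsize() visualizes how often NA gaps (runs of consecutive missing values) of each length occur. As a hedged illustration of the underlying counting, not the package's implementation, base R's rle() can extract the gap sizes of a made-up vector and tabulate them:

```r
x <- c(1, NA, 3, NA, NA, 6, NA, NA, 9, NA)

# Run-length encoding of the NA indicator: alternating runs of FALSE/TRUE
runs <- rle(is.na(x))

# Keep only the runs of TRUE, i.e. the lengths of the NA gaps
gap_sizes <- runs$lengths[runs$values]

# How often each gap size occurs
gap_table <- table(gap_sizes)
print(gap_table)

cat("Number of gaps:", length(gap_sizes), "\n")
cat("Longest gap:", max(gap_sizes), "\n")
```

Here the vector contains four gaps, two of size 1 and two of size 2.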

More detailed information about the plots can be found in the imputeTS reference manual.

Datasets

There are three datasets (each in two versions) available:

Dataset              Description
tsAirgap             Time series of monthly airline passengers (with NAs)
tsAirgapComplete     Time series of monthly airline passengers (complete)
tsHeating            Time series of a heating system's supply temperature (with NAs)
tsHeatingComplete    Time series of a heating system's supply temperature (complete)
tsNH4                Time series of NH4 concentration in a wastewater system (with NAs)
tsNH4Complete        Time series of NH4 concentration in a wastewater system (complete)

The tsAirgap, tsHeating and tsNH4 time series contain NAs; their complete versions contain none. Apart from the missing values, the two versions are identical. The NAs were artificially inserted by simulating the missing data patterns observed in similar incomplete time series from the same domain. Having both a complete and an incomplete version of the same dataset is useful for experiments with imputation functions, because the imputed values can be compared with the true values.
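A typical experiment of this kind removes known values, imputes them and measures the error at exactly those positions. The sketch below does this in base R on a synthetic sine series, with linear interpolation via approx() standing in for an imputation function; with the package, one would instead compare, for example, the output of na_interpolation(tsAirgap) with tsAirgapComplete. The series, the 15 removed positions and the error measures (RMSE, MAE) are illustrative choices.

```r
set.seed(42)

# Synthetic "complete" series and an incomplete copy with 15 values removed
complete <- 10 + sin(seq(0, 4 * pi, length.out = 100))
missing_idx <- sort(sample(2:99, 15))  # first and last point stay observed
with_na <- complete
with_na[missing_idx] <- NA

# Impute with linear interpolation (stand-in for any imputation function)
idx <- seq_along(with_na)
obs <- !is.na(with_na)
imputed <- with_na
imputed[!obs] <- approx(idx[obs], with_na[obs], xout = idx[!obs])$y

# Evaluate only where values were actually missing
err  <- imputed[missing_idx] - complete[missing_idx]
rmse <- sqrt(mean(err^2))
mae  <- mean(abs(err))
cat(sprintf("RMSE: %.4f  MAE: %.4f\n", rmse, mae))
```

Restricting the error to the removed positions matters: averaging over the whole series would dilute the error with the many points that were never missing.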

More detailed information about the datasets can be found in the imputeTS reference manual.

Reference

You can cite imputeTS as follows:

Moritz, Steffen, and Bartz-Beielstein, Thomas. "imputeTS: Time Series Missing Value Imputation in R." R Journal 9.1 (2017). doi: 10.32614/RJ-2017-009.

Need Help?

If you have general programming problems or need help using the package, please ask your question on StackOverflow. That way, all users can benefit from your question in the future.

Don't forget to tag your question with imputets on StackOverflow so that I get notified.

Support

If you find a bug or have suggestions, feel free to get in contact via steffen.moritz10 at gmail.com.

All feedback is welcome.

Version

3.3

License

GPL-3


Last Published

September 9th, 2022

Functions in imputeTS (3.3)

Function                   Description
imputeTS-package           imputeTS: Time Series Missing Value Imputation
na_interpolation           Missing Value Imputation by Interpolation
na_kalman                  Missing Value Imputation by Kalman Smoothing and State Space Models
na_locf                    Missing Value Imputation by Last Observation Carried Forward
na_ma                      Missing Value Imputation by Weighted Moving Average
na_mean                    Missing Value Imputation by Mean Value
na_random                  Missing Value Imputation by Random Sample
na_remove                  Remove Missing Values
na_replace                 Replace Missing Values by a Defined Value
na_seadec                  Seasonally Decomposed Missing Value Imputation
na_seasplit                Seasonally Splitted Missing Value Imputation
ggplot_na_distribution     Lineplot to Visualize the Distribution of Missing Values
ggplot_na_distribution2    Stacked Barplot to Visualize Missing Values per Interval
ggplot_na_gapsize          Visualize Occurrences of NA gap sizes
ggplot_na_imputations      Visualize Imputed Values
statsNA                    Print Statistics about Missing Values
reexports                  Objects exported from other packages
tsAirgap                   Time series of monthly airline passengers (with NAs)
tsAirgapComplete           Time series of monthly airline passengers (complete)
tsHeating                  Time series of a heating system's supply temperature (with NAs)
tsHeatingComplete          Time series of a heating system's supply temperature (complete)
tsNH4                      Time series of NH4 concentration in a wastewater system (with NAs)
tsNH4Complete              Time series of NH4 concentration in a wastewater system (complete)
na.interpolation           Deprecated; use na_interpolation instead.
na.kalman                  Deprecated; use na_kalman instead.
na.locf                    Deprecated; use na_locf instead.
na.ma                      Deprecated; use na_ma instead.
na.mean                    Deprecated; use na_mean instead.
na.random                  Deprecated; use na_random instead.
na.remove                  Deprecated; use na_remove instead.
na.replace                 Deprecated; use na_replace instead.
na.seadec                  Deprecated; use na_seadec instead.
na.seasplit                Deprecated; use na_seasplit instead.
plotNA.distribution        Discontinued; use ggplot_na_distribution instead.
plotNA.distributionBar     Discontinued; use ggplot_na_distribution2 instead.
plotNA.gapsize             Discontinued; use ggplot_na_gapsize instead.
plotNA.imputations         Discontinued; use ggplot_na_imputations instead.
ggplot_na_intervals        Discontinued; use ggplot_na_distribution2 instead.