na_ma: Missing Value Imputation by Weighted Moving Average

Description

Missing value replacement by weighted moving average. Uses semi-adaptive window size to ensure all NAs are replaced.

Usage

na_ma(x, k = 4, weighting = "exponential", maxgap = Inf)

Value

Vector (vector) or Time Series (ts) object (dependent on given input at parameter x)

Arguments

x

Numeric Vector (vector) or Time Series (ts) object in which missing values shall be replaced

k

Integer width of the moving average window. The window expands to both sides of the center element, e.g. k=2 means 4 observations (2 left, 2 right) are taken into account. If fewer than 2 non-NA values are present in the current window, the window size is automatically increased until at least 2 non-NA values are present.

weighting

Weighting to be used. Accepts the following input:

"simple" - Simple Moving Average (SMA)
"linear" - Linear Weighted Moving Average (LWMA)
"exponential" - Exponential Weighted Moving Average (EWMA) (default choice)

maxgap

Maximum number of successive NAs to still perform imputation on. The default setting replaces all NAs without restriction. With this option set, runs of consecutive NAs that are longer than 'maxgap' will be left NA. This option mostly makes sense if you want to treat long runs of NA separately afterwards.
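As a rough illustration of the 'maxgap' rule, the following base R sketch marks which positions belong to NA runs longer than a given limit. It is not the package's internal code; the variable names (runs, too_long) and the sample vector are assumptions made for this example.

```r
# Sample series with one short NA run (length 1) and one long run (length 3)
x <- c(1, NA, 3, NA, NA, NA, 7, 8)
maxgap <- 2

# Run-length encode the NA mask, then flag every position that sits
# inside an NA run longer than maxgap
runs <- rle(is.na(x))
too_long <- rep(runs$values & runs$lengths > maxgap, runs$lengths)

which(too_long)               # 4 5 6: these would be left as NA
which(is.na(x) & !too_long)   # 2: this gap is short enough to be imputed
```

With maxgap = 2, only the single NA at position 2 qualifies for imputation, while the run at positions 4 to 6 would stay NA for separate treatment.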

Author

Steffen Moritz

Details

In this function missing values get replaced by moving average values. Moving Averages are also sometimes referred to as "moving mean", "rolling mean", "rolling average" or "running average".

The mean in this implementation is taken from an equal number of observations on either side of a central value. This means that for an NA value at position i of a time series, the observations i-2, i-1 and i+1, i+2 (assuming a window size of k=2) are used to calculate the mean.
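The symmetric window can be sketched in a few lines of base R. The helper window_indices is hypothetical (it is not an imputeTS function), and clipping the window at the series boundaries is an assumption of this sketch.

```r
# Hypothetical helper: positions of the k neighbours on each side of i,
# excluding i itself and dropping positions outside 1..n
window_indices <- function(i, k, n) {
  idx <- (i - k):(i + k)
  idx[idx != i & idx >= 1 & idx <= n]
}

x <- c(10, 12, NA, 16, 18)
idx <- window_indices(i = 3, k = 2, n = length(x))

idx            # 1 2 4 5, i.e. i-2, i-1, i+1, i+2
mean(x[idx])   # 14, the simple moving average for the NA at position 3
```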

In long NA gaps, the values next to the central value may themselves be NA, so the algorithm uses a semi-adaptive window size. Whenever fewer than 2 non-NA values are available in the complete window, the window size is increased incrementally until at least 2 non-NA values are present. In all other cases the algorithm sticks to the pre-set window size.
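The window growth described above might look roughly like the following base R sketch. The function adaptive_window is hypothetical, and both the step size of 1 and the stopping guard (stop once the window covers the whole series) are assumptions rather than documented behaviour of the package.

```r
# Hypothetical helper: widen k until the window around i holds at least
# 2 non-NA values, or until it already spans the whole series
adaptive_window <- function(x, i, k) {
  n <- length(x)
  repeat {
    idx <- (i - k):(i + k)
    idx <- idx[idx != i & idx >= 1 & idx <= n]
    enough <- sum(!is.na(x[idx])) >= 2
    if (enough || length(idx) >= n - 1) break
    k <- k + 1
  }
  list(k = k, idx = idx)
}

x <- c(1, 2, NA, NA, NA, NA, NA, 8, 9)
res <- adaptive_window(x, i = 5, k = 1)

res$k                            # 3: k = 1 and k = 2 see only NAs
mean(x[res$idx], na.rm = TRUE)   # 5, the mean of the neighbours 2 and 8
```

For a position whose pre-set window already contains 2 or more non-NA values, the loop exits on the first pass and k is left unchanged.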

There are options for using Simple Moving Average (SMA), Linear Weighted Moving Average (LWMA) and Exponential Weighted Moving Average (EWMA).

SMA: all observations in the window are equally weighted for calculating the mean.

LWMA: weights decrease in arithmetical progression. The observations directly next to a central value i (i-1,i+1) have weight 1/2, the observations one further away (i-2,i+2) have weight 1/3, the next (i-3,i+3) have weight 1/4, ...

EWMA: uses weighting factors which decrease exponentially. The observations directly next to a central value i (i-1,i+1) have weight 1/2^1, the observations one further away (i-2,i+2) have weight 1/2^2, the next (i-3,i+3) have weight 1/2^3, ...
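To make the three weighting schemes concrete, here is a small base R sketch that computes the weights from the distance d to the central value and applies them to a window with k=2. The helper ma_weights is hypothetical, and normalising by the sum of the weights (via weighted.mean) is an assumption of this sketch.

```r
# Hypothetical helper: weight of an observation at distance d from the centre
ma_weights <- function(d, weighting = c("simple", "linear", "exponential")) {
  weighting <- match.arg(weighting)
  switch(weighting,
    simple      = rep(1, length(d)),  # all observations weighted equally
    linear      = 1 / (d + 1),        # 1/2, 1/3, 1/4, ...
    exponential = 1 / 2^d             # 1/2^1, 1/2^2, 1/2^3, ...
  )
}

x <- c(10, 12, NA, 20, 30)
neighbours <- x[c(1, 2, 4, 5)]   # i-2, i-1, i+1, i+2 for the NA at i = 3
d <- c(2, 1, 1, 2)               # their distances from the centre

weighted.mean(neighbours, ma_weights(d, "simple"))       # 18
weighted.mean(neighbours, ma_weights(d, "linear"))       # 17.6
weighted.mean(neighbours, ma_weights(d, "exponential"))  # 17.33333
```

The closer neighbours (12 and 20) dominate more strongly as the weights fall off faster, which is why the exponential result sits furthest from the simple mean.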

Examples

# Load the package; tsAirgap is an example dataset shipped with imputeTS
library(imputeTS)

# Example 1: Perform imputation with simple moving average
na_ma(tsAirgap, weighting = "simple")

# Example 2: Perform imputation with exponential weighted moving average
na_ma(tsAirgap)

# Example 3: Perform imputation with exponential weighted moving average, window size 6
na_ma(tsAirgap, k = 6)

# Example 4: Same as example 1, just written with the pipe operator
# (%>% comes from magrittr; load it if the pipe is not already available)
library(magrittr)
tsAirgap %>% na_ma(weighting = "simple")
