timetk (version 2.8.1)

step_ts_impute: Missing Data Imputation for Time Series

Description

step_ts_impute creates a specification of a recipe step that will impute time series data.

Usage

step_ts_impute(
  recipe,
  ...,
  period = 1,
  lambda = NULL,
  role = NA,
  trained = FALSE,
  lambdas_trained = NULL,
  skip = FALSE,
  id = rand_id("ts_impute")
)

# S3 method for step_ts_impute
tidy(x, ...)

Value

An updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with columns terms (the selectors or variables selected) and value (the lambda estimate).

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose which variables are affected by the step. See selections() for more details. For the tidy method, these are not currently used.

period

A seasonal period to use during the transformation. If period = 1, linear interpolation is performed. If period > 1, a robust STL decomposition is first performed and a linear interpolation is applied to the seasonally adjusted data.

lambda

A Box Cox transformation parameter. If set to "auto", automated lambda selection is performed.

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

lambdas_trained

A named numeric vector of lambdas. This is NULL until computed by recipes::prep(). Note that, if the original data are integers, the mean will be converted to an integer to maintain the same data type.

skip

A logical. Should the step be skipped when the recipe is baked by bake.recipe()? While all operations are baked when prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.

id

A character string that is unique to this step to identify it.

x

A step_ts_impute object.

Details

The step_ts_impute() function is designed specifically to handle time series.

Imputation using Linear Interpolation

Three circumstances cause strictly linear interpolation:

  1. Period is 1: With period = 1, seasonality cannot be interpreted, so linear interpolation is used.

  2. Number of Non-Missing Values is less than 2 Periods: Insufficient values exist to detect seasonality.

  3. Number of Total Values is less than 3 Periods: Insufficient values exist to detect seasonality.
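
As a minimal sketch of what strictly linear interpolation does to gaps, the base R function approx() can fill missing values from their observed neighbours. This is an illustration only, not timetk's internal code. The helper uses_linear_only() is hypothetical and encodes one reading of the three circumstances above, taking the thresholds to be 2 * period and 3 * period.

```r
# Hypothetical helper: TRUE when any of the three circumstances applies.
# Assumption: the thresholds are 2 * period and 3 * period.
uses_linear_only <- function(x, period) {
  period == 1 ||
    sum(!is.na(x)) < 2 * period ||
    length(x) < 3 * period
}

# A short series with two gaps
x   <- c(1, NA, NA, 4, 5, NA, 7)
idx <- seq_along(x)
obs <- !is.na(x)

# Linear interpolation: connect observed points with straight lines
# and read off the values at every index, including the missing ones
x_filled <- approx(idx[obs], x[obs], xout = idx)$y

x_filled
uses_linear_only(x, period = 1)
uses_linear_only(x, period = 12)
```

Here the series has only 7 values, so any period of 3 or more would also fall back to linear interpolation under this reading.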

Seasonal Imputation using Linear Interpolation

For seasonal series with period > 1, a robust Seasonal Trend Loess (STL) decomposition is first computed. Then a linear interpolation is applied to the seasonally adjusted data, and the seasonal component is added back.
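
The sequence described above (decompose, interpolate the seasonally adjusted series, add the seasonal component back) can be sketched with base R's stl() and approx(). This is a simplified illustration under stated assumptions, not the code timetk runs: stl() does not accept missing values by default, so the sketch first makes a provisional linear fill purely to estimate the seasonal component, and the simulated series, its period of 12, and the positions of the gaps are all made up for the example.

```r
set.seed(123)

# Simulated seasonal series (assumption: period of 12, six full cycles)
period <- 12
n      <- 72
t_idx  <- seq_len(n)
x_true <- 10 + 0.1 * t_idx + 3 * sin(2 * pi * t_idx / period) + rnorm(n, sd = 0.2)

# Knock out a few values
missing_idx <- c(20, 21, 45)
x <- x_true
x[missing_idx] <- NA
obs <- !is.na(x)

# Provisional fill so that stl() can run (used only to estimate seasonality)
provisional <- approx(t_idx[obs], x[obs], xout = t_idx, rule = 2)$y

# Robust STL decomposition
fit      <- stl(ts(provisional, frequency = period), s.window = "periodic", robust = TRUE)
seasonal <- as.numeric(fit$time.series[, "seasonal"])

# Linear interpolation on the seasonally adjusted data
adjusted        <- x - seasonal  # still NA where x is NA
adjusted_filled <- approx(t_idx[obs], adjusted[obs], xout = t_idx, rule = 2)$y

# Add the seasonal component back
imputed <- adjusted_filled + seasonal

# Compare imputed values with the values that were removed
data.frame(index = missing_idx, imputed = imputed[missing_idx], truth = x_true[missing_idx])
```

Observed values pass through unchanged; only the gaps are filled, and the filled values follow the seasonal shape rather than a straight line between neighbours.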

Box Cox Transformation

In many circumstances, a Box Cox transformation can help, especially if the series is multiplicative, meaning the variance grows exponentially. A Box Cox transformation can be automated by setting lambda = "auto", or it can be specified by setting lambda to a numeric value.
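
For reference, the standard Box Cox transformation and its inverse can be written in a few lines of base R. This is a sketch of the textbook formula for strictly positive data, not necessarily the exact variant timetk applies internally; the function names box_cox() and inv_box_cox() are hypothetical.

```r
# Textbook Box Cox transform (assumes x > 0):
#   lambda == 0 : log(x)
#   otherwise   : (x^lambda - 1) / lambda
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Inverse transform, mapping back to the original scale
inv_box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) exp(y) else (lambda * y + 1)^(1 / lambda)
}

# A multiplicative series: each value is double the previous one
x <- c(10, 20, 40, 80, 160)

# With lambda = 0 the steps become equal in size (log(2) each)
y <- box_cox(x, lambda = 0)
diff(y)

# The round trip recovers the original values for any lambda
x_back <- inv_box_cox(box_cox(x, lambda = 0.5), lambda = 0.5)
```

Interpolating on the transformed scale and then inverting keeps imputed values consistent with the multiplicative structure of the series.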

See Also

Time Series Analysis:

  • Engineered Features: step_timeseries_signature(), step_holiday_signature(), step_fourier()

  • Diffs & Lags: step_diff(), recipes::step_lag()

  • Smoothing: step_slidify(), step_smooth()

  • Variance Reduction: step_box_cox()

  • Imputation: step_ts_impute(), step_ts_clean()

  • Padding: step_ts_pad()

Recipe Setup and Application:

Examples

library(tidyverse)
library(tidyquant)
library(recipes)
library(timetk)

# Get missing values
FANG_wide <- FANG %>%
    select(symbol, date, adjusted) %>%
    pivot_wider(names_from = symbol, values_from = adjusted) %>%
    pad_by_time()

FANG_wide

# Apply Imputation
recipe_box_cox <- recipe(~ ., data = FANG_wide) %>%
    step_ts_impute(FB, AMZN, NFLX, GOOG, period = 252, lambda = "auto") %>%
    prep()

recipe_box_cox %>% bake(FANG_wide)

# Lambda parameter used during imputation process
recipe_box_cox %>% tidy(1)

