tk_tsfeatures() is a tidyverse-compliant wrapper for tsfeatures::tsfeatures().
The function computes a matrix of time series features that describes the various time series. It's designed for groupwise analysis using dplyr groups.
tk_tsfeatures(
  .data,
  .date_var,
  .value,
  .period = "auto",
  .features = c("frequency", "stl_features", "entropy", "acf_features"),
  .scale = TRUE,
  .trim = FALSE,
  .trim_amount = 0.1,
  .parallel = FALSE,
  .na_action = na.pass,
  .prefix = "ts_",
  .silent = TRUE,
  ...
)
A tibble or data.frame with aggregated features that describe each time series.
.data: A tibble or data.frame with a time-based column.
.date_var: A column containing either date or date-time values.
.value: A column containing numeric values.
.period: The periodicity (frequency) of the time series data. Values can be provided as follows:
"auto" (default): Calculates the frequency using tk_get_frequency().
"2 weeks": Would calculate the median number of observations in a 2-week window.
7 (numeric): Would interpret the ts frequency as 7 observations per cycle (common for daily data with a weekly cycle).
.features: Passed to features in the underlying tsfeatures() function. A vector of function names, each representing a feature aggregation function. Examples:
Use one of the function names from the tsfeatures R package, e.g. "lumpiness" or "stl_features".
Use a function name (e.g. "mean" or "median").
Create your own function and provide the function name.
.scale: If TRUE, time series are scaled to mean 0 and sd 1 before features are computed.
.trim: If TRUE, time series are trimmed by .trim_amount before features are computed. Values larger than .trim_amount in absolute value are set to NA.
.trim_amount: Default level of trimming if .trim = TRUE. Default: 0.1.
.parallel: If TRUE, multiple cores (or multiple sessions) will be used. This only speeds things up when there are a large number of time series.
When .parallel = TRUE, multiprocess defaults to future::multisession. This can be adjusted by setting the multiprocess parameter. See the tsfeatures::tsfeatures() function for more details.
.na_action: A function to handle missing values. Use na.interp to estimate missing values.
.prefix: A prefix added to the feature column names. Default: "ts_".
.silent: Whether or not to show messages and warnings.
...: Other arguments passed to the feature functions.
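The three accepted forms of .period can be explored with tk_get_frequency(), the helper that the "auto" option relies on. The following is a minimal sketch, assuming timetk is installed; the daily index idx is made up for illustration, and no outputs are shown because they depend on the installed version.

```r
# Hypothetical daily index, used only for illustration
idx <- seq(as.Date("2020-01-01"), by = "day", length.out = 365)

if (requireNamespace("timetk", quietly = TRUE)) {
  # "auto": frequency chosen from the time scale of the index
  freq_auto <- timetk::tk_get_frequency(idx, period = "auto", message = FALSE)

  # Time-based phrase: median number of observations in a 2-week window
  freq_2wk <- timetk::tk_get_frequency(idx, period = "2 weeks", message = FALSE)

  # Numeric: interpreted as the number of observations per cycle
  freq_num <- timetk::tk_get_frequency(idx, period = 7, message = FALSE)

  print(c(auto = freq_auto, two_weeks = freq_2wk, numeric = freq_num))
}
```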
The timetk::tk_tsfeatures() function wraps the tsfeatures package to compute an aggregated feature matrix for time series, which is useful in many types of analysis such as clustering time series.
The timetk version ports the tsfeatures::tsfeatures() function to a tidyverse-compliant format that uses a tidy data frame containing grouping columns (optional), a date column, and a value column. Other columns are ignored.
It then becomes easy to summarize each time series by group-wise application of .features, which are simply functions that evaluate a time series and return a single aggregated value.
(Example: "mean" would return the mean of the time series. Note that when .scale = TRUE, values are scaled to mean 0 and sd 1 first.)
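To make the idea of a feature function concrete, here is a small base R sketch. It standardizes a toy series the way the .scale description suggests, shows why "mean" is close to zero on scaled data, and defines a user-written feature. The values in x and the function name value_range are hypothetical, and scale() is used here as a stand-in for whatever scaling tsfeatures applies internally.

```r
# Toy series (made-up values), used only for illustration
x <- c(12, 15, 11, 18, 20, 17, 14, 19)

# Standardize to mean 0 and sd 1, analogous to what .scale = TRUE describes
x_scaled <- as.numeric(scale(x))

# A built-in feature: the mean of a standardized series is numerically zero
scaled_mean <- mean(x_scaled)

# A user-defined feature (hypothetical name): returns one number per series
value_range <- function(x) {
  diff(range(x, na.rm = TRUE))
}
scaled_range <- value_range(x_scaled)
```

A function like this would be supplied to tk_tsfeatures() by name, for example .features = c("mean", "value_range").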
Function Internals:
Internally, the time series are converted to ts class using tk_ts(.period), where the period is the frequency of the time series. Values can be provided for .period, which will be used prior to conversion to ts class.
The function then leverages tsfeatures::tsfeatures() to compute the feature matrix of summarized feature values.
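The two steps above can be approximated by hand for a single series. This is only a sketch of the idea, not timetk's actual implementation: it uses stats::ts() in place of tk_ts(), the simulated values are made up, and the feature set is reduced to keep the example light. It assumes the tsfeatures package is installed.

```r
# Made-up weekly values for a single series (two years of data)
set.seed(123)
values <- 100 + 10 * sin(2 * pi * (1:104) / 52) + rnorm(104)

# Step 1: convert to a ts object with an explicit frequency (here 52)
series_ts <- stats::ts(values, frequency = 52)

# Step 2: compute the feature matrix with tsfeatures, if it is available
if (requireNamespace("tsfeatures", quietly = TRUE)) {
  feature_tbl <- tsfeatures::tsfeatures(
    list(series_ts),
    features = c("frequency", "acf_features"),
    scale    = TRUE
  )
  print(feature_tbl)
}
```

With grouped data, tk_tsfeatures() repeats this kind of computation once per group and returns one row of features per series.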
Rob Hyndman, Yanfei Kang, Pablo Montero-Manso, Thiyanga Talagala, Earo Wang, Yangzhuoran Yang, Mitchell O'Hara-Wild: tsfeatures R package
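library(dplyr)
library(timetk)  # provides tk_tsfeatures() and the walmart_sales_weekly dataset

# Compute one row of features per store/department series
walmart_sales_weekly %>%
  group_by(id) %>%
  tk_tsfeatures(
    .date_var = Date,
    .value    = Weekly_Sales,
    .period   = 52,
    .features = c("frequency", "stl_features", "entropy", "acf_features", "mean"),
    .scale    = TRUE,
    .prefix   = "ts_"
  )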