dtwclust-package: Time series clustering along with optimizations for the Dynamic Time Warping distance

Description

Time series clustering with a wide variety of strategies and a series of optimizations specific to the Dynamic Time Warping (DTW) distance and its corresponding lower bounds (LBs). There are implementations of both traditional clustering algorithms, and more recent procedures such as k-Shape and TADPole clustering. The functionality can be easily extended with custom distance measures and centroid definitions.

Arguments

Details

Many of the algorithms implemented in this package are specifically tailored to time series and DTW, hence its name. However, the main clustering function is flexible so that one can test many different clustering approaches, using either the time series directly, or by applying suitable transformations and then clustering in the resulting space.

DTW is a dynamic programming algorithm that tries to find the optimum warping path between two series. Over the years, several variations have appeared in order to make the procedure faster or more efficient. Please refer to the included references for more information, especially Giorgino (2009), which is a good practical introduction.

Most optimizations require equal dimensionality, which means time series should have equal length. DTW itself does not require this, but it is relatively expensive to compute. Other distance definitions may be used, or series could be reinterpolated to a matching length (Ratanamahatana and Keogh, 2004).

Other packages that are particularly leveraged here are the proxy package for distance matrix calculations, and the dtw package for the core DTW calculations. The main clustering function and entry point for this package is dtwclust. There are a couple of things to bear in mind while using this package:

Most distance calculations make use of the dist function in the proxy package, which accepts one or two arguments for data objects (x and y). Users should use the two-input list version, even if there is just one dataset (i.e. proxy::dist(x = dataset, y = dataset, ...)), because otherwise it sometimes fails to detect a whole time series as a single object and, instead, calculates distances between each observation of each time series.

Additionally, please note the random number generator is set to L'Ecuyer-CMRG when dtwclust is attached in an attempt to preserve reproducibility. You are free to change this afterwards if you wish. See RNGkind.

References

Begum N, Ulanova L, Wang J and Keogh E (2015). ``Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy.'' In Conference on Knowledge Discovery and Data Mining, series KDD '15. ISBN 978-1-4503-3664-2/15/08, http://dx.doi.org/10.1145/2783258.2783286.

Giorgino T (2009). ``Computing and Visualizing Dynamic Time Warping Alignments in R: The 'dtw' Package.'' Journal of Statistical Software, 31(7), pp. 1-24. http://www.jstatsoft.org/v31/i07/.

Ratanamahatana A and Keogh E (2004). ``Everything you know about dynamic time warping is wrong.'' In 3rd Workshop on Mining Temporal and Sequential Data, in conjunction with 10th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD-2004), Seattle, WA.

Keogh E and Ratanamahatana CA (2005). ``Exact indexing of dynamic time warping.'' Knowledge and information systems, 7(3), pp. 358-386.

Lemire D (2009). ``Faster retrieval with a two-pass dynamic-time-warping lower bound .'' Pattern Recognition, 42(9), pp. 2169 - 2180. ISSN 0031-3203, http://dx.doi.org/10.1016/j.patcog.2008.11.030, http://www.sciencedirect.com/science/article/pii/S0031320308004925.

Liao TW (2005). ``Clustering of time series data - a survey.'' Pattern recognition, 38(11), pp. 1857-1874.

Paparrizos J and Gravano L (2015). ``k-Shape: Efficient and Accurate Clustering of Time Series.'' In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, series SIGMOD '15, pp. 1855-1870. ISBN 978-1-4503-2758-9, http://doi.org/10.1145/2723372.2737793.

Petitjean F, Ketterlin A and Gancarski P (2011). ``A global averaging method for dynamic time warping, with applications to clustering.'' Pattern Recognition, 44(3), pp. 678 - 693. ISSN 0031-3203, http://dx.doi.org/10.1016/j.patcog.2010.09.013, http://www.sciencedirect.com/science/article/pii/S003132031000453X.

Sakoe H and Chiba S (1978). ``Dynamic programming algorithm optimization for spoken word recognition.'' Acoustics, Speech and Signal Processing, IEEE Transactions on, 26(1), pp. 43-49. ISSN 0096-3518, http://doi.org/10.1109/TASSP.1978.1163055.

Wang X, Mueen A, Ding H, Trajcevski G, Scheuermann P and Keogh E (2013). ``Experimental comparison of representation methods and distance measures for time series data.'' Data Mining and Knowledge Discovery, 26(2), pp. 275-309. ISSN 1384-5810, http://doi.org/10.1007/s10618-012-0250-5, http://dx.doi.org/10.1007/s10618-012-0250-5.

Bedzek, J.C. (1981). Pattern recognition with fuzzy objective function algorithms.

D'Urso, P., & Maharaj, E. A. (2009). Autocorrelation-based fuzzy clustering of time series. Fuzzy Sets and Systems, 160(24), 3565-3589.

Description

Arguments

Details

References

See Also