Learn R Programming

dtwclust (version 1.3.0)

dtwclust-package: Time series clustering under Dynamic Time Warping (DTW) distance.

Description

Perform time series clustering using different techniques related to the DTW distance and its corresponding lower bounds (LB). Additionally, an implementation of k-Shape clustering is available.

Arguments

Details

This package tries to consolidate the different procedures available to perform clustering of time series under DTW. Most of the optimizations require time series to have equal lengths. DTW itself doesn't, but it's much slower to compute. The shape-based distance (SBD) can also be used for series of different lengths, and it could be faster. If series have different lengths, it is debatable whether it makes sense to cluster them directly. If possible, reinterpolating them could be one way of speeding up calculations (see Ratanamahatana and Keogh, 2004; also reinterpolate).

Please see the documentation for dtwclust, which serves as the main entry point.

Other packages that are particularly leveraged here are the flexclust package for partitional clustering, the proxy package for distance matrix calculations, and the dtw package for the core DTW calculations.

Five distances are registered via pr_DB: "LB_Keogh", "LB_Improved", "SBD", "DTW2" and "DTW_LB". See lb_keogh, lb_improved and SBD for more details on the first 3. DTW2 is done with dtw using L2 norm, but it differs from the result you would obtain if you specify L2 as dist.method: with DTW2, pointwise distances (the local cost matrix) are calculated with L1 norm, each element of the matrix is squared and the result is fed into dtw, which finds the optimum warping path. The square root of the resulting distance is then computed. See dtw_lb for the last one.

Please note that the dist function in the proxy package accepts one or two arguments for data objects. Users should usually use the two-input list version, even if there is just one dataset (i.e. proxy::dist(x=data, y=data, ...)), because otherwise it sometimes fails to detect a whole time series as a single object and, instead, calculates distances between each observation of each time series.

References

Begum N, Ulanova L, Wang J and Keogh E (2015). ``Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy.'' In Conference on Knowledge Discovery and Data Mining, series KDD '15. ISBN 978-1-4503-3664-2/15/08, http://dx.doi.org/10.1145/2783258.2783286.

Giorgino T (2009). ``Computing and Visualizing Dynamic Time Warping Alignments in R: The 'dtw' Package.'' Journal of Statistical Software, 31(7), pp. 1-24. http://www.jstatsoft.org/v31/i07/.

Ratanamahatana A and Keogh E (2004). ``Everything you know about dynamic time warping is wrong.'' In 3rd Workshop on Mining Temporal and Sequential Data, in conjunction with 10th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD-2004), Seattle, WA.

Keogh E and Ratanamahatana CA (2005). ``Exact indexing of dynamic time warping.'' Knowledge and information systems, 7(3), pp. 358-386.

Lemire D (2009). ``Faster retrieval with a two-pass dynamic-time-warping lower bound .'' Pattern Recognition, 42(9), pp. 2169 - 2180. ISSN 0031-3203, http://dx.doi.org/10.1016/j.patcog.2008.11.030, http://www.sciencedirect.com/science/article/pii/S0031320308004925.

Liao TW (2005). ``Clustering of time series data - a survey.'' Pattern recognition, 38(11), pp. 1857-1874.

Paparrizos J and Gravano L (2015). ``k-Shape: Efficient and Accurate Clustering of Time Series.'' In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, series SIGMOD '15, pp. 1855-1870. ISBN 978-1-4503-2758-9, http://doi.org/10.1145/2723372.2737793.

Petitjean F, Ketterlin A and Gancarski P (2011). ``A global averaging method for dynamic time warping, with applications to clustering.'' Pattern Recognition, 44(3), pp. 678 - 693. ISSN 0031-3203, http://dx.doi.org/10.1016/j.patcog.2010.09.013, http://www.sciencedirect.com/science/article/pii/S003132031000453X.

Sakoe H and Chiba S (1978). ``Dynamic programming algorithm optimization for spoken word recognition.'' Acoustics, Speech and Signal Processing, IEEE Transactions on, 26(1), pp. 43-49. ISSN 0096-3518, http://doi.org/10.1109/TASSP.1978.1163055.

Wang X, Mueen A, Ding H, Trajcevski G, Scheuermann P and Keogh E (2013). ``Experimental comparison of representation methods and distance measures for time series data.'' Data Mining and Knowledge Discovery, 26(2), pp. 275-309. ISSN 1384-5810, http://doi.org/10.1007/s10618-012-0250-5, http://dx.doi.org/10.1007/s10618-012-0250-5.

See Also

dtwclust, kcca, dist, dtw