DTW is a dynamic programming algorithm that tries to find the optimum warping path between two series. Over the years, several variations have appeared in order to make the procedure faster or more efficient. Please refer to the included references for more information, especially Giorgino (2009), which is a good practical introduction.
Most optimizations require equal dimensionality, which means time series should have equal length. DTW itself does not require this, but it is relatively expensive to compute. Other distance definitions may be used, or series could be reinterpolated to a matching length (Ratanamahatana and Keogh, 2004).
Other packages that are particularly leveraged here are the proxy
package for distance
matrix calculations, and the dtw
package for the core DTW calculations. The main clustering
function and entry point for this package is dtwclust
. There are a couple of things
to bear in mind while using this package:
Most distance calculations make use of the dist
function in the proxy
package, which accepts one or two arguments for data objects (x
and y
). Users should
use the two-input list version, even if there is just one dataset (i.e.
proxy::dist(x = dataset, y = dataset, ...)
), because otherwise it sometimes fails to
detect a whole time series as a single object and, instead, calculates distances between each observation
of each time series.
Additionally, please note the random number generator is set to L'Ecuyer-CMRG when dtwclust
is attached in an attempt to preserve reproducibility. You are free to change this afterwards if
you wish. See RNGkind
.
Giorgino T (2009). ``Computing and Visualizing Dynamic Time Warping Alignments in R
: The
'dtw' Package.'' Journal of Statistical Software, 31(7), pp. 1-24.
http://www.jstatsoft.org/v31/i07/.
Ratanamahatana A and Keogh E (2004). ``Everything you know about dynamic time warping is wrong.'' In 3rd Workshop on Mining Temporal and Sequential Data, in conjunction with 10th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD-2004), Seattle, WA.
Keogh E and Ratanamahatana CA (2005). ``Exact indexing of dynamic time warping.'' Knowledge and information systems, 7(3), pp. 358-386.
Lemire D (2009). ``Faster retrieval with a two-pass dynamic-time-warping lower bound .'' Pattern Recognition, 42(9), pp. 2169 - 2180. ISSN 0031-3203, http://dx.doi.org/10.1016/j.patcog.2008.11.030, http://www.sciencedirect.com/science/article/pii/S0031320308004925.
Liao TW (2005). ``Clustering of time series data - a survey.'' Pattern recognition, 38(11), pp. 1857-1874.
Paparrizos J and Gravano L (2015). ``k-Shape: Efficient and Accurate Clustering of Time Series.'' In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, series SIGMOD '15, pp. 1855-1870. ISBN 978-1-4503-2758-9, http://doi.org/10.1145/2723372.2737793.
Petitjean F, Ketterlin A and Gancarski P (2011). ``A global averaging method for dynamic time warping, with applications to clustering.'' Pattern Recognition, 44(3), pp. 678 - 693. ISSN 0031-3203, http://dx.doi.org/10.1016/j.patcog.2010.09.013, http://www.sciencedirect.com/science/article/pii/S003132031000453X.
Sakoe H and Chiba S (1978). ``Dynamic programming algorithm optimization for spoken word recognition.'' Acoustics, Speech and Signal Processing, IEEE Transactions on, 26(1), pp. 43-49. ISSN 0096-3518, http://doi.org/10.1109/TASSP.1978.1163055.
Wang X, Mueen A, Ding H, Trajcevski G, Scheuermann P and Keogh E (2013). ``Experimental comparison of representation methods and distance measures for time series data.'' Data Mining and Knowledge Discovery, 26(2), pp. 275-309. ISSN 1384-5810, http://doi.org/10.1007/s10618-012-0250-5, http://dx.doi.org/10.1007/s10618-012-0250-5.
Bedzek, J.C. (1981). Pattern recognition with fuzzy objective function algorithms.
D'Urso, P., & Maharaj, E. A. (2009). Autocorrelation-based fuzzy clustering of time series. Fuzzy Sets and Systems, 160(24), 3565-3589.
dtwclust
, dist
, dtw