SBD: Shape-based distance

Description

Distance based on coefficient-normalized cross-correlation, as proposed by Paparrizos and Gravano (2015) for the k-Shape clustering algorithm.

Usage

SBD(x, y, znorm = FALSE)

Arguments

x

A time series.

y

Another time series.

znorm

Should each series be z-normalized before calculating the distance?

Value

A list with:
- dist: The distance between x and y.
- yshift: A shifted version of y so that it optimally matches x.

Details

This function works best if the series are z-normalized. If they are not, they should at least have comparable amplitudes, because the values of the signal affect the outcome.
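As a rough illustration of why normalization matters, the sketch below compares two series with the same shape but a constant offset. The helpers znorm_sketch and sbd_sketch are hypothetical names written for this example in base R. They use a brute-force loop over all lags for equal-length series and are not the package's implementation.

```r
# Hypothetical helpers for illustration only; not the package's code.
znorm_sketch <- function(v) (v - mean(v)) / sd(v)

# Brute-force shape-based distance for equal-length series:
# 1 minus the largest coefficient-normalized cross-correlation over all lags.
sbd_sketch <- function(x, y) {
  n <- length(x)
  stopifnot(length(y) == n)
  lags <- -(n - 1):(n - 1)
  cc <- vapply(lags, function(s) {
    if (s >= 0) sum(x[(1 + s):n] * y[1:(n - s)])
    else sum(x[1:(n + s)] * y[(1 - s):n])
  }, numeric(1))
  1 - max(cc) / sqrt(sum(x^2) * sum(y^2))
}

time_idx <- seq(0, 2 * pi, length.out = 50)
x <- sin(time_idx)
y <- sin(time_idx) + 10  # same shape, shifted upward by a constant offset

d_raw   <- sbd_sketch(x, y)                              # offset affects the result
d_znorm <- sbd_sketch(znorm_sketch(x), znorm_sketch(y))  # offset removed first
```

Under these assumptions, d_znorm should be zero up to floating-point error, while d_raw is clearly larger even though both series have the same shape.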

If x and y do not have the same length, it is best to provide the longer series as y, because y is the one that is shifted to match x. Anything before the matching point is discarded, and the series is padded with trailing zeros as needed.

The output values lie between 0 and 2, with 0 indicating perfect similarity.
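The range follows from the definition of the distance. As a sketch in the notation of the referenced paper, where CC_w(x, y) is the cross-correlation of x and y at shift w and R_0(x, x) is the sum of squares of x:

```latex
\[
\mathrm{SBD}(x, y) \;=\; 1 \;-\; \max_{w}\,
\frac{CC_w(x, y)}{\sqrt{R_0(x, x)\, R_0(y, y)}}
\]
```

By the Cauchy-Schwarz inequality the normalized cross-correlation lies in [-1, 1], so the distance lies in [0, 2].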

References

Paparrizos J and Gravano L (2015). "k-Shape: Efficient and Accurate Clustering of Time Series." In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, series SIGMOD '15, pp. 1855-1870. ISBN 978-1-4503-2758-9, http://doi.org/10.1145/2723372.2737793.

Examples

# load the package and the example data
library(dtwclust)
data(uciCT)

# distance between series of different lengths
sbd <- SBD(CharTraj[[1]], CharTraj[[100]], znorm = TRUE)$dist

# cross-distance matrix for a subset of series (note the two-list input)
sbd_matrix <- proxy::dist(CharTraj[1:10], CharTraj[1:10], method = "SBD", znorm = TRUE)