TADPole: TADPole clustering

Description

Time-series Anytime Density Peaks Clustering as proposed by Begum et al., 2015.

Usage

TADPole(data, window.size, k = 2, dc, error.check = TRUE)

Arguments

data

The data matrix where each row is a time series. Optionally a list with each time series.

window.size

Window size constraint for DTW. See details.

The number of desired clusters.

The cutoff distance.

error.check

Should the data be checked for inconsistencies?

Value

A list with:
- cl: Cluster indices.
- centers: Indices of the centers.
- distCalcPercentage: Percentage of distance calculations that were actually performed.

Parallel Computing

Please note that running tasks in parallel does not guarantee faster computations. The overhead introduced is sometimes too large, and it's better to run tasks sequentially.

The user can register a parallel backend, e.g. with the doParallel package, in order to attempt to speed up the calculations (see the examples).

Details

This function can be called either directly or through dtwclust.

TADPole clustering adopts a relatively new clustering framework and adapts it to time series clustering with DTW. See the cited article for the details of the algorithm.

Because of the way the algorithm works, it can be considered a kind of Partitioning Around Medoids (PAM). This means that the cluster centers are always elements of the data. However, this algorithm is deterministic, depending on the value of dc.

The algorithm first uses the DTW's upper and lower bounds to find series with many close neighbors (in DTW space). Anything below the cutoff distance (dc) is considered a neighbor. Aided with this information, the algorithm then tries to prune as many DTW calculations as possible in order to accelerate the clustering procedure. The series that lie in dense areas (i.e. that have lots of neighbors) are taken as cluster centers.

The algorithm relies on the DTW bounds, which are only defined for time series of equal length.

The windowing constraint uses a centered window. The calculations expect a value in window.size that represents the distance between the point considered and one of the edges of the window. Therefore, if, for example, window.size = 10, the warping for an observation $x_i$ considers the points between $x_{i-10}$ and $x_{i+10}$, resulting in 10*2 + 1 = 21 observations falling within the window.

References

Begum N, Ulanova L, Wang J and Keogh E (2015). ``Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy.'' In Conference on Knowledge Discovery and Data Mining, series KDD '15. ISBN 978-1-4503-3664-2/15/08, http://dx.doi.org/10.1145/2783258.2783286.

Examples

Run this code

#### Running TADPole with parallel support
require(doParallel)

# Load data
data(uciCT)

# Reinterpolate to same length and coerce as matrix
data <- t(sapply(CharTraj, reinterpolate, newLength = 180))

# Create parallel workers
cl <- makeCluster(detectCores())
invisible(clusterEvalQ(cl, library(dtwclust)))
registerDoParallel(cl)

# Cluster
kc.tadp <- TADPole(data, k = 20, window.size = 20, dc = 1.5)

# Stop parallel workers
stopCluster(cl)

# Return to sequential computations
registerDoSEQ()

# Compute Rand Index
cat("Rand index for TADPole:", randIndex(kc.tadp$cl, CharTrajLabels), "\n\n")

Run the code above in your browser using DataLab