funtimes (version 9.1)

BICC: BIC-Based Spatio-Temporal Clustering

Description

Apply the algorithm of unsupervised spatio-temporal clustering, TRUST (Ciampi et al., 2010), with automatic selection of its tuning parameters Delta and Epsilon based on the Bayesian information criterion, BIC (Schaeffer et al., 2016).

Usage

BICC(X, Alpha = NULL, Beta = NULL, Theta = 0.8, p, w, s)

Value

A list with the following elements:

delta.opt

optimal value for the clustering parameter Delta.

epsilon.opt

optimal value for the clustering parameter Epsilon.

clusters

vector of length ncol(X) with cluster labels.

IC

values of the information criterion (BIC) for each considered combination of Delta (rows) and Epsilon (columns).

delta.all

vector of considered values for Delta.

epsilon.all

vector of considered values for Epsilon.
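The link between IC, delta.all, and epsilon.all can be sketched in base R. The grids and BIC values below are made up for illustration, and the sketch assumes that the optimal pair is the one with the smallest BIC (the usual convention, not stated explicitly on this page); it is not the package's own code.

```r
# Hypothetical grids and BIC values, for illustration only
delta.all   <- c(0.1, 0.2, 0.3)
epsilon.all <- c(0.5, 1.0)
IC <- matrix(c(12.4, 11.0,
               10.2,  9.7,
               10.9, 13.1),
             nrow = length(delta.all), byrow = TRUE)  # rows: Delta, columns: Epsilon

# Locate the cell with the smallest criterion value (first one if tied);
# the result holds the row index first and the column index second
best <- which(IC == min(IC), arr.ind = TRUE)[1, ]
delta.best   <- delta.all[best[1]]
epsilon.best <- epsilon.all[best[2]]
```

With real output, the same indexing applied to the IC, delta.all, and epsilon.all elements of the returned list should be comparable with delta.opt and epsilon.opt, provided the assumption about minimization holds.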

Arguments

X

a matrix of time series observed within a slide (time series in columns).

Alpha

lower limit of the time-series domain, passed to CSlideCluster.

Beta

upper limit of the time-series domain, passed to CSlideCluster.

Theta

connectivity parameter passed to CSlideCluster.

p

number of layers (time-series observations) in each slide.

w

number of slides in each window.

s

step to shift a window, calculated in the number of slides. The recommended values are 1 (overlapping windows) or equal to w (non-overlapping windows).
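To illustrate how w and s interact, the following base R sketch lists which slides would fall into each window for the two recommended settings of s. The helper window_starts and the value of n_slides are hypothetical, and the layout rule (windows start at slide 1 and advance by s until a full window no longer fits) is an assumption made for illustration, not a description of the internals of CWindowCluster.

```r
# Hypothetical setting: 6 slides in total, 3 slides per window
n_slides <- 6
w <- 3

# Assumed layout: first window starts at slide 1, later windows shift by s slides
window_starts <- function(n_slides, w, s) seq(1, n_slides - w + 1, by = s)

starts_overlap  <- window_starts(n_slides, w, s = 1)  # overlapping windows
starts_disjoint <- window_starts(n_slides, w, s = w)  # non-overlapping windows

# Slide indices covered by each non-overlapping window
windows_disjoint <- lapply(starts_disjoint, function(a) a:(a + w - 1))
```

Under these assumptions, s = 1 lets consecutive windows share w - 1 slides, while s = w makes each slide belong to exactly one window.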

Author

Ethan Schaeffer, Vyacheslav Lyubchich

Details

This is the upper-level function for time series clustering. It uses the functions CWindowCluster and CSlideCluster to cluster time series based on closeness and homogeneity measures. Clustering is performed multiple times over a range of equidistant values of the parameters Delta and Epsilon; the optimal Delta and Epsilon, along with the corresponding clustering results, are then returned (see Schaeffer et al., 2016, for more details).

The total length of the time series (number of time layers, i.e., nrow(X)) should be divisible by p.
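The divisibility requirement can be checked before calling BICC. The sketch below uses simulated data and hypothetical object names (n_slides, slide_id, slides) to verify the condition and to show how the rows of X split into slides of p layers each; it only illustrates the bookkeeping and is not the package's implementation.

```r
set.seed(1)
X <- matrix(rnorm(36 * 10), nrow = 36)  # 36 time layers, 10 time series (simulated)
p <- 12                                 # layers per slide

# nrow(X) must be a multiple of p
stopifnot(nrow(X) %% p == 0)

n_slides <- nrow(X) / p
slide_id <- rep(seq_len(n_slides), each = p)  # slide label for each row of X

# Split the row indices by slide and extract the corresponding sub-matrices
slides <- lapply(split(seq_len(nrow(X)), slide_id),
                 function(idx) X[idx, , drop = FALSE])
```

If nrow(X) is not a multiple of p, one option is to drop the incomplete trailing rows before clustering; whether that is appropriate depends on the data.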

References

See Also

CSlideCluster, CWindowCluster, purity

Examples

# Fix seed for reproducible simulations:
set.seed(1)

##### Example 1
# Similar to Schaeffer et al. (2016), simulate 3 years of monthly data
# for 10 locations and apply clustering:
# 1.1 Simulation
T <- 36 #total months
N <- 10 #locations
phi <- c(0.5) #parameter of autoregression
burn <- 300 #burn-in period for simulations
X <- sapply(1:N, function(x) 
    arima.sim(n = T + burn, 
              list(order = c(length(phi), 0, 0), ar = phi)))[(burn + 1):(T + burn),]
colnames(X) <- paste("TS", c(1:dim(X)[2]), sep = "")

# 1.2 Clustering
# Assume that information arrives in year-long slides or data chunks
p <- 12 #number of time layers (months) in a slide
# Let the upper level of clustering (window) be the whole period of 3 years, so
w <- 3 #number of slides in a window
# Step to shift a window; the value matters little here because
# there is only one window of data
s <- w
tmp <- BICC(X, p = p, w = w, s = s)

# 1.3 Evaluate clustering
# In these simulations, it is known that all time series belong to one class,
# since they were all simulated the same way:
classes <- rep(1, N)
# Use the information on the classes to calculate clustering purity:
purity(classes, tmp$clusters[1,])

##### Example 2
# 2.1 Modify time series and update classes accordingly:
# Add a mean shift to half of the time series:
X2 <- X
X2[, 1:(N/2)] <- X2[, 1:(N/2)] + 3
classes2 <- rep(c(1, 2), each = N/2)

# 2.2 Re-apply clustering procedure and evaluate clustering purity:
tmp2 <- BICC(X2, p = p, w = w, s = s)
tmp2$clusters
purity(classes2, tmp2$clusters[1,])