clusters: Identify Clusters of Exceedences

Description

Identify clusters of exceedences.

Usage

clusters(data, u, r = 1, ulow = -Inf, rlow = 1, cmax = FALSE, keep.names
    = TRUE, plot = FALSE, xdata = seq(along = data), lvals = TRUE, lty =
    1, lwd = 1, pch = par("pch"), col = if(n > 250) NULL else "grey",
    xlab = "Index", ylab = "Data", ...)

Value

If cmax is FALSE (the default), a list with one component for each identified cluster. If cmax is TRUE, a numeric vector containing the cluster maxima. In any case, the returned object has an attribute acs, giving the average cluster size (where the cluster size is defined as the number of exceedences within a cluster), which will be NaN if there are no values above the threshold (and hence no clusters).

If plot is TRUE, the list of clusters, or vector of cluster maxima, is returned invisibly.

Arguments

data: A numeric vector, which may contain missing values.
u: A single value giving the threshold, unless a time varying threshold is used, in which case u should be a vector of thresholds, typically with the same length as data (or else the usual recycling rules are applied).
r: A postive integer denoting the clustering interval length. By default the interval length is one.
ulow: A single value giving the lower threshold, unless a time varying lower threshold is used, in which case ulow should be a vector of lower thresholds, typically with the same length as data (or else the usual recycling rules are applied). By default there is no lower threshold (or equivalently, the lower threshold is -Inf).
rlow: A postive integer denoting the lower clustering interval length. By default the interval length is one.
cmax: Logical; if FALSE (the default), a list containing the clusters of exceedences is returned. If TRUE a numeric vector containing the cluster maxima is returned.
keep.names: Logical; if FALSE, the function makes no attempt to retain the names/indices of the observations within the returned object. If data contains a large number of observations, this can make the function run much faster. The argument is mainly designed for internal use.
plot: Logical; if TRUE a plot is given that depicts the identified clusters, and the clusters (if cmax is FALSE) or cluster maxima (if cmax is TRUE) are returned invisibly. If FALSE (the default), the following arguments are ignored.
xdata: A numeric vector with the same length as data, giving the values to be plotted on the x-axis.
lvals: Logical; should the values below the threshold and the line depicting the lower threshold be plotted?
lty, lwd: Line type and width for the lines depicting the threshold and the lower threshold.
pch: Plotting character.
col: Strips of colour col are used to identify the clusters. An observation is contained in the cluster if the centre of the corresponding plotting character is contained in the coloured strip. If col is NULL the strips are omitted. By default the strips are coloured "grey", but are omitted whenever data contains more than 250 observations.
xlab, ylab: Labels for the x and y axis.
...: Other graphics parameters.

Details

The clusters of exceedences are identified as follows. The first exceedence of the threshold initiates the first cluster. The first cluster then remains active until either r consecutive values fall below (or are equal to) the threshold, or until rlow consecutive values fall below (or are equal to) the lower threshold. The next exceedence of the threshold (if it exists) then initiates the second cluster, and so on. Missing values are allowed, in which case they are treated as falling below (or equal to) the threshold, but falling above the lower threshold.

Examples

Run this code

clusters(portpirie, 4.2, 3)
clusters(portpirie, 4.2, 3, cmax = TRUE)
clusters(portpirie, 4.2, 3, 3.8, plot = TRUE)
clusters(portpirie, 4.2, 3, 3.8, plot = TRUE, lvals = FALSE)
tvu <- c(rep(4.2, 20), rep(4.1, 25), rep(4.2, 20))
clusters(portpirie, tvu, 3, plot = TRUE)