Identify clusters of exceedances.
clusters(data, u, r = 1, ulow = -Inf, rlow = 1, cmax = FALSE,
    keep.names = TRUE, plot = FALSE, xdata = seq(along = data),
    lvals = TRUE, lty = 1, lwd = 1, pch = par("pch"),
    col = if(n > 250) NULL else "grey", xlab = "Index",
    ylab = "Data", ...)
If cmax is FALSE (the default), a list with one component for each identified cluster.
If cmax is TRUE, a numeric vector containing the cluster maxima.
In either case, the returned object has an attribute acs, giving the average cluster size (where the cluster size is defined as the number of exceedances within a cluster), which will be NaN if there are no values above the threshold (and hence no clusters).
If plot is TRUE, the list of clusters, or vector of cluster maxima, is returned invisibly.
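The relationship between the two return forms and the acs attribute can be illustrated in base R. This is only a sketch: the list cl below is made up for illustration, and it assumes that a cluster maximum is simply the largest value in each cluster.

```r
# Hypothetical clusters, shaped like the list returned when cmax is FALSE.
# The values are invented for illustration, not taken from any data set.
cl <- list("1" = c(4.3, 4.5), "2" = 4.4)

# One maximum per cluster (the form returned when cmax is TRUE,
# assuming the maximum is the largest value in each cluster).
cl_max <- vapply(cl, max, numeric(1))

# Average cluster size: the mean number of exceedances per cluster.
acs <- mean(lengths(cl))
```

With no clusters at all, mean(lengths(list())) evaluates to NaN, which is consistent with the NaN value described above.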
data: A numeric vector, which may contain missing values.
u: A single value giving the threshold, unless a time varying threshold is used, in which case u should be a vector of thresholds, typically with the same length as data (or else the usual recycling rules are applied).
r: A positive integer denoting the clustering interval length. By default the interval length is one.
ulow: A single value giving the lower threshold, unless a time varying lower threshold is used, in which case ulow should be a vector of lower thresholds, typically with the same length as data (or else the usual recycling rules are applied). By default there is no lower threshold (or equivalently, the lower threshold is -Inf).
rlow: A positive integer denoting the lower clustering interval length. By default the interval length is one.
cmax: Logical; if FALSE (the default), a list containing the clusters of exceedances is returned. If TRUE, a numeric vector containing the cluster maxima is returned.
keep.names: Logical; if FALSE, the function makes no attempt to retain the names/indices of the observations within the returned object. If data contains a large number of observations, this can make the function run much faster. The argument is mainly designed for internal use.
plot: Logical; if TRUE, a plot is given that depicts the identified clusters, and the clusters (if cmax is FALSE) or cluster maxima (if cmax is TRUE) are returned invisibly. If FALSE (the default), the following arguments are ignored.
xdata: A numeric vector with the same length as data, giving the values to be plotted on the x-axis.
lvals: Logical; should the values below the threshold and the line depicting the lower threshold be plotted?
lty, lwd: Line type and width for the lines depicting the threshold and the lower threshold.
pch: Plotting character.
col: Strips of colour col are used to identify the clusters. An observation is contained in the cluster if the centre of the corresponding plotting character is contained in the coloured strip. If col is NULL the strips are omitted. By default the strips are coloured "grey", but are omitted whenever data contains more than 250 observations.
xlab, ylab: Labels for the x and y axes.
...: Other graphics parameters.
The clusters of exceedances are identified as follows. The first exceedance of the threshold initiates the first cluster. The first cluster then remains active until either r consecutive values fall below (or are equal to) the threshold, or until rlow consecutive values fall below (or are equal to) the lower threshold. The next exceedance of the threshold (if it exists) then initiates the second cluster, and so on. Missing values are allowed, in which case they are treated as falling below (or equal to) the threshold, but falling above the lower threshold.
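The rule above can be sketched in base R. The function sketch_clusters below is a hypothetical helper written for illustration, not the package's implementation. It assumes that ulow does not exceed u, that each returned cluster holds only the exceedances, and that thresholds are recycled to the length of the data.

```r
# Hypothetical helper for illustration; NOT the evd implementation.
sketch_clusters <- function(x, u, r = 1, ulow = -Inf, rlow = 1) {
  n <- length(x)
  u <- rep_len(u, n)        # recycle a scalar or short threshold vector
  ulow <- rep_len(ulow, n)

  above <- !is.na(x) & x > u          # NA counts as not above the threshold
  below_low <- !is.na(x) & x <= ulow  # NA counts as above the lower threshold

  id <- integer(n)  # cluster number of each exceedance, 0 otherwise
  current <- 0L     # number of clusters started so far
  active <- FALSE   # is a cluster currently open?
  run_u <- 0L       # consecutive values at or below u
  run_low <- 0L     # consecutive values at or below ulow

  for (i in seq_len(n)) {
    if (above[i]) {
      if (!active) {            # an exceedance opens a new cluster
        current <- current + 1L
        active <- TRUE
      }
      id[i] <- current
      run_u <- 0L
      run_low <- 0L
    } else if (active) {
      run_u <- run_u + 1L
      run_low <- if (below_low[i]) run_low + 1L else 0L
      # close the cluster after r values at or below u,
      # or rlow values at or below ulow
      if (run_u >= r || run_low >= rlow) active <- FALSE
    }
  }

  out <- split(x[above], id[above])
  attr(out, "acs") <- mean(lengths(out))  # NaN when there are no clusters
  out
}

x <- c(1, 5, 6, 1, 5, 1, 1, 7, NA, 8)
cl <- sketch_clusters(x, u = 4, r = 2)
cl_low <- sketch_clusters(x, u = 4, r = 2, ulow = 2, rlow = 1)
```

With r = 2, the single value below the threshold at position 4 does not end the first cluster, so cl has two clusters of sizes 3 and 2. Adding ulow = 2 with rlow = 1 ends a cluster as soon as one value is at or below 2, giving three clusters in cl_low. The missing value at position 9 counts towards r but not towards rlow, so the last cluster survives it in both calls.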
exi, exiplot
clusters(portpirie, 4.2, 3)
clusters(portpirie, 4.2, 3, cmax = TRUE)
clusters(portpirie, 4.2, 3, 3.8, plot = TRUE)
clusters(portpirie, 4.2, 3, 3.8, plot = TRUE, lvals = FALSE)
tvu <- c(rep(4.2, 20), rep(4.1, 25), rep(4.2, 20))
clusters(portpirie, tvu, 3, plot = TRUE)