
Decluster data above a given threshold to try to make them independent.
decluster(x, threshold, ...)
# S3 method for data.frame
decluster(x, threshold, ..., which.cols, method = c("runs", "intervals"),
clusterfun = "max")
# S3 method for default
decluster(x, threshold, ..., method = c("runs", "intervals"),
clusterfun = "max")
# S3 method for intervals
decluster(x, threshold, ..., clusterfun = "max", groups = NULL, replace.with,
na.action = na.fail)
# S3 method for runs
decluster(x, threshold, ..., data, r = 1, clusterfun = "max", groups = NULL,
replace.with, na.action = na.fail)
# S3 method for declustered
plot(x, which.plot = c("scatter", "atdf"), qu = 0.85, xlab = NULL,
ylab = NULL, main = NULL, col = "gray", ...)
# S3 method for declustered
print(x, ...)
A numeric vector of class “declustered” is returned with various attributes including:
the function call.
character string giving the name of the data.
value of the clusterfun argument. This is a function.
character string naming the method. Same as input argument.
threshold used for declustering.
character string naming the data used for the groups when applicable.
the run length used (or estimated if “intervals” method employed).
function used to handle missing values. Same as input argument.
numeric giving the clusters of threshold exceedances.
x: An R data set to be declustered. Can be a data frame or a numeric vector. If a data frame, then which.cols must be specified.
For plot and print: an object returned by decluster.
data: A data frame containing the data.
threshold: numeric of length one or of the same length as the data, giving the threshold over which (non-inclusive) data are to be declustered.
qu: quantile for the u argument in the call to atdf.
which.cols: numeric of length one or two. The first component tells which column is the one to decluster, and the second component tells which column, if any, is to serve as groups.
which.plot: character string naming the type of plot to make.
method: character string naming the declustering method to employ.
clusterfun: character string naming a function to be applied to the clusters (the returned value is used). Typically, for extreme value analysis (EVA), this will be the cluster maximum (default), but other options are fine as long as they return a single number.
groups: numeric of the same length as x giving natural groupings that should be considered as separate clusters. For example, suppose data cover only summer months across several years. It would probably not make sense to decluster the data across years (i.e., a new cluster should be defined if exceedances occur in different years).
r: integer run length stating how many threshold deficits should be used to define a new cluster.
replace.with: number, NaN, Inf, -Inf, or NA. What should the remaining values in the cluster be replaced with? The default replaces them with threshold, which for most EVA purposes is ideal.
na.action: function to be called to handle missing values.
xlab, ylab, main, col: optional arguments to the plot function. If not used, then reasonable default values are used.
...: optional arguments to decluster.runs or clusterfun.
For plot: optional arguments to plot.
Not used by print.
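The roles of clusterfun and replace.with can be illustrated with a small base R sketch. This is not the extRemes implementation: the data, the cluster membership vector cluster_id, and the variable names are all hypothetical. It keeps each cluster maximum and replaces the remaining exceedances with the threshold.

```r
# Toy data and an assumed cluster membership for the five exceedances of 4.
x <- c(1, 5, 6, 1, 7, 1, 1, 8, 9, 1)
threshold <- 4
exc <- which(x > threshold)          # positions 2, 3, 5, 8, 9
cluster_id <- c(1, 1, 2, 3, 3)       # hypothetical cluster labels

# Position of the maximum within each cluster (clusterfun = "max").
keep <- as.integer(tapply(exc, cluster_id, function(i) i[which.max(x[i])]))

out <- x
out[exc] <- threshold                # replace.with = threshold for all exceedances
out[keep] <- x[keep]                 # then restore each cluster maximum
out
```

After this step only one value per cluster remains above the threshold, which is why replacing with the threshold suits most peaks-over-threshold analyses.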
Eric Gilleland
Runs declustering (see Coles, 2001 sec. 5.3.2): Extremes separated by fewer than r non-extremes belong to the same cluster.
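The runs rule can be sketched in a few lines of base R. The helper runs_clusters below is hypothetical and is not part of extRemes; it ignores groups and missing values and only assigns a cluster label to each exceedance.

```r
# Hypothetical helper: label each exceedance of `threshold` with a cluster id.
# A new cluster starts whenever at least `r` consecutive non-exceedances
# separate two neighbouring exceedances.
runs_clusters <- function(x, threshold, r = 1) {
  exc <- which(x > threshold)        # positions of threshold exceedances
  if (length(exc) == 0) return(numeric(0))
  gaps <- diff(exc) - 1              # non-exceedances between neighbours
  id <- cumsum(c(1, gaps >= r))      # increment the id when the gap is >= r
  names(id) <- exc                   # keep the original positions as names
  id
}

x <- c(1, 5, 6, 1, 7, 1, 1, 8, 9, 1)
id_r1 <- runs_clusters(x, threshold = 4, r = 1)
id_r2 <- runs_clusters(x, threshold = 4, r = 2)
id_r1
id_r2
```

With r = 1 a single non-exceedance is enough to split clusters, whereas r = 2 merges the exceedances at positions 2, 3 and 5 into one cluster.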
Intervals declustering (Ferro and Segers, 2003): Extremes separated by fewer than r non-extremes belong to the same cluster, where r is the nc-th largest interexceedance time and nc, the number of clusters, is estimated from the extremal index, theta, and the times between extremes. Setting theta = 1 causes each extreme to form a separate cluster.
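As a rough illustration of how theta, nc and r relate, the following base R sketch computes the intervals estimator of Ferro and Segers (2003) from the interexceedance times. The helper intervals_theta is hypothetical, and the convention nc = floor(theta * N) + 1 (capped at the number of interexceedance times) is an assumption taken from that paper; the extRemes code may handle ties and edge cases differently.

```r
# Hypothetical helper: intervals estimate of the extremal index.
intervals_theta <- function(x, threshold) {
  exc <- which(x > threshold)
  n <- length(exc)
  if (n < 2) return(NA_real_)
  tt <- diff(exc)                    # interexceedance times
  if (max(tt) <= 2) {
    theta <- 2 * sum(tt)^2 / ((n - 1) * sum(tt^2))
  } else {
    theta <- 2 * sum(tt - 1)^2 / ((n - 1) * sum((tt - 1) * (tt - 2)))
  }
  min(1, theta)
}

# Toy series with exceedances at known positions.
x <- rep(0, 40)
x[c(2, 3, 5, 8, 9, 20, 21, 40)] <- 10

theta <- intervals_theta(x, threshold = 4)
n_exc <- sum(x > 4)
tt <- diff(which(x > 4))

# Assumed convention: nc = floor(theta * N) + 1, capped at length(tt).
nc <- min(floor(theta * n_exc) + 1, length(tt))
r <- sort(tt, decreasing = TRUE)[nc] # nc-th largest interexceedance time
c(theta = theta, nc = nc, r = r)
```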
The print method reports the extremal index estimate for the declustered data, computed by either the runs or the intervals estimator depending on the method argument, together with the number of clusters and the run length. For runs declustering, the run length is the one supplied by the user; for the intervals method, it is an estimated run length for the resulting declustered data. Note that if the declustered data are independent, the extremal index should be close to (if not equal to) one.
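To see why the extremal index should be near one after declustering, here is a minimal sketch of the runs estimate (number of clusters divided by number of exceedances). The helper runs_theta and the toy vectors are hypothetical and are not taken from extRemes.

```r
# Hypothetical helper: runs estimate of the extremal index.
runs_theta <- function(x, threshold, r = 1) {
  exc <- which(x > threshold)
  n_exc <- length(exc)
  if (n_exc == 0) return(NA_real_)
  n_clusters <- 1 + sum(diff(exc) - 1 >= r)  # one cluster plus one per break
  n_clusters / n_exc
}

x    <- c(1, 5, 6, 1, 7, 1, 1, 8, 9, 1)      # raw series
x_dc <- c(1, 4, 6, 1, 7, 1, 1, 4, 9, 1)      # cluster maxima kept, rest set to 4

theta_raw <- runs_theta(x, threshold = 4)    # 3 clusters over 5 exceedances
theta_dc  <- runs_theta(x_dc, threshold = 4) # 3 clusters over 3 exceedances
c(raw = theta_raw, declustered = theta_dc)
```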
Coles, S. (2001) An introduction to statistical modeling of extreme values, London, U.K.: Springer-Verlag, 208 pp.
Ferro, C. A. T. and Segers, J. (2003) Inference for clusters of extreme values. Journal of the Royal Statistical Society B, 65, 545-556.
extremalindex, fevd
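library(extRemes)
# Simulate a serially dependent series: each value is the maximum of two
# consecutive draws, so large values tend to occur in pairs.
y <- rnorm(100, mean=40, sd=20)
y <- apply(cbind(y[1:99], y[2:100]), 1, max)
# Three blocks of 33 observations, treated as separate groups.
bl <- rep(1:3, each=33)
# Runs declustering above the 75th percentile with run length 1.
ydc <- decluster(y, quantile(y, probs=c(0.75)), r=1, groups=bl)
ydc
plot(ydc)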
if (FALSE) {
data(Tphap)
look <- decluster(-Tphap$MinT, threshold=-73)
look
plot(look)
# The code cannot currently grab data passed as an expression like the one above.
# Better: assign the data to a variable first.
y <- -Tphap$MinT
look <- decluster(y, threshold=-73)
look
plot(look)
# Even better. Use a non-constant threshold.
u <- -70 - 7 *(Tphap$Year - 48)/42
look <- decluster(y, threshold=u)
look
plot(look)
# Better still: account for the fact that there are huge
# gaps in data from one year to another.
bl <- Tphap$Year - 47
look <- decluster(y, threshold=u, groups=bl)
look
plot(look)
# Now try the above with intervals declustering and compare
look2 <- decluster(y, threshold=u, method="intervals", groups=bl)
look2
dev.new()
plot(look2)
# Looks about the same,
# but note that the run length is estimated to be 5.
# Same resulting number of clusters, however.
# May result in different estimate of the extremal
# index.
#
fit <- fevd(look, threshold=u, type="GP", time.units="62/year")
fit
plot(fit)
# cf.
fit2 <- fevd(-MinT~1, Tphap, threshold=u, type="GP", time.units="62/year")
fit2
dev.new()
plot(fit2)
#
fit <- fevd(look, threshold=u, type="PP", time.units="62/year")
fit
plot(fit)
# cf.
fit2 <- fevd(-MinT~1, Tphap, threshold=u, type="PP", time.units="62/year")
fit2
dev.new()
plot(fit2)
}