decluster: Decluster Data Above a Threshold

Description

Decluster data above a given threshold so that the remaining exceedances are approximately independent.

Usage

decluster(x, threshold, ...)
# S3 method for data.frame
decluster(x, threshold, ..., which.cols, method = c("runs", "intervals"), 
    clusterfun = "max")
# S3 method for default
decluster(x, threshold, ..., method = c("runs", "intervals"),
    clusterfun = "max")
# S3 method for intervals
decluster(x, threshold, ..., clusterfun = "max", groups = NULL, replace.with, 
    na.action = na.fail)
# S3 method for runs
decluster(x, threshold, ..., data, r = 1, clusterfun = "max", groups = NULL, 
    replace.with, na.action = na.fail)
# S3 method for declustered
plot(x, which.plot = c("scatter", "atdf"), qu = 0.85, xlab = NULL, 
    ylab = NULL, main = NULL, col = "gray", ...)
# S3 method for declustered
print(x, ...)

Arguments

x

An R data set to be declustered. Can be a data frame or a numeric vector. If a data frame, then which.cols must be specified.

plot and print: an object returned by decluster.

data

A data frame containing the data.

threshold

numeric of length one, or of the same length as the data, giving the threshold above which (non-inclusive) data are to be declustered.

qu

quantile for u argument in the call to atdf.

which.cols

numeric of length one or two. The first component tells which column is the one to decluster, and the second component tells which, if any, column is to serve as groups.

which.plot

character string naming the type of plot to make.

method

character string naming the declustering method to employ.

clusterfun

character string naming a function to be applied to the clusters (the returned value is used). Typically, for extreme value analysis (EVA), this will be the cluster maximum (default), but other options are ok as long as they return a single number.

groups

numeric of the same length as x giving natural groupings that should be treated as separate clusters. For example, suppose the data cover only summer months across several years. It would probably not make sense to decluster the data across years (i.e., exceedances that occur in different years should always fall in different clusters).

r

integer run length stating how many threshold deficits should be used to define a new cluster.

replace.with

number, NaN, Inf, -Inf, or NA. What should the remaining values in the cluster be replaced with? The default replaces them with threshold, which for most EVA purposes is ideal.

na.action

function to be called to handle missing values.

xlab, ylab, main, col

optional arguments to the plot function. If not supplied, reasonable default values are used.

…

optional arguments to decluster.runs or clusterfun.

plot: optional arguments to plot.

Not used by print.

Value

A numeric vector of class “declustered” is returned with various attributes including:

call

the function call.

data.name

character string giving the name of the data.

decluster.function

value of clusterfun argument. This is a function.

method

character string naming the method. Same as input argument.

threshold

threshold used for declustering.

groups

character string naming the data used for the groups when applicable.

run.length

the run length used (or estimated if “intervals” method employed).

na.action

function used to handle missing values. Same as input argument.

clusters

numeric giving the clusters of threshold exceedances.

Details

Runs declustering (see Coles, 2001 sec. 5.3.2): Extremes separated by fewer than r non-extremes belong to the same cluster.
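
The runs rule above can be illustrated with a minimal base-R sketch. It is not the extRemes implementation: the helper name runs_clusters and the toy data are hypothetical, and replacing the non-maximum values of each cluster with the threshold simply mirrors the documented default of replace.with.

```r
# Illustrative sketch of runs declustering (hypothetical helper, base R only).
runs_clusters <- function(x, threshold, r = 1) {
  exceed <- which(x > threshold)            # positions of strict exceedances
  if (length(exceed) == 0) return(integer(0))
  gaps <- diff(exceed) - 1                  # non-exceedances between extremes
  # A new cluster starts when at least r non-exceedances separate two extremes.
  cluster_id <- cumsum(c(1, gaps >= r))
  names(cluster_id) <- exceed               # keep the original positions
  cluster_id
}

x <- c(1, 5, 6, 1, 7, 1, 1, 8, 9, 1)        # toy data (assumption)
u <- 4
cl <- runs_clusters(x, threshold = u, r = 2)

# Keep each cluster maximum and replace the other cluster members with u.
decl <- x
pos <- as.integer(names(cl))
for (k in unique(cl)) {
  idx <- pos[cl == k]
  keep <- idx[which.max(x[idx])]
  decl[setdiff(idx, keep)] <- u
}
decl
```

With r = 2, the exceedances at positions 2, 3 and 5 form one cluster and those at positions 8 and 9 form another, so only the values 7 and 9 remain above the threshold.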

Intervals declustering (Ferro and Segers, 2003): Extremes separated by fewer than r non-extremes belong to the same cluster, where r is the nc-th largest interexceedance time and nc, the number of clusters, is estimated from the extremal index, theta, and the times between extremes. Setting theta = 1 causes each extreme to form a separate cluster.
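
As a rough illustration of the first step of the intervals method, the sketch below computes the intervals estimator of the extremal index from the interexceedance times, following the formula in Ferro and Segers (2003). The helper name intervals_theta and the toy series are hypothetical, and the sketch deliberately stops before choosing nc and r, since the package's exact rounding and tie handling are not described here.

```r
# Illustrative sketch of the intervals estimator of the extremal index
# (hypothetical helper, base R only; not the extRemes code).
intervals_theta <- function(x, threshold) {
  exceed <- which(x > threshold)            # positions of strict exceedances
  n_exc <- length(exceed)
  stopifnot(n_exc >= 2)                     # need at least one interexceedance time
  times <- diff(exceed)                     # interexceedance times
  if (max(times) <= 2) {
    theta <- 2 * sum(times)^2 / ((n_exc - 1) * sum(times^2))
  } else {
    theta <- 2 * sum(times - 1)^2 /
      ((n_exc - 1) * sum((times - 1) * (times - 2)))
  }
  min(1, theta)                             # the estimate is capped at one
}

# Toy series (assumption): four clusters of four consecutive exceedances.
x <- rep(0, 70)
x[c(1:4, 21:24, 41:44, 61:64)] <- 1
theta_hat <- intervals_theta(x, threshold = 0.5)
theta_hat
```

For this strongly clustered toy series the estimate is well below one, which is what the intervals method then uses to decide how many of the largest interexceedance times separate clusters.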

The print method reports the extremal index estimate, computed with either the runs or the intervals estimator depending on the method argument, together with the number of clusters and the run length. For runs declustering, the run length is the value of r supplied by the user; for the intervals method, it is the run length estimated for the resulting declustered data. If the declustered data are independent, the extremal index should be equal or close to one.
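
To see why an extremal index near one indicates successful declustering, here is a small base-R sketch of the runs estimator, taken as the number of clusters divided by the number of exceedances. The helper name runs_theta and both toy vectors are hypothetical; x_decl is x with every non-maximum cluster member set to the threshold.

```r
# Illustrative sketch of the runs estimator of the extremal index
# (hypothetical helper, base R only; not the extRemes code).
runs_theta <- function(x, threshold, r = 1) {
  exceed <- which(x > threshold)            # positions of strict exceedances
  if (length(exceed) == 0) return(NA_real_)
  # A new cluster starts after at least r consecutive non-exceedances.
  n_clusters <- 1 + sum(diff(exceed) - 1 >= r)
  n_clusters / length(exceed)
}

x      <- c(1, 5, 6, 1, 7, 1, 1, 8, 9, 1)   # toy data (assumption)
x_decl <- c(1, 4, 4, 1, 7, 1, 1, 4, 9, 1)   # same data after runs declustering

theta_raw  <- runs_theta(x, threshold = 4, r = 2)       # 2 clusters / 5 exceedances
theta_decl <- runs_theta(x_decl, threshold = 4, r = 2)  # 2 clusters / 2 exceedances
c(raw = theta_raw, declustered = theta_decl)
```

The estimate rises from 0.4 for the raw series to 1 for the declustered one, because each remaining exceedance is then its own cluster.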

References

Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. London, U.K.: Springer-Verlag, 208 pp.

Ferro, C. A. T. and Segers, J. (2003). Inference for clusters of extreme values. Journal of the Royal Statistical Society B, 65, 545-556.

Examples

# NOT RUN {
library(extRemes)

# Simulate a dependent series: running maxima of adjacent normal variates.
y <- rnorm(100, mean=40, sd=20)
y <- apply(cbind(y[1:99], y[2:100]), 1, max)
bl <- rep(1:3, each=33)

ydc <- decluster(y, quantile(y, probs=c(0.75)), r=1, groups=bl)
ydc

plot(ydc)

# }
# NOT RUN {
data(Tphap)
look <- decluster(-Tphap$MinT, threshold=-73)
look
plot(look)

# The function cannot currently handle data passed as an expression like the one above.
# Better:
y <- -Tphap$MinT
look <- decluster(y, threshold=-73)
look
plot(look)

# Even better.  Use a non-constant threshold.
u <- -70 - 7 *(Tphap$Year - 48)/42
look <- decluster(y, threshold=u)
look
plot(look)

# Better still: account for the fact that there are huge
# gaps in data from one year to another.
bl <- Tphap$Year - 47
look <- decluster(y, threshold=u, groups=bl)
look
plot(look)


# Now try the above with intervals declustering and compare 
look2 <- decluster(y, threshold=u, method="intervals", groups=bl)
look2
dev.new()
plot(look2)
# Looks about the same,
# but note that the run length is estimated to be 5.
# Same resulting number of clusters, however.
# May result in different estimate of the extremal
# index.


# Fit a generalized Pareto (GP) model to the declustered data.
fit <- fevd(look, threshold=u, type="GP", time.units="62/year")
fit
plot(fit)

# cf. the same GP fit to the original (non-declustered) data.
fit2 <- fevd(-MinT~1, Tphap, threshold=u, type="GP", time.units="62/year")
fit2
dev.new()
plot(fit2)

# Repeat with a point process (PP) model for the declustered data.
fit <- fevd(look, threshold=u, type="PP", time.units="62/year")
fit
plot(fit)

# cf. the same PP fit to the original (non-declustered) data.
fit2 <- fevd(-MinT~1, Tphap, threshold=u, type="PP", time.units="62/year")
fit2
dev.new()
plot(fit2)


# }
