seqClosedEndCpDist: Closed-end Sequential Test for Change-Point Detection in Possibly Multivariate Time Series Sensitive to Changes in the Contemporary Distribution Function

Description

Closed-end nonparametric sequential test for change-point detection based on the (multivariate) empirical distribution function. The observations can be continuous univariate or multivariate, and serially independent or dependent (strongly mixing). To carry out the test, four steps are required. The first step consists of simulating under the null many trajectories of the detector function. The second step consists of estimating a piecewise constant threshold function from these trajectories. The third step consists of computing the detector function from the data to be monitored. The fourth and last step consists of comparing the detector function with the estimated threshold function. Each of these steps corresponds to one of the functions in the usage section below. The current implementation is preliminary and not optimized for real-time monitoring (but could still be used for that). If the observations to be monitored are univariate and can be assumed serially independent, the simulation of the trajectories of the detector functions can be carried out using Monte Carlo simulation. In all other cases, the test relies on a dependent multiplier bootstrap. Details can be found in the second reference.

Usage

simClosedEndCpDist(x.learn = NULL, m = NULL, n, gamma = 0.25, delta = 1e-4,
                   B = 1000, method = c("sim", "mult"), b = NULL,
                   weights = c("parzen", "bartlett"), g = 5,
                   L.method = c("max","median","mean","min"))
threshClosedEndCpDist(sims, p = 1, alpha = 0.05, type = 7)
detClosedEndCpDist(x.learn, x, gamma = 0.25, delta = 1e-4)
monClosedEndCpDist(det, thresh, statistic = c("mac", "mmc", "mmk", "mk", "mc"),
                   plot = TRUE)

Value

All functions return lists whose components have explicit names. The function monClosedEndCpDist() in particular returns a list whose components are

alarm: a logical indicating whether the detector function has exceeded the threshold function.
time.alarm: an integer corresponding to the time at which the detector function has exceeded the threshold function or NA.
times.max: a vector of times at which the successive detectors "mmc" (if statistic = "mac" or statistic = "mmc") or "mmk" (if statistic = "mmk") have reached their maximum; a vector of NA's if statistic = "mc" or statistic = "mk"; this sequence of times can be used to estimate the time of change from the time of alarm.
time.change: an integer giving the estimated time of change if alarm is TRUE; the latter is simply the value in times.max which corresponds to time.alarm.

Arguments

x.learn: a data matrix whose rows are continuous observations, representing the learning sample.
m: a strictly positive integer specifying the size of the learning sample if x.learn is not specified; the latter implies that the observations are univariate and assumed to be independent; if m is not specified, it is taken equal to nrow(x.learn).
n: a strictly positive integer specifying the monitoring horizon; the monitoring period is m+1, ..., n.
gamma: a real parameter between 0 and 0.5 appearing in the definition of the weight function used in the detector function.
delta: a real parameter between 0 and 1 appearing in the definition of the weight function used in the detector function.
B: the number of trajectories of the detector function to simulate under the null.
method: a string specifying the trajectory simulation method; can be either "sim" (Monte Carlo simulation -- only in the univariate case under the assumption of serial independence) or "mult" (the dependent multiplier bootstrap).
b: strictly positive integer specifying the value of the bandwidth parameter determining the serial dependence when generating dependent multiplier sequences using the 'moving average approach'; see Section 5 of the first reference. The value 1 will create i.i.d. multiplier sequences suitable for serially independent observations. If set to NULL, b will be estimated from x.learn using the function bOptEmpProc(); see the procedure described in Section 5 of the first reference.
weights: a string specifying the kernel for creating the weights used in the generation of dependent multiplier sequences within the 'moving average approach'; see Section 5 of the first reference.
g: a strictly positive integer specifying the number of points of the uniform grid on \((0,1)^d\) (where \(d\) is ncol(x)) involved in the estimation of the bandwidth parameter; see Section 5 of the first reference. The number of points of the grid is given by g^ncol(x) so that g needs to be decreased as \(d\) increases.
L.method: a string specifying how the parameter \(L\) involved in the estimation of the bandwidth parameter is computed; see Section 5 of the first reference.
sims: an object of class sims.cpDist containing simulated trajectories of the detector function under the null.
p: a strictly positive integer specifying the number of steps of the piece constant threshold function; p should not be taken too large (say, smaller than 4) if method = "mult".
alpha: the value of the desired significance level for the sequential test.
type: an integer between 1 and 9 selecting one of the nine quantile algorithms detailed in the help of the function quantile().
x: a data matrix whose rows are continuous observations corresponding to the new observations to be monitored for a change in contemporary distribution.
det: an object of class det.cpDist representing a detector function computed using detClosedEndCpDist().
thresh: an object of class thresh.cpDist representing a threshold function estimated using threshClosedEndCpDist().
statistic: a string specifying the statistic/detector to be used for the monitoring; can be either "mac", "mmc", "mmk", "mc" or "mk"; the last letter specifies whether it is a Cramér-von Mises-like statistic (letter "c") or a Kolmogorov-Smirnov-like statistic (letter "k"); the letters before specify the type of aggregation steps used to compute the detectors ("m" for maximum, "a" for average); "mac" corresponds to the detector \(T_{m,q}\) in the second reference, "mmc" to the detector \(S_{m,q}\), "mmk" to the detector \(R_{m,q}\), "mc" to the detector \(Q_m\) and "mk" to the detector \(P_m\).
plot: logical indicating whether the monitoring should be plotted.

Details

The testing procedure is described in detail in the second reference.

References

A. Bücher and I. Kojadinovic (2016), A dependent multiplier bootstrap for the sequential empirical copula process under strong mixing, Bernoulli 22:2, pages 927-968, https://arxiv.org/abs/1306.3930.

I. Kojadinovic and G. Verdier (2021), Nonparametric sequential change-point detection for multivariate time series based on empirical distribution functions, Electronic Journal of Statistics 15(1), pages 773-829, tools:::Rd_expr_doi("10.1214/21-EJS1798").

Examples

Run this code

if (FALSE) {
## Example of montoring for the period m+1, ..., n
m <- 100 # size of the learning sample
n <- 150 # monitoring horizon

## The learning sample
set.seed(123)
x.learn <- matrix(rnorm(m))

## New observations with a large change in mean
## to simulate monitoring for the period m+1, ..., n
k <- 125 ## the true change-point
x <- matrix(c(rnorm(k-m), rnorm(n-k, mean = 2)))

## Step 1: Simulation of B trajectories of the detector functions under the null
B <- 1e4

## Under the assumption of serial independence
## (no need to specify the learning sample)
traj.sim <- simClosedEndCpDist(m = m, n = n, B = B, method = "sim")

## Without the assumption of serial independence
## (the learning sample is compulsory; the larger it is, the better;
## the monitoring horizon n should not be too large)
traj.mult <- simClosedEndCpDist(x.learn = x.learn, n = n, B = B, method = "mult")

## Step 2: Compute threshold functions with p steps
p <- 2
tf.sim <- threshClosedEndCpDist(traj.sim, p = p)
# p can be taken large if B is very large

tf.mult <- threshClosedEndCpDist(traj.mult, p = p) # p should not be taken too
                                          # large unless both m and B
                                          # are very large

## Step 3: Compute the detectors for the monitoring period m+1, ... , n
det <- detClosedEndCpDist(x.learn = x.learn, x = x)

## Step 4: Monitoring

## Simulate the monitoring with the first threshold function
monClosedEndCpDist(det, tf.sim)

## Simulate the monitoring with the second threshold function
monClosedEndCpDist(det, tf.mult)

## Simulate the monitoring with the first threshold function
## and another detector function
monClosedEndCpDist(det, tf.sim, statistic = "mmk")

## Alternative steps 3 and 4:

## Compute the detectors for the monitoring period m+1, ... , m+20 only
det <- detClosedEndCpDist(x.learn = x.learn, x = x[1:20,,drop = FALSE])

## Simulate the monitoring with the first threshold function
monClosedEndCpDist(det, tf.sim)

## Simulate the monitoring with the second threshold function
monClosedEndCpDist(det, tf.mult)
}

Run the code above in your browser using DataLab