inspect: Informative sparse projection for estimation of changepoints (inspect)

Description

This is the main function of the package InspectChangepoint. The function inspect estimates the locations of multiple changepoints in the mean structure of a multivariate time series. Multiple changepoints are estimated using a (wild) binary segmentation scheme, in which each segmentation step uses the locate.change function.
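The recursion can be sketched in base R as follows. This is a minimal illustration of classical binary segmentation only (no wild variant), and it assumes the data are already standardised. The locator `locate_mean_cusum` is a hypothetical stand-in that projects onto the equal-weight direction (row averages) instead of the sparse projection computed by locate.change. The helper names and the column names of the result are illustrative and are not the package's API.

```r
# Univariate CUSUM statistic at every candidate split t = 1, ..., n - 1.
cusum_vec <- function(y) {
  n <- length(y)
  tt <- seq_len(n - 1)
  S <- cumsum(y)
  sqrt(n / (tt * (n - tt))) * (tt / n * S[n] - S[tt])
}

# Hypothetical stand-in for locate.change (NOT the package's method):
# average the rows, then maximise |CUSUM| of the averaged series.
locate_mean_cusum <- function(x) {
  cs <- abs(cusum_vec(colMeans(x)))
  z <- which.max(cs)
  list(location = z, max_cusum = cs[z])
}

# Classical binary segmentation: split where the statistic exceeds the
# threshold, then recurse on both sides of the split.
binary_segmentation <- function(x, threshold, locate = locate_mean_cusum) {
  if (is.null(dim(x))) x <- matrix(x, nrow = 1)   # univariate series as one row
  found <- matrix(numeric(0), ncol = 3,
                  dimnames = list(NULL, c("location", "max_cusum", "depth")))
  segment <- function(s, e, depth) {
    if (e - s + 1 < 2) return(invisible(NULL))      # need at least two points
    res <- locate(x[, s:e, drop = FALSE])
    if (res$max_cusum <= threshold) return(invisible(NULL))
    z <- s + res$location - 1                       # local index -> global index
    found <<- rbind(found, c(z, res$max_cusum, depth))
    segment(s, z, depth + 1)                        # left part: s, ..., z
    segment(z + 1, e, depth + 1)                    # right part: z + 1, ..., e
  }
  segment(1, ncol(x), 1)
  found[order(found[, "location"]), , drop = FALSE]
}

# Noise-free series with mean changes after times 40 and 100
y <- c(rep(0, 40), rep(3, 60), rep(1, 50))
cp <- binary_segmentation(matrix(y, nrow = 1), threshold = 1)
cp[, "location"]   # 40 100 (40 found at depth 1, 100 at depth 2)
```

Sorting by location and recording the recursion depth mirrors the description of the changepoints matrix under Value; the exact layout the package uses may differ.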

Usage

inspect(
  x,
  lambda,
  threshold,
  schatten = c(1, 2),
  M,
  missing_data = "auto",
  show_progress = FALSE
)

Arguments

x

The input data matrix of a high-dimensional time series, with each component time series stored as a row.

lambda

Regularisation parameter used in locate.change. If no value is supplied, the default value is log(log(n)*p/2), where p and n are the numbers of rows and columns of the data matrix x, respectively.

threshold

Threshold level for testing whether an identified changepoint is a true changepoint. If no value is supplied, the threshold level is computed by Monte Carlo simulation with 100 repetitions from the null model (no changepoints).
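A Monte Carlo threshold of this kind can be sketched as follows: simulate nrep data sets from the null model, assumed here to have i.i.d. N(0, 1) entries and no changepoint, evaluate a test statistic on each, and summarise the results. Taking the maximum over repetitions, the helper names, and the placeholder statistic (the largest |CUSUM| of the scaled row average, rather than the projected CUSUM produced by locate.change) are all assumptions of this sketch.

```r
# Hypothetical helper: Monte Carlo threshold from the null model.
mc_threshold <- function(n, p, statistic, nrep = 100) {
  null_stats <- vapply(seq_len(nrep), function(i) {
    x <- matrix(rnorm(n * p), nrow = p, ncol = n)   # p series of length n, no change
    statistic(x)
  }, numeric(1))
  max(null_stats)   # assumption: largest simulated value is used as the threshold
}

# Placeholder statistic: maximum |CUSUM| of the row average, scaled by
# sqrt(p) so that it has unit noise variance under the null model.
mean_cusum_stat <- function(x) {
  y <- colMeans(x) * sqrt(nrow(x))
  n <- length(y)
  tt <- seq_len(n - 1)
  S <- cumsum(y)
  max(abs(sqrt(n / (tt * (n - tt))) * (tt / n * S[n] - S[tt])))
}

set.seed(1)
thr <- mc_threshold(n = 200, p = 20, statistic = mean_cusum_stat)
```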

schatten

The Schatten norm constraint to use in the locate.change function. Default is schatten = 2, i.e. a Frobenius norm constraint; schatten = 1 corresponds to a nuclear norm constraint.
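For reference, the Schatten q-norm of a matrix is the l_q norm of its singular values, so q = 1 gives the nuclear norm and q = 2 gives the Frobenius norm. The helper below is a hypothetical illustration of that definition, not part of the package.

```r
# Schatten q-norm: the l_q norm of the singular values of m.
schatten_norm <- function(m, q) {
  d <- svd(m, nu = 0, nv = 0)$d
  sum(d^q)^(1 / q)
}

m <- matrix(c(3, 0, 0, 4), nrow = 2)   # singular values 4 and 3
schatten_norm(m, 1)                    # nuclear norm: 3 + 4 = 7
schatten_norm(m, 2)                    # Frobenius norm: sqrt(9 + 16) = 5
```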

M

The Monte Carlo parameter used for wild binary segmentation, i.e. the number of random intervals drawn. Default is M = 0, which means that a classical binary segmentation scheme is used.
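The role of M can be illustrated by a hypothetical helper that draws M random sub-intervals of 1, ..., n, on each of which a wild binary segmentation step would search for a changepoint. Always including the full interval is an assumption of this sketch; with it, M = 0 reduces to searching the full interval only, as in classical binary segmentation.

```r
# Hypothetical helper (not part of InspectChangepoint): random intervals
# [start, end] for wild binary segmentation, plus the full interval 1..n.
draw_intervals <- function(n, M) {
  full <- matrix(c(1, n), nrow = 1, dimnames = list(NULL, c("start", "end")))
  if (M == 0) return(full)
  # Each row holds two distinct sorted time points, so start < end.
  random <- t(replicate(M, sort(sample.int(n, 2))))
  rbind(full, random)
}

set.seed(1)
iv <- draw_intervals(n = 500, M = 3)
```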

missing_data

How missing data in x should be handled. If missing_data = 'meanImpute', missing data are imputed with row means; if 'MissInspect', the MissInspect algorithm of Follain et al. (2022) is used; if 'auto', the choice is made automatically depending on the amount of missingness.
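Row-mean imputation, as described for missing_data = 'meanImpute', can be sketched in base R as below. The helper name is hypothetical and this is not the package's internal code; note that a row with no observed values would receive NaN.

```r
# Replace each NA with the mean of the observed values in the same row
# (i.e. the same component time series).
mean_impute <- function(x) {
  x <- as.matrix(x)
  miss <- is.na(x)
  row_means <- rowMeans(x, na.rm = TRUE)   # NaN for a row with no observed values
  x[miss] <- row_means[row(x)[miss]]
  x
}

x <- matrix(c(1, NA, 3, 4, 5, NA), nrow = 2)   # rows: (1, 3, 5) and (NA, 4, NA)
mean_impute(x)                                 # second row becomes (4, 4, 4)
```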

show_progress

Whether to display the progress of the computation.

Value

The return value is an S3 object of class 'inspect'. It is a list with two components:

x: the input data matrix.
changepoints: a matrix with three columns. The first column contains the locations of the estimated changepoints, sorted in increasing order; the second column contains the maximum CUSUM statistic of the projected univariate time series associated with each estimated changepoint; the third column contains the depth of binary segmentation at which each changepoint was detected.

Details

The input time series is first standardised using the rescale.variance function. Recursive calls of the locate.change function then segment the multivariate time series using (wild) binary segmentation. A changepoint at time z means that the time series has constant mean structure for times up to and including z, and constant mean structure for times from z+1 onwards.
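This changepoint convention matches the usual CUSUM statistic, sketched below for a single series. The statistic at split t compares the mean of observations 1, ..., t with the mean of observations t + 1, ..., n, so a maximum at t = z makes z the last time point of the first segment. The helper `cusum_vec` is hypothetical and skips the rescale.variance standardisation (unit noise variance is assumed).

```r
# CUSUM of a univariate series y at every split t = 1, ..., n - 1:
#   sqrt(t (n - t) / n) * (mean(y[(t + 1):n]) - mean(y[1:t])),
# computed here through the partial sums S_t = y_1 + ... + y_t.
cusum_vec <- function(y) {
  n <- length(y)
  tt <- seq_len(n - 1)
  S <- cumsum(y)
  sqrt(n / (tt * (n - tt))) * (tt / n * S[n] - S[tt])
}

y <- c(rep(0, 50), rep(1, 50))   # constant mean up to and including t = 50
cs <- cusum_vec(y)
z <- which.max(abs(cs))          # 50, the last index of the first segment
```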

More details about the model assumptions and theoretical guarantees can be found in Wang and Samworth (2018). Note that Monte Carlo computation of the threshold value can be slow, especially for large p. If inspect is to be used multiple times with data matrices of the same (or similar) size, it is better to precompute the threshold level once by calling the compute.threshold function and to pass it via the threshold argument.
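For schatten = 2 (the Frobenius norm constraint), the sparse projection step can be computed in closed form: soft-threshold the row-wise CUSUM matrix at level lambda and take the leading left singular vector of the result as the projection direction. The sketch below follows that description in base R. It is not the package's locate.change, it assumes unit noise variance (no rescale.variance step), all helper names are hypothetical, and lambda = 2 in the usage lines is an arbitrary illustrative choice.

```r
# Row-wise CUSUM transformation of a p x n matrix; returns p x (n - 1).
cusum_transform <- function(x) {
  n <- ncol(x)
  tt <- seq_len(n - 1)
  S <- t(apply(x, 1, cumsum))                           # p x n partial sums
  num <- outer(S[, n], tt / n) - S[, tt, drop = FALSE]
  sweep(num, 2, sqrt(n / (tt * (n - tt))), `*`)
}

# Entrywise soft-thresholding at level lambda.
soft_threshold <- function(a, lambda) sign(a) * pmax(abs(a) - lambda, 0)

# One segmentation step (hypothetical helper, schatten = 2 case).
sparse_projection_step <- function(x, lambda) {
  m <- soft_threshold(cusum_transform(x), lambda)
  v <- svd(m, nu = 1, nv = 0)$u[, 1]                    # projection direction
  proj <- drop(crossprod(v, x))                         # projected univariate series
  cs <- abs(cusum_transform(matrix(proj, nrow = 1)))
  z <- which.max(cs)
  list(location = z, max_cusum = cs[z], direction = v)
}

# Usage: 50 series of length 100; the first 10 shift by 2 after time 40.
set.seed(1)
p <- 50; n <- 100; z0 <- 40
x <- matrix(rnorm(p * n), nrow = p, ncol = n)
x[1:10, (z0 + 1):n] <- x[1:10, (z0 + 1):n] + 2
res <- sparse_projection_step(x, lambda = 2)
res$location
```

Because the entries of the soft-thresholded CUSUM matrix are large only in the rows that change, the estimated direction concentrates on those rows, which is what makes the projection informative in sparse settings.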

References

Wang, T. and Samworth, R. J. (2018) High dimensional changepoint estimation via sparse projection. J. Roy. Statist. Soc., Ser. B, 80, 57-83.

Follain, B., Wang, T. and Samworth, R. J. (2022) High-dimensional changepoint estimation with heterogeneous missingness. J. Roy. Statist. Soc., Ser. B, to appear.

Examples


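# NOT RUN {
library(InspectChangepoint)

# Simulate a p x n time series with mean changes at the times in zs
n <- 500; p <- 100; ks <- 30; zs <- c(125, 250, 375)
varthetas <- c(0.2, 0.4, 0.6); overlap <- 0.5
obj <- multi.change(n, p, ks, zs, varthetas, overlap)
x <- obj$x

# Precompute the threshold once by Monte Carlo simulation, then estimate changepoints
threshold <- compute.threshold(n, p)
ret <- inspect(x, threshold = threshold)
ret
summary(ret)
plot(ret)
# }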
