This is the main function of the package InspectChangepoint. The function inspect estimates the locations of multiple changepoints in the mean structure of a multivariate time series. Multiple changepoints are estimated using a (wild) binary segmentation scheme, in which each segmentation step uses the locate.change function.
inspect(
x,
lambda,
threshold,
schatten = c(1, 2),
M,
missing_data = "auto",
show_progress = FALSE
)
x The input data matrix of a high-dimensional time series, with each component time series stored as a row.
lambda Regularisation parameter used in locate.change. If no value is supplied, the default value is chosen to be log(log(n)*p/2), where p and n are the number of rows and columns of the data matrix x respectively.
threshold Threshold level for testing whether an identified changepoint is a true changepoint. If no value is supplied, the threshold level is computed via Monte Carlo simulation of 100 repetitions from the null model.
schatten The Schatten norm constraint to use in the locate.change function. Default is schatten = 2, i.e. a Frobenius norm constraint.
M The Monte Carlo parameter used for wild binary segmentation. Default is M = 0, which means a classical binary segmentation scheme is used.
missing_data How missing data in x should be handled. If missing_data = 'meanImpute', missing data are imputed with row means; if 'MissInspect', the MissInspect algorithm of Follain et al. (2022) is used; if 'auto', the program makes the choice depending on the amount of missingness.
show_progress Whether to display the progress of the computation.
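To make the 'meanImpute' option concrete, the following base R sketch fills each missing entry with the mean of the observed entries in the same row. The helper impute_row_means is hypothetical and written only for illustration; it is not the package's internal code, which may treat edge cases differently.

```r
# Hypothetical helper (not part of InspectChangepoint): replace each NA
# with the mean of the observed entries in the same row (component series).
impute_row_means <- function(x) {
  x <- as.matrix(x)
  row_means <- rowMeans(x, na.rm = TRUE)       # one mean per component series
  missing <- which(is.na(x), arr.ind = TRUE)   # (row, col) index of each NA
  x[missing] <- row_means[missing[, "row"]]
  x
}

# Toy 2 x 4 data matrix: rows are component series, columns are time points.
x_toy <- rbind(c(1, NA, 3, 2),
               c(NA, 4, 6, NA))
x_imputed <- impute_row_means(x_toy)
```

In this sketch a row with no observed entries has no defined mean, so its entries would become NaN; such rows need separate handling.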
The return value is an S3 object of class 'inspect'. It contains a list of two objects:
x The input data matrix
changepoints A matrix with three columns. The first column contains the locations of estimated changepoints sorted in increasing order; the second column contains the maximum CUSUM statistics of the projected univariate time series associated with each estimated changepoint; the third column contains the depth of binary segmentation for each detected changepoint.
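As a rough illustration of this structure, the sketch below builds a mock object by hand and reads values from it. The numbers are made up, and the column names are hypothetical because this page does not state them, so the columns are accessed by position.

```r
# Mock object for illustration only: the values are invented and the
# column names are hypothetical, so columns are accessed by position.
changepoints <- cbind(location = c(125, 250, 375),
                      max_cusum = c(9.1, 14.8, 21.3),
                      depth = c(2, 1, 2))
ret_mock <- structure(list(x = matrix(0, nrow = 2, ncol = 500),
                           changepoints = changepoints),
                      class = "inspect")

# First column: estimated changepoint locations, in increasing order.
locations <- ret_mock$changepoints[, 1]

# Third column: segmentation depth; keep the locations found at depth 1.
depth_one <- ret_mock$changepoints[ret_mock$changepoints[, 3] == 1, 1]
```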
The input time series is first standardised using the rescale.variance function. Recursive calls of the locate.change function then segment the multivariate time series using (wild) binary segmentation. A changepoint at time z is defined here to mean that the time series has constant mean structure for time up to and including z and constant mean structure for time from z+1 onwards.
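The changepoint convention above can be checked on a univariate toy series. The sketch below uses a textbook CUSUM statistic written in base R; the helper cusum_stat is hypothetical, and the package's own CUSUM transformation may use a different normalisation.

```r
# Textbook univariate CUSUM statistic at each candidate split t = 1, ..., n-1.
# Hypothetical helper; the package's own normalisation may differ.
cusum_stat <- function(y) {
  n <- length(y)
  t <- seq_len(n - 1)
  left_mean <- cumsum(y)[t] / t                      # mean of y[1..t]
  right_mean <- (sum(y) - cumsum(y)[t]) / (n - t)    # mean of y[(t+1)..n]
  sqrt(t * (n - t) / n) * abs(right_mean - left_mean)
}

# Noise-free series with a changepoint at z = 6: mean 0 for times 1..6
# and mean 1 for times 7..10.
y <- c(rep(0, 6), rep(1, 4))
z_hat <- which.max(cusum_stat(y))   # index of the largest CUSUM value
```

For this series the statistic peaks at t = 6, the last time point of the first segment, which matches the "up to and including z" convention.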
More details about the model assumptions and theoretical guarantees can be found in Wang and Samworth (2018). Note that Monte Carlo computation of the threshold value can be slow, especially for large p. If inspect is to be used multiple times with the same (or similar) data matrix size, it is better to precompute the threshold level via Monte Carlo simulation by calling the compute.threshold function.
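The idea of calibrating a threshold once and reusing it can be sketched in base R for a simplified univariate setting. Everything here is an assumption made for illustration: the helpers max_cusum and mc_threshold are hypothetical, the null model is taken to be i.i.d. standard normal noise, and the threshold is taken as the maximum over repetitions. The package's compute.threshold function works with the projected CUSUM statistic and may differ in all of these respects.

```r
# Maximum of a textbook univariate CUSUM statistic over all split points.
max_cusum <- function(y) {
  n <- length(y)
  t <- seq_len(n - 1)
  s <- cumsum(y)[t]
  max(abs(s - t / n * sum(y)) / sqrt(t * (n - t) / n))
}

# Hypothetical helper: simulate the statistic under i.i.d. N(0, 1) noise and
# take the largest value over n_rep repetitions (an assumption of this sketch).
mc_threshold <- function(n, n_rep = 100) {
  max(replicate(n_rep, max_cusum(rnorm(n))))
}

set.seed(1)
n <- 200
thr <- mc_threshold(n)   # the slow step, computed once

# Reuse the same threshold for several series of the same length.
series <- replicate(3, rnorm(n), simplify = FALSE)
exceeds <- vapply(series, function(y) max_cusum(y) > thr, logical(1))
```

The cost of the simulation depends only on the size of the data, which is why a single precomputed value can serve every data set of that size.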
Wang, T. and Samworth, R. J. (2018) High dimensional changepoint estimation via sparse projection. J. Roy. Statist. Soc., Ser. B, 80, 57--83.
Follain, B., Wang, T. and Samworth, R. J. (2022) High-dimensional changepoint estimation with heterogeneous missingness. J. Roy. Statist. Soc., Ser. B, to appear.
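# NOT RUN {
library(InspectChangepoint)

# Simulate a data matrix with changepoints at the locations in zs
n <- 500; p <- 100; ks <- 30; zs <- c(125, 250, 375)
varthetas <- c(0.2, 0.4, 0.6); overlap <- 0.5
obj <- multi.change(n, p, ks, zs, varthetas, overlap)
x <- obj$x

# Precompute the threshold, then estimate the changepoints
threshold <- compute.threshold(n, p)
ret <- inspect(x, threshold = threshold)
ret
summary(ret)
plot(ret)
# }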