cp_mean: Mean-Shift Changepoint

Description

Tests device-event time series for changepoints using the mean-shift changepoint method originally described in Xu et al. (2015).

Usage

cp_mean(df, ...)
# S3 method for mds_ts
cp_mean(df, ts_event = c(Count = "nA"), analysis_of = NA, ...)
# S3 method for default
cp_mean(
  df,
  analysis_of = NA,
  eval_period = NULL,
  alpha = 0.05,
  cp_max = 100,
  min_seglen = 6,
  epochs = NULL,
  bootstrap_iter = 1000,
  replace = T,
  zero_rate = 1/3,
  ...
)

Arguments

df

Required input data frame of class mds_ts or, for generic usage, any data frame with the following columns:

time: Unique times of class Date
event: Either the event count or rate of class numeric

...

Further arguments passed onto cp_mean methods

ts_event

Required if df is of class mds_ts. Named string indicating the variable corresponding to the event count or rate. Rate must be calculated in a separate column in df as it is not calculated by default. The name of the string is an English description of what was analyzed.

Default: c("Count"="nA") corresponding to the event count column in mds_ts objects. Name is generated from mds_ts metadata.

Example: c("Rate of Bone Filler Events in Canada"="rate")

analysis_of

Optional string indicating the English description of what was analyzed. If specified, this will override the name of the ts_event string parameter.

Default: NA indicates no English description for plain df data frames, or ts_event English description for df data frames of class mds_ts.

Example: "Rate of bone cement leakage"

eval_period

Optional positive integer indicating the number of unique times counting in reverse chronological order to assess. This will be used to establish the process mean and moving range.

Default: NULL considers all times in df.

alpha

Alpha or Type-I error rate for detection of a changepoint, in the range (0, 1).

Default: 0.05 detects a changepoint at an alpha level of 0.05 or 5%.

cp_max

Maximum number of changepoints detectable. This supersedes the theoretical max set by epochs.

Default: 100 detects up to a maximum of 100 changepoints.

min_seglen

Minimum required length of consecutive measurements without a changepoint in order to test for an additional changepoint within.

Default: 6 requires a minimum of 6 consecutive measurements.

epochs

Maximum number of epochs allowed in the iterative search for changepoints, where 2^epochs is the theoretical max changepoints findable. Within each epoch, all measurement segments with a minimum of min_seglen measurements are tested for a changepoint until no additional changepoints are found.

Default: NULL estimates max epochs from the number of observations or measurements in df and min_seglen.

bootstrap_iter

Number of bootstrap iterations for constructing the null distribution of means. Lowest recommended is 1000. Increasing iterations also increases p-value precision.

Default: 1000 uses 1000 bootstrap iterations.

replace

When sampling for the bootstrap, perform sampling with or without replacement. Unless your df contains many measurements, and definitely more than bootstrap_iter, it makes the most sense to set this to TRUE.

Default: T constructs bootstrap samples with replacement.

zero_rate

Required maximum proportion of events in df (constrained by eval_period) containing zeroes for this algorithm to run. Because mean-shift changepoint does not perform well on time series with many 0 values, a value >0 is recommended.

Default: 1/3 requires no more than 1/3 zeros in events in df in order to run.
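
As a rough illustration of how eval_period and zero_rate interact, the following minimal sketch applies the described zero-proportion check to a plain numeric vector. The vector and the variable names evaluated, prop_zero and can_run are illustrative assumptions, not part of the package, and the exact comparison used internally by cp_mean() may differ.

```r
# Illustrative event counts, assumed to be sorted in chronological order
event <- c(0, 4, 7, 0, 5, 6, 8, 3, 9, 5, 7, 6)

eval_period <- 9     # assess only the 9 most recent times
zero_rate   <- 1/3   # tolerate at most one third zeros

# Keep the most recent eval_period measurements
evaluated <- tail(event, eval_period)

# Proportion of zero-valued measurements in the evaluated window
prop_zero <- mean(evaluated == 0)

# Assumed rule: the test would run only if the zero proportion is within bounds
can_run <- prop_zero <= zero_rate
```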

Value

A named list of class mdsstat_test, as follows:

test_name: Name of the test run
analysis_of: English description of what was analyzed
status: Named boolean of whether the test was run. The name contains the run status.
result: A standardized list of test run results: statistic for the test statistic, lcl and ucl for the 95% confidence bounds, p for the p-value, signal status, and signal_threshold.
params: The test parameters
data: The data on which the test was run

Methods (by class)

mds_ts: Mean-shift changepoint on mds_ts data
default: Mean-shift changepoint on general data

Details

The function cp_mean() is an implementation of the mean-shift changepoint method originally proposed by Xu et al. (2015), which tests the mean-centered absolute cumulative sum against a bootstrap null distribution. This algorithm defines a signal as any changepoint found within the most recent n=min_seglen measurements of df.
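
To make the idea of testing a mean-centered cumulative sum against a bootstrap null distribution concrete, here is a minimal base R sketch for a single changepoint. It is not the mdsstat source: the helper cusum_stat(), the simulated series, and the choice of the maximum absolute cumulative sum as the test statistic are assumptions made for illustration.

```r
set.seed(42)

# Simulated series with a mean shift after observation 15 (illustrative data)
x <- c(rnorm(15, mean = 100, sd = 5), rnorm(10, mean = 130, sd = 5))

# Hypothetical helper: largest absolute value of the mean-centered cumulative sum
cusum_stat <- function(x) max(abs(cumsum(x - mean(x))))

bootstrap_iter <- 1000
observed <- cusum_stat(x)

# Null distribution: resampling destroys the time ordering, so any
# mean shift that depends on that ordering is removed
null_stats <- replicate(bootstrap_iter, cusum_stat(sample(x, replace = TRUE)))

# Bootstrap p-value: share of resampled statistics at least as extreme
p_value <- mean(null_stats >= observed)

# Candidate changepoint: last observation before the estimated shift
cp_index <- which.max(abs(cumsum(x - mean(x))))
```

With bootstrap_iter resamples, the smallest nonzero p-value this sketch can report is 1/bootstrap_iter, which is one way to see why more iterations give finer p-value resolution.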

The parameters in this implementation can be interpreted as follows. Changepoints are detected at an alpha level based on n=bootstrap_iter bootstrap iterations (with or without replacement, as set by replace) of the input time series df. A minimum of n=min_seglen consecutive measurements without a changepoint is required to test for an additional changepoint. Both epochs and cp_max constrain the maximum possible number of changepoints detectable as follows: within each epoch, each segment of consecutive measurements at least n=min_seglen measurements long is tested for a changepoint, until no additional changepoints are found.
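
The epoch-wise search described above could be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the package's implementation: test_segment() and find_changepoints() are hypothetical helpers, a changepoint is recorded as the index of the last observation before the shift, and the way cp_max truncates the result is an arbitrary choice.

```r
set.seed(42)
x <- c(rnorm(15, mean = 100, sd = 5), rnorm(10, mean = 130, sd = 5))

# Absolute mean-centered cumulative sum at each position
cusum_path <- function(x) abs(cumsum(x - mean(x)))

# Hypothetical helper: position of a changepoint within one segment, or NA
test_segment <- function(x, alpha, bootstrap_iter, replace) {
  observed <- max(cusum_path(x))
  null <- replicate(bootstrap_iter,
                    max(cusum_path(sample(x, replace = replace))))
  if (mean(null >= observed) < alpha) which.max(cusum_path(x)) else NA_integer_
}

# Hypothetical helper: split segments epoch by epoch until nothing new is found
find_changepoints <- function(x, alpha = 0.05, min_seglen = 6, epochs = 3,
                              cp_max = 100, bootstrap_iter = 1000,
                              replace = TRUE) {
  cps <- integer(0)
  for (epoch in seq_len(epochs)) {
    bounds <- c(0L, sort(cps), length(x))   # current segment boundaries
    found <- integer(0)
    for (i in seq_len(length(bounds) - 1)) {
      idx <- (bounds[i] + 1L):bounds[i + 1]
      if (length(idx) < min_seglen) next    # segment too short to test
      cp <- test_segment(x[idx], alpha, bootstrap_iter, replace)
      if (!is.na(cp) && cp < length(idx)) found <- c(found, idx[cp])
    }
    if (length(found) == 0) break           # no additional changepoints
    cps <- sort(c(cps, found))
    if (length(cps) >= cp_max) {            # cap on detectable changepoints
      cps <- cps[seq_len(cp_max)]
      break
    }
  }
  cps
}

min_seglen <- 6
cps <- find_changepoints(x, min_seglen = min_seglen)

# Signal rule as described: a changepoint among the most recent measurements
signal <- any(cps > length(x) - min_seglen)
```

Because each epoch can at most double the number of segments, the number of changepoints in this sketch is bounded by the number of epochs as well as by cp_max and min_seglen.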

References

Xu, Zhiheng, et al. "Signal detection using change point analysis in postmarket surveillance." Pharmacoepidemiology and Drug Safety 24.6 (2015): 663-668.

Examples

# NOT RUN {
# Basic Example
data <- data.frame(time=c(1:25), event=as.integer(stats::rnorm(25, 100, 25)))
a1 <- cp_mean(data)
# Example using an mds_ts object
a2 <- cp_mean(mds_ts[[3]])
# Example using a derived rate as the "event"
data <- mds_ts[[3]]
data$rate <- ifelse(is.na(data$nA), 0, data$nA) / data$exposure
a3 <- cp_mean(data, c(Rate="rate"))

# }
