dw.filter: Robust Double Window Filtering Methods for Univariate Time Series

Description

Procedures for robust (online) extraction of low frequency components (the signal) from a univariate time series based on a moving window technique using two nested time windows in each step.

Usage

dw.filter(y, outer.width, inner.width, method = "all", 
             scale = "MAD", d = 2, 
             minNonNAs = 5, online = FALSE, extrapolate = TRUE)

Value

dw.filter returns an object of class dw.filter. An object of class dw.filter is a list containing the following components:

level: a data frame containing the corresponding signal level extracted by the filter(s) specified in method.
slope: a data frame containing the corresponding slope within each time window.
sigma: a data frame containing inner.loc.sigma, inner.reg.sigma, outer.loc.sigma and outer.reg.sigma, the scale estimated from the observations (loc) or the residuals from the Repeated Median regression (reg) within the inner window of length inner.width or the outer window of length outer.width, respectively.
MTM uses outer.loc.sigma for trimming outliers, MRM and TRM use outer.reg.sigma for trimming outliers,
DWMTM uses inner.loc.sigma for trimming outliers, DWMRM and DWTRM use inner.reg.sigma for trimming outliers;
MED, RM and DWRM require no scale estimation.
The function only returns values for inner.loc.sigma, inner.reg.sigma, outer.loc.sigma or outer.reg.sigma if any specified method requires their estimation; otherwise NAs are returned.

In addition, the original input time series is returned as list member y, and the settings used for the analysis are returned as the list members outer.width, inner.width, method, scale, d, minNonNAs, online and extrapolate.

Application of the function plot to an object of class dw.filter returns a plot showing the original time series with the filtered output.

Arguments

y

a numeric vector or (univariate) time series object.

outer.width

a positive integer specifying the window width of the outer window used for determining the final estimate.
If online=FALSE (see below) this needs to be an odd integer.

inner.width

a positive integer (not larger than outer.width) specifying the window width of the inner window used for determining the initial estimate and trimming features.
If online=FALSE (see below) this needs to be an odd integer.

method

a (vector of) character string(s) containing the method(s) to be used for the estimation of the signal level.
It is possible to specify any combination of "MED", "RM", "MTM", "TRM", "MRM", "DWRM", "DWMTM", "DWTRM", "DWMRM" and "all" (for all of the above). Default is method="all". For a detailed description see the section ‘Methods’ below.

scale

a character string specifying the method to be used for robust estimation of the local variability (within one time window). Possible values are:

"MAD"

Median absolute deviation about the median (default)

"QN"

Rousseeuw and Croux's (1993) \(Q_n\) scale estimator

"SN"

Rousseeuw and Croux's (1993) \(S_n\) scale estimator

d

a positive integer defining the factor by which the current scale estimate is multiplied to determine the trimming boundaries for outlier detection.
Observations deviating by more than \(d\cdot \hat{\sigma}_t\) from the current level approximation \(\hat{\mu}_t\) are replaced by \(\hat{\mu}_t\), where \(\hat{\sigma}_t\) denotes the current scale estimate. Default is d = 2, meaning a \(2\sigma\) rule for outlier detection.
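
The trimming rule can be illustrated for a single window with a minimal base R sketch. It assumes the median as the level approximation and the MAD as the scale estimate; the helper name replace_outliers is hypothetical and not part of the package, and the package's internal implementation may differ.

```r
# Hypothetical helper (not part of the package): apply the d * sigma rule
# within one window, assuming median as level and MAD as scale estimate.
replace_outliers <- function(x, d = 2) {
  mu    <- median(x, na.rm = TRUE)   # current level approximation
  sigma <- mad(x, na.rm = TRUE)      # current scale estimate
  # which() drops NA comparisons, so missing values are left untouched
  out <- which(abs(x - mu) > d * sigma)
  x[out] <- mu                       # replace flagged observations by the level
  x
}

x <- c(1.0, 1.2, 0.9, 1.1, 9.0, 1.0, 0.8)
replace_outliers(x, d = 2)   # only the value 9.0 is replaced by the median 1.0
```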

minNonNAs

a positive integer defining the minimum number of non-missing observations within each window which is required for a ‘sensible’ estimation. Default: if a window contains fewer than minNonNAs = 5 non-missing observations, NAs are returned.
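
As an illustration of this rule, the following base R sketch returns NA for a window with too few non-missing observations. The helper window_median is hypothetical and only mimics the documented behaviour for a simple median; it is not the package's code.

```r
# Hypothetical helper: median of one window, or NA if the window holds
# fewer than minNonNAs non-missing observations.
window_median <- function(x, minNonNAs = 5) {
  if (sum(!is.na(x)) < minNonNAs) return(NA_real_)
  median(x, na.rm = TRUE)
}

window_median(c(1, NA, 3, NA, NA, 2, NA))   # 3 non-missing values: NA
window_median(c(1, 2, NA, 4, 5, 6, NA))     # 5 non-missing values: 4
```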

online

a logical indicating whether the current level and scale estimates are evaluated at the most recent time within each (inner and outer) window (TRUE) or centred within the windows (FALSE). Setting online=FALSE requires odd inner.width and outer.width. Default is online=FALSE.
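
The difference between the two settings can be sketched by listing which observations enter the window for the estimate at time t. The helper window_indices is hypothetical and ignores the edges of the series.

```r
# Hypothetical helper: indices of the window used for the estimate at time t.
window_indices <- function(t, width, online = FALSE) {
  if (online) {
    # the estimate refers to the most recent observation in the window
    (t - width + 1):t
  } else {
    stopifnot(width %% 2 == 1)   # centred windows need an odd width
    k <- (width - 1) / 2
    (t - k):(t + k)
  }
}

window_indices(20, 5, online = TRUE)    # 16 17 18 19 20
window_indices(20, 5, online = FALSE)   # 18 19 20 21 22
```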

extrapolate

a logical indicating whether the level estimations should be extrapolated to the edges of the time series.
If online=FALSE the extrapolation consists of the fitted values within the first half of the first window and the last half of the last window; if online=TRUE the extrapolation consists of all the fitted values within the first time window.

Methods

The following methods are available as method for signal extraction, whereby the prefix DW denotes the fact that different window widths are used in the first and second step of the calculations within one window (i.e. inner.width<outer.width) while for the methods MED, RM, MTM, TRM and MRM the first and second step take place in a window of fixed length outer.width.

MED: ordinary running median filter.
The simple median is applied to the observations within a moving time window of length outer.width.
RM: ordinary repeated median filter.
Repeated median regression is applied to the observations within a moving time window of length outer.width.
MTM, DWMTM: modified trimmed mean filters.
In a first step the median is applied to (MTM): the whole window with outer.width or (DWMTM): the inner window with inner.width; in a second step the mean is applied to the (trimmed) observations in the whole window (with outer.width).
TRM, DWTRM: trimmed repeated median filters.
In a first step repeated median regression is applied to (TRM): the whole window with outer.width or (DWTRM): the inner window with inner.width; in a second step least squares regression is applied to the (trimmed) observations in the whole window (with outer.width).
MRM, DWMRM: modified repeated median filters.
In a first step repeated median regression is applied to (MRM): the whole window with outer.width or (DWMRM): the inner window with inner.width; in a second step another repeated median regression is applied to the (trimmed) observations in the whole window (with outer.width).
DWRM: double window repeated median filter.
In a first step repeated median regression is applied to the inner window with inner.width to determine the trend (slope); in a second step the median is applied to the trend corrected observations in the whole window with outer.width (without trimming).
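
Several of the methods above rely on repeated median regression. A minimal base R sketch of a repeated median fit within one window is given below, using the usual definition (median over i of the median over j of the pairwise slopes). The helper repeated_median is hypothetical, places the intercept at x = 0 and assumes no missing values; the package may parametrise the level differently (for example at the window centre or at the most recent time point).

```r
# Hypothetical helper: repeated median fit within one window.
repeated_median <- function(y, x = seq_along(y)) {
  n <- length(y)
  # inner median: for each i, the median slope of the lines through point i
  inner <- vapply(seq_len(n), function(i) {
    median((y[-i] - y[i]) / (x[-i] - x[i]))
  }, numeric(1))
  slope <- median(inner)           # outer median over all points i
  level <- median(y - slope * x)   # intercept at x = 0
  c(level = level, slope = slope)
}

# Linear trend with one gross outlier
y <- 2 + 0.5 * (1:9)
y[4] <- 50
repeated_median(y)   # level 2, slope 0.5: the outlier has no effect
```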

Author

Roland Fried and Karen Schettlinger

Details

dw.filter is suitable for extracting low frequency components (the signal) from a time series which may be contaminated with outliers and can contain level shifts. For this, moving window techniques are applied.

A short inner window of length inner.width is used in each step for calculating an initial level estimate (by using either the median or a robust regression fit) and a robust estimate of the local standard deviation. Observations deviating strongly from this initial fit are trimmed from an outer time window of length outer.width, and the signal level is estimated from the remaining observations (by using either a location or regression estimator). Values specified in method determine which combination of estimation methods should be applied to the inner and outer window (see section ‘Methods’ below).
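
The two-step procedure can be sketched for one centred window with a location based, DWMTM-type estimate. This is a minimal base R illustration under stated assumptions (MAD as scale estimate, no missing values, observations outside the trimming boundaries dropped before averaging); the helper dwmtm_level is hypothetical and is not the package's implementation.

```r
# Hypothetical helper: DWMTM-type level estimate for one centred outer window.
dwmtm_level <- function(outer, inner.width, d = 2) {
  n <- length(outer)
  stopifnot(n %% 2 == 1, inner.width %% 2 == 1, inner.width <= n)
  centre <- (n + 1) / 2
  k <- (inner.width - 1) / 2
  inner <- outer[(centre - k):(centre + k)]   # nested inner window
  mu    <- median(inner)   # initial level from the inner window
  sigma <- mad(inner)      # robust scale from the inner window
  keep  <- abs(outer - mu) <= d * sigma       # trimming boundaries
  mean(outer[keep])        # final level from the trimmed outer window
}

outer <- c(10, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 12)
dwmtm_level(outer, inner.width = 5)   # mean of the seven values near 1
mean(outer)                           # ordinary mean, pulled up by 10 and 12
```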

The applied method should be chosen based on an a priori guess of the underlying signal and the data quality: Location based methods (MED / MTM) are recommended in case of a locally (piecewise) constant signal, regression based approaches (RM / DWRM / TRM / MRM) in case of locally linear, monotone trends.

Since no big differences have been reported between TRM and MRM, the quicker and somewhat more efficient TRM option might be preferred. DWRM is the quickest of all regression based methods and performs better than the ordinary RM at shifts, but it is the least robust and least efficient method.

If location based methods are used, inner.width should be chosen to be at least twice the length of expected patches of subsequent outliers in the time series; if regression based methods are used, inner.width should be at least three times that length, otherwise outlier patches can influence the estimations strongly. To increase the efficiency of the final estimates, outer.width can then be chosen rather large, provided that it is smaller than the time between subsequent level shifts.
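
The rule of thumb for inner.width can be written as a small calculation. The helper min_inner_width is hypothetical; it simply applies the factors 2 and 3 given above and rounds up to the next odd number when online=FALSE, since centred windows require odd widths.

```r
# Hypothetical helper: smallest inner.width satisfying the rule of thumb
# for an expected outlier patch of length patch.length.
min_inner_width <- function(patch.length, regression = FALSE, online = FALSE) {
  factor <- if (regression) 3 else 2
  w <- factor * patch.length
  # centred (offline) windows require an odd width
  if (!online && w %% 2 == 0) w <- w + 1
  w
}

min_inner_width(3)                      # location based, offline: 7
min_inner_width(3, regression = TRUE)   # regression based, offline: 9
min_inner_width(3, online = TRUE)       # location based, online: 6
```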

For robust scale estimation, MAD is the classical choice; SN is a somewhat more efficient and almost equally robust alternative, while QN is much more efficient if the window widths are not too small, and it performs very well at the occurrence of level shifts.

The factor d, specifying the trimming boundaries as a multiple of the estimated scale, can be chosen similarly to classical rules for detecting unusual observations in a Gaussian sample. Choosing d=3 instead of d=2 increases efficiency, but decreases robustness; d=2.5 might be seen as a compromise.
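
To give a feeling for these choices, the following lines compute the fraction of observations from an ideal Gaussian sample that would fall outside the trimming boundaries if level and scale were known exactly. This is only an idealised calculation; with estimated level and scale the actual fractions will differ.

```r
# Probability that a Gaussian observation deviates from the mean
# by more than d standard deviations (idealised: known mean and scale).
d <- c(2, 2.5, 3)
p.trim <- 2 * pnorm(-d)
round(p.trim, 4)   # 0.0455 0.0124 0.0027
```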

References

Bernholt, T., Fried, R., Gather, U., Wegener, I. (2006) Modified Repeated Median Filters, Statistics and Computing 16, 177-192.
(earlier version: http://hdl.handle.net/2003/5298)

Schettlinger, K., Fried, R., Gather, U. (2006) Robust Filters for Intensive Care Monitoring: Beyond the Running Median, Biomedizinische Technik 51(2), 49-56.

Examples

if (FALSE) {
library(robfilter)  # package that provides dw.filter
# Generate random time series:
y <- cumsum(runif(500)) - .5*(1:500)
# Add jumps:
y[200:500] <- y[200:500] + 5
y[400:500] <- y[400:500] - 7
# Add noise:
n <- sample(1:500, 30)
y[n] <- y[n] + rnorm(30)

# Filtering with all methods:
y.dw <- dw.filter(y, outer.width=31, inner.width=11, method="all")
# Plot:
plot(y.dw)

# Filtering with trimmed RM and double window TRM only:
y2.dw <- dw.filter(y, outer.width=31, inner.width=11, method=c("TRM","DWTRM"))
plot(y2.dw)
}