Procedure for robust (online) extraction of low frequency components (the signal) from a univariate time series with optional rules for outlier replacement and shift detection.
robust.filter(y, width, trend = "RM", scale = "QN", outlier = "T",
shiftd = 2, wshift = floor(width/2), lbound = 0.1, p = 0.9,
adapt = 0, max.width = width,
online = FALSE, extrapolate = TRUE)
robust.filter returns an object of class robust.filter.
An object of class robust.filter is a list containing the following components:
a numeric vector containing the signal level extracted by the (regression) filter specified by trend, scale and outlier.
a numeric vector containing the corresponding slope within each time window.
a numeric vector containing the corresponding scale within each time window.
an outlier indicator. 0: no outlier, +1: positive outlier, -1: negative outlier
a level shift indicator. 0: no level shift, t: positive level shift detected at processing time t, -t: negative level shift detected at processing time t (the position in the vector gives an estimate of the point in time before which the shift has occurred).
In addition, the original input time series is returned as list member y, and the settings used for the analysis are returned as the list members width, trend, scale, outlier, shiftd, wshift, lbound, p, adapt, max.width, online and extrapolate.
Applying the function plot to an object of class robust.filter returns a plot showing the original time series together with the filtered output.
y: a numeric vector or (univariate) time series object.
width: a positive integer defining the window width used for fitting. If online=FALSE (default) this needs to be an odd number.
trend: a character string defining the method to be used for robust approximation of the signal within one time window. Possible values are:
"MED": Median
"RM": Repeated Median regression (default)
"LTS": Least Trimmed Squares regression
"LMS": Least Median of Squares regression
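To make the default "RM" option more concrete, here is a minimal base-R sketch of repeated median regression on a single window. It is an illustration only, not the package's implementation: the helper name repeated_median and the choice of time points centred at 0 are assumptions made for this example.

```r
# Repeated median regression on one window (illustrative sketch, not package code).
# x: time points within the window, y: observations at those time points.
repeated_median <- function(x, y) {
  n <- length(y)
  # For each point i, take the median of the pairwise slopes to all other points.
  inner <- vapply(seq_len(n), function(i) {
    median((y[-i] - y[i]) / (x[-i] - x[i]))
  }, numeric(1))
  # The slope estimate is the median of these inner medians.
  slope <- median(inner)
  # The level at x = 0 is the median of the slope-corrected observations.
  level <- median(y - slope * x)
  c(level = level, slope = slope)
}

# Window of width 7 centred at 0, linear trend plus one gross outlier:
x <- -3:3
y <- 2 + 0.5 * x
y[6] <- 50
fit <- repeated_median(x, y)
print(fit)
```

In this example the single outlier leaves both estimates at the values of the underlying line, which is the kind of resistance the robust trend options are meant to provide.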
scale: a character string defining the method to be used for robust estimation of the local variability (within one time window). Possible values are:
"MAD": Median absolute deviation about the median
"QN": Rousseeuw and Croux's (1993) \(Q_n\) scale estimator (default)
"SN": Rousseeuw and Croux's (1993) \(S_n\) scale estimator
"LSH": Length of the shortest half
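As a rough illustration of two of these options, the following base-R sketch computes the MAD and a raw length of the shortest half for one window of values. The helper shortest_half is hypothetical and omits any consistency factor the package may apply; the sketch also works on raw values, whereas a filter would typically work on residuals from the fitted trend.

```r
# Two of the scale options computed on one window (illustrative sketch).
window <- c(1.0, 1.2, 0.9, 1.1, 1.0, 8.0, 1.05)

# "MAD": median absolute deviation about the median.
# stats::mad() applies the consistency factor 1.4826 by default.
mad_scale <- mad(window)

# "LSH": length of the shortest half, here WITHOUT a consistency factor
# (an assumption of this sketch).
shortest_half <- function(z) {
  z <- sort(z)
  n <- length(z)
  h <- floor(n / 2) + 1                # number of points forming a "half"
  min(z[h:n] - z[1:(n - h + 1)])       # shortest interval covering h points
}
lsh_scale <- shortest_half(window)

# The single outlier (8.0) inflates the standard deviation but not the robust scales.
print(c(sd = sd(window), mad = mad_scale, lsh = lsh_scale))
```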
outlier: a single character defining the rule to be used for outlier detection and outlier treatment. Observations deviating more than \(d\cdot \hat{\sigma}_t\) from the current level approximation \(\hat{\mu}_t\) are replaced by \(\hat{\mu}_t\pm k\hat{\sigma}_t\), where \(\hat{\sigma}_t\) denotes the current scale estimate. Possible values are:
"T": Replace ('trim') large outliers detected by a \(3\sigma\)-rule (\(d=3\)) by the current level estimate (\(k=0\)). (default)
"L": Shrink large outliers (\(d=3\)) strongly towards the current level estimate (\(k=1\)).
"M": Shrink large and moderately sized outliers (\(d=2\)) strongly towards the current level estimate (\(k=1\)).
"W": Shrink large and moderately sized outliers (\(d=2\)) towards the current level estimate (\(k=2\)).
W is the most efficient, T the most robust method (which should ideally be combined with a suitable value of lbound).
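The replacement rule can be written out in a few lines of base R. This is a sketch of the rule as described here, using the \(d\) and \(k\) values listed for each option; the names rules and replace_outlier are hypothetical and not part of the package.

```r
# (d, k) pairs for the four outlier rules, as listed in the argument description.
rules <- list(T = c(d = 3, k = 0), L = c(d = 3, k = 1),
              M = c(d = 2, k = 1), W = c(d = 2, k = 2))

# Hypothetical helper: treat one incoming observation y_new given the
# current level estimate mu and the current scale estimate sigma.
replace_outlier <- function(y_new, mu, sigma, rule = "T") {
  d <- rules[[rule]][["d"]]
  k <- rules[[rule]][["k"]]
  r <- y_new - mu
  # Deviations larger than d * sigma are replaced by mu +/- k * sigma.
  if (abs(r) > d * sigma) mu + sign(r) * k * sigma else y_new
}

# An observation 10 scale units above the level, treated under each rule:
treated <- sapply(names(rules), function(rule) replace_outlier(10, 0, 1, rule))
print(treated)
```

An observation 2.5 scale units from the level would be left unchanged by "T" and "L" but shrunk by "M" and "W", since only the latter two use \(d=2\).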
shiftd: a positive numeric value defining the factor by which the current scale estimate is multiplied for shift detection. Default is shiftd=2, corresponding to a \(2\sigma\) rule for shift detection.
wshift: a positive integer specifying the number of the most recent observations used for shift detection (and therefore also regulating the delay of shift detection). Only used in the online mode, where it should be less than half the (minimal) window width. In the offline mode (online=FALSE, default), shift detection is based on the right half of the time window, i.e. wshift=floor(width/2) (default).
lbound: a positive real value specifying an optional lower bound for the scale to prevent the scale estimate from reaching zero (implosion).
p: a fraction \(\in [2/3,1]\) of observations for additional rules in case of only two or three different values within one window. If 100 percent of the observations within one window take on only two different values, the current level is estimated by the mean of these values regardless of the trend specification. In case of three differing values the median is taken as the current level estimate.
adapt: a numeric value defining the fraction which regulates the adaptation of the moving window width. adapt can be either 0 or a value \(\in [0.6,1]\). adapt = 0 means that a fixed window width is used. Otherwise, the window width is reduced whenever more than a fraction adapt \(\in [0.6,1]\) of the residuals in a certain part of the current time window are all positive or all negative.
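The adaptation criterion can be sketched as a simple sign check on residuals. The text does not say which part of the window is inspected, so the hypothetical helper below takes the relevant residuals as an argument; it shows the criterion as worded here and is not the package's code.

```r
# Hypothetical helper: decide whether the window width should be reduced,
# given the residuals from the inspected part of the current window.
needs_shrinking <- function(res, adapt) {
  if (adapt == 0) return(FALSE)        # adapt = 0: fixed window width
  frac_pos <- mean(res > 0)            # share of positive residuals
  frac_neg <- mean(res < 0)            # share of negative residuals
  # Shrink when more than a fraction adapt of the residuals share one sign.
  max(frac_pos, frac_neg) > adapt
}

one_sided <- c(0.4, 1.1, 0.7, 0.9, -0.2)   # 80 percent positive
balanced  <- c(0.4, -0.5, 0.3, -0.6)       # 50 percent positive
print(c(needs_shrinking(one_sided, 0.7), needs_shrinking(balanced, 0.7)))
```

A long run of same-signed residuals suggests that the local linear fit no longer follows the signal, which is why a shorter window is tried in that case.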
max.width: a positive integer (>= width) specifying the maximal width of the time window. width specifies the minimal (and also the initial) width.
online: a logical indicating whether the current level and scale estimates are evaluated at the most recent time within each window (TRUE) or centered within the window (FALSE). online=FALSE (default) requires an odd width for the window and means a time delay of (width+1)/2 time units.
extrapolate: a logical indicating whether the level estimations should be extrapolated to the edges of the time series. If online=FALSE the extrapolation consists of the fitted values within the first half of the first window and the last half of the last window; if online=TRUE the extrapolation consists of all fitted values within the first time window.
Roland Fried and Karen Schettlinger
robust.filter works by applying the methods specified by trend and scale to a moving time window of length width.
Before moving the time window, it is checked whether the next (incoming) observation is considered an 'outlier' by applying the rule specified by outlier. To this end, the trend in the current time window is extrapolated to the next point in time and the residual of the incoming observation is standardised by the current scale estimate.
After moving the time window, it can be tested whether a level shift has occurred within the window: If more than half of the residuals in the right part of the window are larger than shiftd\(\cdot\sigma_t\), a shift is detected and appropriate actions are taken. In the online mode, the number of the rightmost residuals can be chosen by wshift to regulate the resistance of the detection rule against outliers, its power and the time delay of detection.
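The majority rule for shift detection can be sketched as follows. This is a simplified base-R illustration under stated assumptions: the helper detect_shift is hypothetical, it handles positive and negative shifts symmetrically by counting residuals beyond \(\pm\)shiftd\(\cdot\sigma_t\), and it returns only the sign of the shift, not the processing time.

```r
# Hypothetical helper: majority rule on the wshift rightmost residuals.
# Returns +1 (upward shift), -1 (downward shift) or 0 (no shift).
detect_shift <- function(res, sigma, shiftd = 2,
                         wshift = floor(length(res) / 2)) {
  recent <- tail(res, wshift)                 # right part of the window
  up   <- sum(recent >  shiftd * sigma)       # large positive residuals
  down <- sum(recent < -shiftd * sigma)       # large negative residuals
  if (up > wshift / 2) 1 else if (down > wshift / 2) -1 else 0
}

# Window of width 23: the last 11 residuals sit far above the fit.
shifted <- c(rep(0.1, 12), rep(5, 11))
# Same width, but only a single large residual at the end (an outlier).
spiky <- c(rep(0.1, 22), 9)
print(c(detect_shift(shifted, sigma = 1), detect_shift(spiky, sigma = 1)))
```

Because a majority of the inspected residuals must be large, an isolated outlier does not trigger the rule; a smaller wshift shortens the detection delay at the cost of less resistance to outlier patches.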
A more detailed description of the filter can be found in Fried (2004). The adaptation of the window width is described by Gather and Fried (2004). For more explanations on shift detection, see Fried and Gather (2007).
Fried, R. (2004), Robust Filtering of Time Series with Trends, Journal of Nonparametric Statistics 16, 313-328. (earlier version: http://hdl.handle.net/2003/4992)
Fried, R., Gather, U. (2007), On Rank Tests for Shift Detection in Time Series, Computational Statistics and Data Analysis, Special Issue on Machine Learning and Robust Data Mining 52, 221-233. (earlier version: http://hdl.handle.net/2003/23301)
Gather, U., Fried, R. (2004), Methods and Algorithms for Robust Filtering, COMPSTAT 2004: Proceedings in Computational Statistics, J. Antoch (ed.), Physica-Verlag, Heidelberg, 159-170.
Schettlinger, K., Fried, R., Gather, U. (2006) Robust Filters for Intensive Care Monitoring: Beyond the Running Median, Biomedizinische Technik 51(2), 49-56.
robreg.filter, hybrid.filter, dw.filter, wrm.filter.
# Load the package that provides robust.filter:
library(robfilter)
# Generate random time series:
y <- cumsum(runif(500)) - .5*(1:500)
# Add jumps:
y[200:500] <- y[200:500] + 5
y[400:500] <- y[400:500] - 7
# Add noise:
n <- sample(1:500, 30)
y[n] <- y[n] + rnorm(30)
# Delayed Filtering of the time series with window width 23:
y.rf <- robust.filter(y, width=23)
# Plot:
plot(y.rf)
# Delayed Filtering with different settings and fixed window width 31:
y.rf2 <- robust.filter(y, width=31, trend="LMS", scale="QN", outlier="W")
plot(y.rf2)
# Online Filtering with fixed window width 24:
y.rf3 <- robust.filter(y, width=24, online=TRUE)
plot(y.rf3)
# Delayed Filtering with adaptive window width (minimal width 11, maximal width 51):
y.rf4 <- robust.filter(y, width=11, adapt=0.7, max.width=51)
plot(y.rf4)