A technique for detecting anomalies in seasonal univariate time series where the input is a series of <timestamp, count> pairs.
AnomalyDetectionTs(x, max_anoms = 0.1, direction = "pos", alpha = 0.05,
only_last = NULL, threshold = "None", e_value = F, longterm = F,
plot = F, y_log = F, xlabel = "", ylabel = "count", title = NULL)
x
Time series as a two-column data frame where the first column consists of the timestamps and the second column consists of the observations.
max_anoms
Maximum number of anomalies that S-H-ESD will detect, expressed as a fraction of the data (for example, 0.1 means at most 10% of the observations).
direction
Directionality of the anomalies to be detected. Options are: 'pos' | 'neg' | 'both'.
alpha
The level of statistical significance with which to accept or reject anomalies.
only_last
Find and report anomalies only within the last day or hour of the time series. Options are: NULL | 'day' | 'hr'.
threshold
Only report positive-going anomalies above the specified threshold. Options are: 'None' | 'med_max' | 'p95' | 'p99'.
e_value
Add an additional column to the anoms output containing the expected value.
longterm
Increase anomaly detection efficacy for time series that are longer than a month. See Details below.
plot
A flag indicating whether a plot of the time series, with the estimated anomalies indicated by circles, should also be returned.
y_log
Apply log scaling to the y-axis. This helps when viewing plots that have extremely large positive anomalies relative to the rest of the data.
xlabel
X-axis label to be added to the output plot.
ylabel
Y-axis label to be added to the output plot.
title
Title for the output plot.
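The max_anoms, alpha and direction arguments correspond to the inputs of the generalized ESD test of Rosner (1983): max_anoms bounds how many candidates are examined, alpha sets the critical values, and direction decides which tail is tested. The base-R sketch below illustrates that mapping on a plain numeric vector of residuals; it is not the package's implementation. The helper name esd_robust is hypothetical, and using the median and MAD in place of the mean and standard deviation is an assumption about how S-H-ESD makes the test robust.

```r
# Hypothetical helper (not part of AnomalyDetection): generalized ESD test
# (Rosner, 1983) on a numeric vector of residuals, using the median and MAD
# as robust stand-ins for the mean and standard deviation (an assumption).
esd_robust <- function(r, max_anoms = 0.1, alpha = 0.05,
                       direction = c("pos", "neg", "both")) {
  direction <- match.arg(direction)
  n <- length(r)
  # Sanity checks for this sketch only
  stopifnot(n >= 3, max_anoms > 0, max_anoms < 0.5)
  k <- max(1, floor(max_anoms * n))   # upper bound on the number of anomalies
  one_tail <- direction != "both"

  idx <- seq_len(n)                   # original positions still in the sample
  vals <- r
  candidates <- integer(0)
  n_anoms <- 0
  for (i in seq_len(k)) {
    # Signed or absolute deviation from the median, depending on direction
    dev <- switch(direction,
                  pos  = vals - median(vals),
                  neg  = median(vals) - vals,
                  both = abs(vals - median(vals)))
    s <- mad(vals)                    # scaled MAD (constant 1.4826 by default)
    if (s == 0) break
    j <- which.max(dev)
    test_stat <- dev[j] / s

    # Critical value lambda_i from Rosner (1983); a one-sided test uses alpha
    # where the two-sided test uses alpha / 2
    p <- if (one_tail) 1 - alpha / (n - i + 1) else 1 - alpha / (2 * (n - i + 1))
    t_crit <- qt(p, df = n - i - 1)
    lambda <- (n - i) * t_crit / sqrt((n - i - 1 + t_crit^2) * (n - i + 1))

    candidates <- c(candidates, idx[j])
    if (test_stat > lambda) n_anoms <- i  # keep the largest i with R_i > lambda_i
    idx <- idx[-j]
    vals <- vals[-j]
  }
  sort(candidates[seq_len(n_anoms)])
}

# Usage: two injected spikes in Gaussian noise
set.seed(1)
r <- rnorm(200)
r[c(50, 120)] <- c(8, -9)
esd_robust(r, max_anoms = 0.1, alpha = 0.05, direction = "both")
```

The result is a vector of positions in the input. With direction = "pos" only the upward spike can be flagged, because the downward one never has the largest signed deviation.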
The returned value is a list with the following components.
anoms
Data frame containing the timestamps and values of the detected anomalies and, optionally, their expected values.
plot
A graphical object, returned if plotting was requested by the user. The plot shows the estimated anomalies annotated on the input time series.
One can save the anomalies to a file as follows: write.csv(<return list name>[["anoms"]], file=<filename>)
One can save the plot to a file as follows: ggsave(<filename>, plot=<return list name>[["plot"]])
longterm
This option should be set when the input time series is longer than a month.
The option enables the approach described in Vallis, Hochenbaum, and Kejariwal (2014).
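The approach of Vallis, Hochenbaum, and Kejariwal (2014) is usually summarized as replacing a single trend estimate with a piecewise median computed over fixed windows, so that slow level changes across a long series do not mask anomalies. The base-R sketch below illustrates that idea under stated assumptions: the helper name piecewise_median_residuals and its period and window arguments are hypothetical, the seasonal component is taken from stats::stl(), and each window gets its own median. It is not the package's code.

```r
# Hypothetical helper (not part of AnomalyDetection): residuals after removing
# an STL seasonal component and a per-window median (piecewise median).
piecewise_median_residuals <- function(counts, period, window) {
  # stl() needs at least two full seasonal periods
  stopifnot(length(counts) >= 2 * period, window >= period)
  seasonal <- stl(ts(counts, frequency = period), s.window = "periodic",
                  robust = TRUE)$time.series[, "seasonal"]
  # Assign each observation to a fixed-length window (e.g. two weeks of data)
  win_id <- ceiling(seq_along(counts) / window)
  # One median per window, repeated for every observation in that window
  piecewise_med <- ave(counts, win_id, FUN = median)
  counts - as.numeric(seasonal) - piecewise_med
}

# Usage: four weeks of hourly data with a level shift halfway through
set.seed(2)
n <- 24 * 28
counts <- 100 + 10 * sin(2 * pi * seq_len(n) / 24) +
  rep(c(0, 50), each = n / 2) + rnorm(n)
res <- piecewise_median_residuals(counts, period = 24, window = 24 * 14)
```

With a single global median, the residuals of each half would be offset by roughly half the level shift in opposite directions; the per-window median keeps both halves centred near zero, which is what an ESD test on the residuals needs.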
threshold
Filter out all negative anomalies, as well as those anomalies whose magnitude is smaller than the selected threshold. The available thresholds are the median of the daily max values (med_max), the 95th percentile of the daily max values (p95), and the 99th percentile of the daily max values (p99).
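The threshold filter can be illustrated with a small base-R sketch. It assumes the anomalies arrive as a data frame whose second column holds the anomalous values, computes the daily maxima of the input series by grouping on as.Date() of the timestamps, and keeps anomalies at or above the chosen statistic. The helper name filter_by_threshold is hypothetical, and negative anomalies are removed only implicitly here, on the assumption that their values fall below the daily-max thresholds.

```r
# Hypothetical helper (not part of AnomalyDetection): keep only anomalies whose
# value reaches a statistic of the series' daily maxima.
filter_by_threshold <- function(x, anoms, threshold = c("med_max", "p95", "p99")) {
  threshold <- match.arg(threshold)
  # Daily maxima of the full input series; x is <timestamp, count>
  daily_max <- tapply(x[[2]], as.Date(x[[1]], tz = "UTC"), max)
  thresh <- switch(threshold,
                   med_max = median(daily_max),
                   p95     = quantile(daily_max, 0.95, names = FALSE),
                   p99     = quantile(daily_max, 0.99, names = FALSE))
  # Anomalies below the threshold (including negative-going ones, which are
  # assumed to sit below the daily maxima) are dropped
  anoms[anoms[[2]] >= thresh, , drop = FALSE]
}

# Usage: three days of hourly data whose daily maxima are 10, 20 and 30
stamps <- seq(as.POSIXct("2024-01-01 00:00", tz = "UTC"), by = "hour", length.out = 72)
x <- data.frame(timestamp = stamps, count = rep(c(10, 20, 30), each = 24))
anoms <- data.frame(timestamp = stamps[c(5, 30, 60)], anoms = c(5, 25, 40))
filter_by_threshold(x, anoms, "med_max")  # keeps the rows with 25 and 40
```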
Vallis, O., Hochenbaum, J. and Kejariwal, A. (2014), "A Novel Technique for Long-Term Anomaly Detection in the Cloud", 6th USENIX, Philadelphia, PA.
Rosner, B. (May 1983), "Percentage Points for a Generalized ESD Many-Outlier Procedure", Technometrics, 25(2), pp. 165-172.
# Load the package and its bundled example data
library(AnomalyDetection)
data(raw_data)
res <- AnomalyDetectionTs(raw_data, max_anoms=0.02, direction='both', plot=TRUE)
res$plot
# To detect only the anomalies on the last day, run the following:
res_last <- AnomalyDetectionTs(raw_data, max_anoms=0.02, direction='both', only_last="day", plot=TRUE)
res_last$plot