AnomalyDetection (version 1.0)

AnomalyDetectionVec: Anomaly Detection Using Seasonal Hybrid ESD Test

Description

A technique for detecting anomalies in seasonal univariate time series where the input is a series of observations.

Usage

AnomalyDetectionVec(x, max_anoms = 0.1, direction = "pos", alpha = 0.05,
  period = NULL, only_last = F, threshold = "None", e_value = F,
  longterm_period = NULL, plot = F, y_log = F, xlabel = "",
  ylabel = "count", title = NULL)

Arguments

x

Time series as a single-column data frame, list, or vector, where the column consists of the observations.

max_anoms

Maximum number of anomalies that S-H-ESD will detect, expressed as a fraction of the data (for example, 0.1 for 10%).

direction

Directionality of the anomalies to be detected. Options are: 'pos' | 'neg' | 'both'.

alpha

The level of statistical significance with which to accept or reject anomalies.

period

Defines the number of observations in a single period. It is used during seasonal decomposition.

only_last

Find and report anomalies only within the last period in the time series.

threshold

Only report positive-going anomalies above the specified threshold. Options are: 'None' | 'med_max' | 'p95' | 'p99'.

e_value

Add a column containing the expected value to the anoms output.

longterm_period

Defines the number of observations for which the trend can be considered flat. The value should be an integer multiple of the number of observations in a single period. Setting it improves anomaly detection for time series that span more than a month.

plot

A flag indicating whether a plot of the time series with the estimated anomalies, indicated by circles, should also be returned.

y_log

Apply log scaling to the y-axis. This helps with viewing plots that have extremely large positive anomalies relative to the rest of the data.

xlabel

X-axis label to be added to the output plot.

ylabel

Y-axis label to be added to the output plot.

title

Title for the output plot.
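
To illustrate how max_anoms and alpha interact, the following is a minimal base R sketch of the plain two-sided generalized ESD test from Rosner (1983). It is not the package's implementation: it omits the seasonal decomposition that S-H-ESD performs first, it uses the mean and standard deviation as in the original test, and the function name gesd_sketch is hypothetical. Here max_anoms caps the number of candidate points that are tested, and alpha sets the critical value each candidate must exceed.

```r
# Plain generalized ESD (Rosner, 1983) on a numeric vector.
# Hypothetical helper for illustration; not part of AnomalyDetection.
gesd_sketch <- function(x, max_anoms = 0.1, alpha = 0.05) {
  n <- length(x)
  k <- floor(max_anoms * n)      # upper bound on the number of anomalies
  idx <- seq_len(n)              # original positions of the remaining points
  removed <- integer(0)          # positions removed so far, in removal order
  n_anoms <- 0L
  for (i in seq_len(k)) {
    dev <- abs(x[idx] - mean(x[idx]))
    s <- sd(x[idx])
    if (s == 0) break            # remaining points are constant
    j <- which.max(dev)
    r_i <- dev[j] / s            # test statistic for the most extreme point
    # Critical value for the i-th step of the two-sided test
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- qt(p, df = n - i - 1)
    lambda_i <- (n - i) * t_crit /
      sqrt((n - i - 1 + t_crit^2) * (n - i + 1))
    removed <- c(removed, idx[j])
    idx <- idx[-j]
    # The number of anomalies is the largest i with r_i > lambda_i
    if (r_i > lambda_i) n_anoms <- i
  }
  sort(removed[seq_len(n_anoms)])
}

# Made-up series: a sine wave with two injected spikes
x <- sin(seq_len(200) / 5)
x[c(50, 120)] <- c(8, -9)
print(gesd_sketch(x, max_anoms = 0.1, alpha = 0.05))
```

The sketch returns the positions of the flagged points in the input vector. A one-sided variant, loosely corresponding to direction = 'pos' or 'neg', would use signed deviations and a one-sided critical value.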

Value

The returned value is a list with the following components.

anoms

Data frame containing index, values, and optionally expected values.

plot

A graphical object if plotting was requested by the user. The plot contains the estimated anomalies annotated on the input time series.

One can save anoms to a file in the following fashion: write.csv(<return list name>[["anoms"]], file=<filename>)

One can save plot to a file in the following fashion: ggsave(<filename>, plot=<return list name>[["plot"]])
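
The two save patterns above can be sketched as follows. The list res is a made-up stand-in for the value returned by AnomalyDetectionVec, so that the snippet runs without the package; the column names index and anoms, the values, and the file names are assumptions for illustration only.

```r
# Stand-in for the list returned by AnomalyDetectionVec(); values are made up.
res <- list(
  anoms = data.frame(index = c(50L, 120L), anoms = c(8, -9)),
  plot  = NULL  # would hold a plot object when plot = TRUE
)

# Save the anomalies as CSV
anoms_file <- file.path(tempdir(), "anoms.csv")
write.csv(res[["anoms"]], file = anoms_file, row.names = FALSE)

# Save the plot only if one was returned and ggplot2 is installed
if (!is.null(res[["plot"]]) && requireNamespace("ggplot2", quietly = TRUE)) {
  ggplot2::ggsave(file.path(tempdir(), "anoms.png"), plot = res[["plot"]])
}

# Read the CSV back to confirm what was written
saved <- read.csv(anoms_file)
print(saved)
```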

Details

longterm_period: This option should be set when the input time series is longer than a month. The option enables the approach described in Vallis, Hochenbaum, and Kejariwal (2014).

threshold: Filter all negative anomalies and those anomalies whose magnitude is smaller than one of the specified thresholds, which include: the median of the daily max values (med_max), the 95th percentile of the daily max values (p95), and the 99th percentile of the daily max values (p99).
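
As a rough illustration of the threshold options, the base R sketch below computes the three cut-offs from the daily maxima of a made-up minutely series (period = 1440 observations per day). The data, the variable names, and the use of quantile with its default interpolation are assumptions; the package may compute these values differently. The last lines check that a candidate longterm_period is an integer multiple of period.

```r
period <- 1440                       # observations per day (minutely data)

# Made-up series: 7 days, where day d has a constant value of 10 * d
x <- rep(1:7, each = period) * 10

# Group observations by day and take each day's maximum
day <- (seq_along(x) - 1) %/% period
daily_max <- tapply(x, day, max)

thresholds <- c(
  med_max = median(daily_max),
  p95     = unname(quantile(daily_max, 0.95)),
  p99     = unname(quantile(daily_max, 0.99))
)
print(thresholds)

# With threshold = 'p95', only positive anomalies at or above this
# value would be kept (hypothetical candidate values)
candidates <- c(35, 68, 90)
print(candidates[candidates >= thresholds[["p95"]]])

# longterm_period should be an integer multiple of period,
# e.g. two weeks of minutely data
longterm_period <- period * 7 * 2
print(longterm_period %% period == 0)
```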

References

Vallis, O., Hochenbaum, J. and Kejariwal, A. (2014), "A Novel Technique for Long-Term Anomaly Detection in the Cloud", 6th USENIX, Philadelphia, PA.

Rosner, B. (May 1983), "Percentage Points for a Generalized ESD Many-Outlier Procedure", Technometrics, 25(2), pp. 165-172.

See Also

AnomalyDetectionTs

Examples

# NOT RUN {
data(raw_data)
AnomalyDetectionVec(raw_data[,2], max_anoms=0.02, period=1440, direction='both', plot=TRUE)
# To detect only the anomalies in the last period, run the following:
AnomalyDetectionVec(raw_data[,2], max_anoms=0.02, period=1440, direction='both',
  only_last=TRUE, plot=TRUE)
# }