score_priority_subset: Compute the highest-scoring subset for aggregated data.

Description

Given data that have been aggregated over either locations or data streams, this function finds the highest-scoring subset of the remaining dimensions. For example, if the data have been aggregated over data streams, the highest-scoring subset consists of a window of time stretching from the most recent time period back to some time period in the past (i.e., a duration), together with a collection of locations.
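As a minimal sketch of the aggregation described above, the following R code assumes hypothetical data stored as a time-by-location-by-stream array, with row 1 as the most recent period. It aggregates over data streams and then forms window totals for every duration. The array layout and the variable names are illustrative assumptions, not part of this function's interface.

```r
# Hypothetical data: 3 time periods (row 1 = most recent), 4 locations, 2 streams
counts3d <- array(1:24, dim = c(3, 4, 2))

# Aggregate over data streams, leaving a time-by-location matrix
agg <- apply(counts3d, c(1, 2), sum)

# Row d holds each location's total over the d most recent periods,
# i.e. the count for a time window of duration d
window_totals <- apply(agg, 2, cumsum)
```

A candidate cluster is then a duration d paired with a set of locations whose entries in row d of `window_totals` are summed.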

Usage

score_priority_subset(args, score_fun = poisson_score,
  priority_fun = poisson_priority)

Arguments

args

A list of matrices:

counts: Required. A matrix of counts. Rows indicate time, ordered from most recent to most distant. Columns indicate, e.g., locations or data streams, numbered from 1 upward.
baselines: Required. A matrix of expected counts. Dimensions are as for counts.
penalties: Optional. A matrix of penalty terms. Dimensions are as for counts.
...: Optional. Additional matrices of parameters. Dimensions are as for counts.
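The argument list can be assembled as in the sketch below, which uses small hypothetical matrices (2 time periods, 4 locations) and checks that every matrix in the list shares the dimensions of counts. The data values are made up for illustration.

```r
# Hypothetical data: 2 time periods (row 1 = most recent) x 4 locations
counts <- matrix(c(8, 2, 9, 1,
                   7, 1, 8, 2), nrow = 2, byrow = TRUE)
baselines <- matrix(2, nrow = 2, ncol = 4)

args <- list(counts = counts, baselines = baselines)

# Every matrix in `args` should have the same dimensions as `counts`
stopifnot(all(vapply(args,
                     function(m) identical(dim(m), dim(args$counts)),
                     logical(1))))
```

With the package loaded, this list would then be passed as `score_priority_subset(args)`.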

score_fun

A function taking matrix arguments, all of the same dimension, and returning a matrix or vector of that dimension. Suitable choices include poisson_score and gaussian_score.

priority_fun

A function taking matrix arguments, all of the same dimension, and returning a matrix or vector of that dimension. Suitable choices include poisson_priority and gaussian_priority.
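To illustrate the expected shape of score_fun and priority_fun, here is a minimal sketch of a pair of user-defined functions. It assumes an expectation-based Poisson log-likelihood ratio as the score and the ratio of observed to expected counts as the priority; the names poisson_score_sketch and poisson_priority_sketch are hypothetical and are not the package's poisson_score or poisson_priority, whose exact formulas may differ.

```r
# Hypothetical score function: expectation-based Poisson log-likelihood ratio,
# computed elementwise; it is zero wherever counts do not exceed baselines
poisson_score_sketch <- function(counts, baselines, ...) {
  ifelse(counts > baselines,
         counts * log(counts / baselines) + baselines - counts,
         0)
}

# Hypothetical priority function: ratio of observed to expected counts
poisson_priority_sketch <- function(counts, baselines, ...) {
  counts / baselines
}
```

Both functions operate elementwise, so matrix inputs of equal dimension produce a matrix of that same dimension, as the argument descriptions require.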

Value

A list containing three elements:

score: The highest score among all clusters.
duration: The duration of the score-maximizing cluster.
subset: An integer vector of the subset of e.g. locations or data streams in the score-maximizing cluster.

Details

This function provides the main component of the FN and NF algorithms described in Section 3.1 of Neill et al. (2013).
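The core idea of a priority-based search can be shown with a minimal sketch of linear-time subset scanning for an expectation-based Poisson score. For each duration, it sums counts and baselines over the most recent periods, sorts columns by the priority C/B, and scores only the top-k subsets, returning a list shaped like the Value above. The function ltss_poisson_sketch is hypothetical, hard-codes the Poisson score, and is not this function's actual implementation.

```r
# Hypothetical sketch of a linear-time subset scan (Poisson score only)
ltss_poisson_sketch <- function(counts, baselines) {
  # Score of a subset with total count C and total baseline B
  pois_score <- function(C, B) ifelse(C > B, C * log(C / B) + B - C, 0)
  best <- list(score = -Inf, duration = NA_integer_, subset = integer(0))
  for (d in seq_len(nrow(counts))) {
    # Column totals over the d most recent time periods
    C <- colSums(counts[seq_len(d), , drop = FALSE])
    B <- colSums(baselines[seq_len(d), , drop = FALSE])
    # Sort columns by priority, highest first
    ord <- order(C / B, decreasing = TRUE)
    # Scores of the top-1, top-2, ..., top-ncol subsets
    scores <- pois_score(cumsum(C[ord]), cumsum(B[ord]))
    k <- which.max(scores)
    if (scores[k] > best$score) {
      best <- list(score = scores[k], duration = d,
                   subset = sort(ord[seq_len(k)]))
    }
  }
  best
}

counts <- matrix(c(8, 2, 9, 1,
                   7, 1, 8, 2), nrow = 2, byrow = TRUE)
baselines <- matrix(2, nrow = 2, ncol = 4)
res <- ltss_poisson_sketch(counts, baselines)
```

Only ncol subsets are scored per duration instead of all 2^ncol, which is what makes the priority ordering worthwhile.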

References

Neill, Daniel B., Edward McFowland, and Huanian Zheng (2013). Fast subset scan for multivariate event detection. Statistics in Medicine 32 (13), pp. 2185-2208.