Given data that is aggregated over either locations or data streams, this function finds the highest scoring subset of the remaining dimensions of the data. For example, if the data has been aggregated over data streams, the highest scoring subset will consist of a window of time stretching from the most recent time period to some time period in the past (i.e. a duration) and a collection of locations.
score_priority_subset(args, score_fun = poisson_score,
priority_fun = poisson_priority)
args: A list of matrices:
Required. A matrix of counts. Rows indicate time, ordered from most recent to most distant. Columns indicate e.g. locations or data streams, enumerated from 1 and up.
Required. A matrix of expected counts. Dimensions are as for counts.
Optional. A matrix of penalty terms. Dimensions are as for counts.
Optional. More matrices with parameters.
score_fun: A function taking matrix arguments, all of the same dimension, and returning a matrix or vector of that dimension. Suitable alternatives are poisson_score and gaussian_score.
priority_fun: A function taking matrix arguments, all of the same dimension, and returning a matrix or vector of that dimension. Suitable alternatives are poisson_priority and gaussian_priority.
A list containing three elements:
The highest score of all clusters.
The duration of the score-maximizing cluster.
An integer vector of the subset of e.g. locations or data streams in the score-maximizing cluster.
This function provides the main component of the FN and NF algorithms described in Section 3.1 of Neill et al. (2013).
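The idea behind a priority-based subset scan can be illustrated with a minimal base R sketch. It is not the package's implementation: the names sketch_poisson_score and sketch_priority_scan are hypothetical, the score is assumed to be the expectation-based Poisson statistic, the priority is assumed to be the ratio of count to baseline, and penalty terms are ignored. For each duration, columns are sorted by priority and only the top-k subsets are scored, rather than all possible subsets.

```r
# Hypothetical helper: expectation-based Poisson score for an aggregated
# count C and aggregated baseline B (zero unless C exceeds B).
sketch_poisson_score <- function(C, B) {
  ifelse(C > B, C * log(C / B) + B - C, 0)
}

# Hypothetical sketch of a priority-based subset scan (not the package code).
# counts, baselines: matrices with rows = time (most recent first) and
# columns = locations or data streams.
sketch_priority_scan <- function(counts, baselines) {
  best <- list(score = -Inf, duration = NA_integer_, subset = integer(0))
  for (d in seq_len(nrow(counts))) {
    # Aggregate the d most recent time periods for each column.
    c_d <- colSums(counts[seq_len(d), , drop = FALSE])
    b_d <- colSums(baselines[seq_len(d), , drop = FALSE])
    # Assumed Poisson priority: ratio of count to baseline, highest first.
    ord <- order(c_d / b_d, decreasing = TRUE)
    # Score the top-k columns for every k in a single pass.
    scores <- sketch_poisson_score(cumsum(c_d[ord]), cumsum(b_d[ord]))
    k <- which.max(scores)
    if (scores[k] > best$score) {
      best <- list(score = scores[k],
                   duration = d,
                   subset = sort(ord[seq_len(k)]))
    }
  }
  best
}

# Toy data: columns 1 and 3 show elevated counts in the most recent period.
counts <- matrix(c(9, 2, 8,
                   3, 2, 2), nrow = 2, byrow = TRUE)
baselines <- matrix(2, nrow = 2, ncol = 3)
res <- sketch_priority_scan(counts, baselines)
```

On this toy input the sketch selects a duration of 1 and the subset consisting of columns 1 and 3. Sorting by priority means only as many subsets as there are columns need to be scored per duration, instead of every possible subset.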
Neill, Daniel B., Edward McFowland, and Huanian Zheng (2013). Fast subset scan for multivariate event detection. Statistics in Medicine 32 (13), pp. 2185-2208.