scanstatistics (version 1.0.1)

score_priority_subset: Compute the highest scoring subset for aggregated data.

Description

Given data that has been aggregated over either locations or data streams, this function finds the highest-scoring subset over the remaining dimensions of the data. For example, if the data has been aggregated over data streams, the highest-scoring subset consists of a time window stretching back from the most recent time period (i.e., a duration) together with a collection of locations.

Usage

score_priority_subset(args, score_fun = poisson_score,
  priority_fun = poisson_priority)

Arguments

args

A list of matrices:

counts

Required. A matrix of counts. Rows indicate time, ordered from most recent to most distant. Columns indicate e.g. locations or data streams, enumerated from 1 and up.

baselines

Required. A matrix of expected counts. Dimensions are as for counts.

penalties

Optional. A matrix of penalty terms. Dimensions are as for counts.

...

Optional. Further matrices of parameters.

score_fun

A function taking matrix arguments, all of the same dimension, and returning a matrix or vector of that dimension. Suitable choices are poisson_score and gaussian_score.

priority_fun

A function taking matrix arguments, all of the same dimension, and returning a matrix or vector of that dimension. Suitable choices are poisson_priority and gaussian_priority.
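
To make the argument shapes concrete, here is a minimal base-R sketch of an args list and a score/priority pair that follow the contract described above. The names my_score and my_priority and the toy numbers are hypothetical. The formulas are the usual expectation-based Poisson forms and are an assumption here, not a copy of the package's poisson_score or poisson_priority.

```r
# Toy data: 3 time periods (row 1 = most recent) by 4 locations.
counts <- matrix(c(9, 2, 7, 1,
                   6, 3, 5, 2,
                   4, 2, 3, 1),
                 nrow = 3, byrow = TRUE)
baselines <- matrix(3, nrow = 3, ncol = 4)

# A list of equally sized matrices, named as in the Arguments section.
args <- list(counts = counts, baselines = baselines)

# Hypothetical stand-ins (assumed formulas, not the package's code).
# Each takes matrices of equal dimension and returns one of that dimension;
# `...` absorbs optional extra matrices such as penalties.
my_score <- function(counts, baselines, ...) {
  ifelse(counts > baselines,
         counts * log(counts / baselines) + baselines - counts,
         0)
}
my_priority <- function(counts, baselines, ...) counts / baselines

# Calling with the list mirrors how a function receiving `args` could use it.
score_matrix <- do.call(my_score, args)
priority_matrix <- do.call(my_priority, args)
```

Both results have the same 3 by 4 shape as counts, which is the property the score_fun and priority_fun descriptions require.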

Value

A list containing three elements:

score

The highest score of all clusters.

duration

The duration of the score-maximizing cluster.

subset

An integer vector of the subset of e.g. locations or data streams in the score-maximizing cluster.

Details

This function provides the main component of the FN and NF algorithms described in Section 3.1 of Neill et al. (2013).
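
The priority-ordering idea behind such a search can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, the score and priority are the expectation-based Poisson forms, and the input rows are assumed to hold per-period values that still need to be summed over the time window.

```r
# Hypothetical helpers (assumed expectation-based Poisson forms).
ebp_score <- function(c, b) ifelse(c > b, c * log(c / b) + b - c, 0)
ebp_priority <- function(c, b) c / b

# Sketch: for each duration, rank columns by priority and score every
# prefix of that ranking; keep the best (duration, prefix) combination.
priority_subset_sketch <- function(counts, baselines) {
  # Row d of the cumulative matrices totals the d most recent periods
  # (rows are ordered newest first).
  cum_c <- matrix(apply(counts, 2, cumsum), nrow = nrow(counts))
  cum_b <- matrix(apply(baselines, 2, cumsum), nrow = nrow(baselines))

  best <- list(score = -Inf, duration = NA_integer_, subset = integer(0))
  for (d in seq_len(nrow(counts))) {
    # Columns ordered from highest to lowest priority for this duration.
    ord <- order(ebp_priority(cum_c[d, ], cum_b[d, ]), decreasing = TRUE)
    # Scores of the top-1, top-2, ..., top-n column subsets.
    scores <- ebp_score(cumsum(cum_c[d, ord]), cumsum(cum_b[d, ord]))
    k <- which.max(scores)
    if (scores[k] > best$score) {
      best <- list(score = scores[k], duration = d,
                   subset = sort(ord[seq_len(k)]))
    }
  }
  best
}

counts <- matrix(c(9, 2, 7, 1,
                   6, 3, 5, 2,
                   4, 2, 3, 1),
                 nrow = 3, byrow = TRUE)
baselines <- matrix(3, nrow = 3, ncol = 4)
result <- priority_subset_sketch(counts, baselines)
# result$duration is 2 and result$subset is c(1, 3) for this toy input.
```

Only prefixes of the priority ranking are scored, so each duration needs a sort and a single pass over the columns rather than an enumeration of every possible subset.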

References

Neill, Daniel B., Edward McFowland, and Huanian Zheng (2013). Fast subset scan for multivariate event detection. Statistics in Medicine 32 (13), pp. 2185-2208.