
scanstatistics (version 1.0.1)

subset_aggregation_FN_NF: Compute the most likely cluster using the FN/NF Subset Aggregation algorithm.

Description

Compute the most likely cluster (MLC) with the Subset Aggregation method of Neill et al. (2013). The search combines either fast optimization over subsets of locations with naive optimization over subsets of streams (FN), or naive optimization over subsets of locations with fast optimization over subsets of streams (NF).

Usage

subset_aggregation_FN_NF(args, score_fun = poisson_score,
  priority_fun = poisson_priority, algorithm = "FN")

Arguments

args

A list of arrays:

counts

Required. An array of counts (integer or numeric). The first dimension is time, ordered from most recent to most distant. The second dimension indicates locations, enumerated from 1 upward. The third dimension indicates data streams, enumerated from 1 upward.

baselines

Required. An array of expected counts, with the same dimensions as counts.

penalties

Optional. An array of penalty terms, with the same dimensions as counts.

...

Optional. Additional arrays of distribution parameters, each with the same dimensions as counts.
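A minimal sketch of how an args list with the layout described above could be built in base R. The sizes, the random data and the injected outbreak are hypothetical and only illustrate the time x locations x streams ordering, with row 1 of the time dimension being the most recent period.

```r
# Hypothetical toy data: 4 time periods x 3 locations x 2 data streams.
# Index 1 of the time dimension is the most recent period.
n_time <- 4
n_loc <- 3
n_streams <- 2

set.seed(1)
baselines <- array(runif(n_time * n_loc * n_streams, min = 1, max = 5),
                   dim = c(n_time, n_loc, n_streams))
counts <- array(rpois(length(baselines), lambda = baselines),
                dim = dim(baselines))

# Inject a hypothetical outbreak into locations 1-2, stream 1,
# during the two most recent time periods
counts[1:2, 1:2, 1] <- counts[1:2, 1:2, 1] + 5L

# counts and baselines must share the same dimensions
args <- list(counts = counts, baselines = baselines)
str(args)
```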

score_fun

A function taking matrix arguments, all of the same dimension, and returning a matrix or vector of that dimension. Suitable choices include poisson_score and gaussian_score.
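To illustrate the kind of elementwise function score_fun is expected to be, here is a hypothetical expectation-based Poisson score, ebp_score. It is a sketch, not the package's poisson_score. For an aggregate count C and baseline B, maximizing the Poisson log-likelihood ratio over a relative risk q >= 1 gives q = C/B, so the score is C log(C/B) + B - C when C > B and 0 otherwise. Arithmetic on arrays keeps their dim attribute, so the output has the same shape as the input.

```r
# Hypothetical elementwise expectation-based Poisson score (not the package's
# poisson_score). Assumes baselines > 0.
ebp_score <- function(counts, baselines) {
  # Log-likelihood ratio at the maximizing relative risk q = C / B
  out <- counts * log(counts / baselines) + baselines - counts
  # No excess over the baseline (C <= B) gives a score of 0;
  # this also replaces the NaN produced by 0 * log(0)
  out[counts <= baselines] <- 0
  out
}

ebp_score(10, 5)   # 10 * log(2) - 5
ebp_score(matrix(c(8, 2, 6, 1), 2), matrix(4, 2, 2))
```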

priority_fun

A function taking matrix arguments, all of the same dimension, and returning a matrix or vector of that dimension. Suitable choices include poisson_priority and gaussian_priority.
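A priority function ranks the elements being optimized over so that the fast step only needs to score a few candidate subsets. The sketch below assumes the expectation-based Poisson case, where the priority is the ratio of observed to expected counts; ebp_priority and the four-location totals are hypothetical, not the package's poisson_priority. Under the linear-time subset scanning (LTSS) property exploited by the fast optimization, the best subset is one of the top-k prefixes of this ordering.

```r
# Hypothetical expectation-based Poisson priority (not the package's
# poisson_priority): observed count divided by expected count.
ebp_priority <- function(counts, baselines) {
  counts / baselines
}

# Aggregated counts and baselines for four hypothetical locations
C <- c(12, 3, 9, 4)
B <- c(5, 4, 6, 4)
pr <- ebp_priority(C, B)            # c(2.40, 0.75, 1.50, 1.00)
ord <- order(pr, decreasing = TRUE) # 1 3 4 2

# Only the prefixes {1}, {1, 3}, {1, 3, 4} and {1, 3, 4, 2} need scoring,
# instead of all 2^4 - 1 nonempty subsets
prefixes <- lapply(seq_along(ord), function(k) ord[seq_len(k)])
```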

algorithm

Either "FN" or "NF":

FN

Fast optimization over subsets of locations and naive optimization over subsets of streams. Suitable when the number of data streams is small, since every subset of streams is enumerated.

NF

Fast optimization over subsets of streams and naive optimization over subsets of locations. Suitable when the number of locations is small, since every subset of locations is enumerated.
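The two variants can be sketched in base R as follows, assuming the expectation-based Poisson score with no penalties. All helper names (ebp, ltss, nonempty_subsets, subset_aggregation_sketch) are hypothetical and not part of scanstatistics. For each duration, counts and baselines are summed over the most recent periods; the naive step enumerates every nonempty subset of one dimension, and the fast step sorts the other dimension by the priority C/B and scores only the top-k prefixes. The returned list mirrors the Value fields below.

```r
# Hypothetical expectation-based Poisson score on aggregated totals
ebp <- function(C, B) ifelse(C > B, C * log(C / B) + B - C, 0)

# Fast step (LTSS): sort by priority C / B and score only the top-k prefixes
ltss <- function(C, B) {
  ord <- order(C / B, decreasing = TRUE)
  scores <- ebp(cumsum(C[ord]), cumsum(B[ord]))
  k <- which.max(scores)
  list(score = scores[k], subset = sort(ord[seq_len(k)]))
}

# Naive step: every nonempty subset of 1..n
nonempty_subsets <- function(n) {
  unlist(lapply(seq_len(n), function(k) combn(n, k, simplify = FALSE)),
         recursive = FALSE)
}

subset_aggregation_sketch <- function(counts, baselines,
                                      algorithm = c("FN", "NF")) {
  algorithm <- match.arg(algorithm)
  dims <- dim(counts)  # time x locations x streams
  best <- list(score = -Inf, duration = NA_integer_,
               locations = integer(0), streams = integer(0))
  for (d in seq_len(dims[1])) {
    # Sum over the d most recent periods: a locations x streams matrix
    Cd <- colSums(counts[seq_len(d), , , drop = FALSE])
    Bd <- colSums(baselines[seq_len(d), , , drop = FALSE])
    if (algorithm == "FN") {
      # Naive over stream subsets, fast over locations
      for (s in nonempty_subsets(dims[3])) {
        res <- ltss(rowSums(Cd[, s, drop = FALSE]),
                    rowSums(Bd[, s, drop = FALSE]))
        if (res$score > best$score)
          best <- list(score = res$score, duration = d,
                       locations = res$subset, streams = s)
      }
    } else {
      # Naive over location subsets, fast over streams
      for (l in nonempty_subsets(dims[2])) {
        res <- ltss(colSums(Cd[l, , drop = FALSE]),
                    colSums(Bd[l, , drop = FALSE]))
        if (res$score > best$score)
          best <- list(score = res$score, duration = d,
                       locations = l, streams = res$subset)
      }
    }
  }
  best
}
```

Because the fast step is exact for this score, FN and NF should reach the same maximum as an exhaustive search; they differ only in which dimension is enumerated, which drives their cost.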

Value

A list with 4 elements:

score

A scalar; the score of the MLC.

duration

An integer; the duration of the MLC, i.e. the number of time periods, counted from the present into the past, that the MLC spans.

locations

An integer vector; the locations contained in the MLC.

streams

An integer vector; the data streams contained in the MLC.
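The returned fields index directly into the input arrays: duration selects the most recent time periods, and locations and streams select the other two dimensions. The sketch below uses toy arrays and a hypothetical result list in this format (its score is left as NA because it is not computed here) to pull out the cluster's counts and baselines.

```r
# Toy 3 x 2 x 2 arrays (time x locations x streams); row 1 is the most recent
counts <- array(1:12, dim = c(3, 2, 2))
baselines <- array(2, dim = c(3, 2, 2))

# Hypothetical result in the format described above; score not computed here
mlc <- list(score = NA_real_, duration = 2L, locations = 2L, streams = c(1L, 2L))

# The MLC covers the `duration` most recent periods, its locations and streams
cluster_counts <- counts[seq_len(mlc$duration), mlc$locations, mlc$streams,
                         drop = FALSE]
cluster_baselines <- baselines[seq_len(mlc$duration), mlc$locations,
                               mlc$streams, drop = FALSE]
dim(cluster_counts)      # 2 1 2
sum(cluster_counts)      # 30
sum(cluster_baselines)   # 8
```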

References

Neill, Daniel B., Edward McFowland, and Huanian Zheng (2013). Fast subset scan for multivariate event detection. Statistics in Medicine 32 (13), pp. 2185-2208.