subset_aggregation_FF: Fast Subset Aggregation over both locations and data streams.

Description

Compute the most likely cluster (MLC) with the Subset Aggregation method by Neill et al. (2013) through fast optimization over subsets of locations and subsets of streams.

Usage

subset_aggregation_FF(args, score_fun = poisson_score,
  priority_fun = poisson_priority, R = 50, rel_tol = 0.01)

Arguments

args

A list of matrices:

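counts: Required. A matrix of counts. Rows indicate time, ordered from most recent to most distant. Columns indicate e.g. locations or data streams, enumerated from 1 and up. Note that the example in the Examples section passes three-dimensional arrays with dimensions time, locations, and data streams.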
baselines: Required. A matrix of expected counts. Dimensions are as for counts.
penalties: Optional. A matrix of penalty terms. Dimensions are as for counts.
...: Optional. More matrices with parameters.
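
Because time must run from most recent to most distant along the first dimension, data stored in chronological order need to be reversed before being passed in. The following is a minimal sketch of that step, assuming a three-dimensional array (time, locations, streams) as in the Examples section; the object names counts_chrono and counts_recent_first are made up for illustration.

```r
# Hypothetical data stored oldest-first: 3 time periods, 2 locations, 2 streams
counts_chrono <- array(1:12, dim = c(3, 2, 2))

# Reverse the first (time) dimension so that row 1 is the most recent period.
# drop = FALSE keeps all three dimensions even if one of them has length 1.
n_time <- dim(counts_chrono)[1]
counts_recent_first <- counts_chrono[n_time:1, , , drop = FALSE]
```

The same reordering would have to be applied to baselines and to any other array in the list, so that all elements stay aligned.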

score_fun

A function taking matrix arguments, all of the same dimension, and returning a matrix or vector of that dimension. Suitable alternatives are poisson_score, gaussian_score.
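
To illustrate the shape of such a function, here is a sketch of an elementwise score that returns an object of the same dimension as its inputs. It uses the expectation-based Poisson log-likelihood ratio; the name my_poisson_score and its argument names are assumptions made for this sketch, and it is not claimed to match the package's own poisson_score or the way the package passes arguments to score_fun.

```r
# Hypothetical score function: elementwise Poisson log-likelihood ratio,
# positive only where the observed count exceeds the expected count.
my_poisson_score <- function(counts, baselines, ...) {
  ifelse(counts > baselines,
         counts * log(counts / baselines) + baselines - counts,
         0)
}

# Small check on a 1 x 2 matrix: the first cell exceeds its baseline, the second does not
s <- my_poisson_score(matrix(c(10, 2), nrow = 1), matrix(c(5, 5), nrow = 1))
```

Since ifelse returns a result shaped like its test argument, the output keeps the dimensions of counts.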

priority_fun

A function taking matrix arguments, all of the same dimension, and returning a matrix or vector of that dimension. Suitable alternatives are poisson_priority, gaussian_priority.

R

The number of random restarts.

rel_tol

The relative tolerance criterion. If the current score divided by the previous score, minus one, is less than this number then the algorithm is deemed to have converged.
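
The criterion described above can be written as a one-line check. This is only a sketch of the stated rule; the helper name has_converged is hypothetical, and the package's internal implementation may differ in detail.

```r
# Hypothetical helper: TRUE when the relative improvement in score
# falls below the tolerance, as described for rel_tol.
has_converged <- function(current_score, previous_score, rel_tol = 0.01) {
  (current_score / previous_score - 1) < rel_tol
}

small_gain <- has_converged(10.05, 10)  # relative improvement of 0.5%
large_gain <- has_converged(11, 10)     # relative improvement of 10%
```

With the default rel_tol = 0.01, an improvement from 10 to 10.05 counts as converged, while an improvement from 10 to 11 does not.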

Value

A list containing the most likely cluster (MLC), having the following elements:

score: A scalar; the score of the MLC.
duration: An integer; the duration of the MLC, i.e. how many time periods from the present into the past the MLC stretches.
locations: An integer vector; the locations contained in the MLC.
streams: An integer vector; the data streams contained in the MLC.
random_restarts: The number of random restarts performed.
iter_to_conv: The number of iterations it took to reach convergence for each random restart.

Details

Note: algorithm not quite as in Neill et al. (2013) since the randomly chosen subset of streams is the same for all time windows.

References

Neill, Daniel B., Edward McFowland, and Huanian Zheng (2013). Fast subset scan for multivariate event detection. Statistics in Medicine 32 (13), pp. 2185-2208.

Examples


# NOT RUN {
# Set simulation parameters (small)
set.seed(1)
n_loc <- 20
n_dur <- 10
n_streams <- 2
n_tot <- n_loc * n_dur * n_streams

# Generate baselines and possibly other distribution parameters
baselines <- rexp(n_tot, 1/5) + rexp(n_tot, 1/5)
sigma2s <- rexp(n_tot)

# Generate counts
counts <- rpois(n_tot, baselines)

# Reshape into arrays
counts <- array(counts, c(n_dur, n_loc, n_streams))
baselines <- array(baselines, c(n_dur, n_loc, n_streams))
sigma2s <- array(sigma2s, c(n_dur, n_loc, n_streams))

# Inject an outbreak/event
ob_loc <- 1:floor(n_loc / 4)
ob_dur <- 1:floor(n_dur / 4)
ob_streams <- 1:floor(n_streams / 2)
counts[ob_dur, ob_loc, ob_streams] <- 4 * counts[ob_dur, ob_loc, ob_streams]

# Run the FF algorithm
FF_res <- subset_aggregation_FF(
  list(counts = counts, baselines = baselines),
  score_fun = poisson_score,
  priority_fun = poisson_priority)
# }
