hankel: Perform Singular Value Decomposition of Block-Hankel Matrix

Description

This function constructs a block-Hankel matrix based on time-course data, performs the subsequent singular value decomposition (SVD) on this matrix, and returns the number of large singular values as defined by a user-supplied cutoff criterion.

Usage

hankel(y, lag, cutoff, type)

Value

svs: Vector of singular values of the block-Hankel matrix \(H\)
dim: Number of large singular values, as determined by the user-supplied cutoff

Arguments

y: A list of R (PxT) matrices of observed time course profiles
lag: Maximum relevant time lag to be used in constructing the block-Hankel matrix
cutoff: Cutoff to be used, given as the desired proportion of total variance explained (e.g., 0.90)
type: Method to combine results across replicates ("median" or "mean")

Author

Andrea Rau

Details

The block-Hankel matrix \(H\) of autocovariances of the time series observations is constructed (see References for additional information), with the maximum relevant time lag specified by lag. In the context of gene networks, this corresponds to the maximum relevant biological time lag between a gene and its regulators. This quantity is experiment-specific, but will generally be small for gene expression studies (on the order of 1, 2, or 3).
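
As a rough illustration of this construction, the sketch below builds a block-Hankel matrix of lagged sample autocovariances for a single P x T replicate. The helper names autocov and block_hankel are hypothetical and are not part of ebdbNet. The layout assumed here, with block (i, j) holding the lag-(i + j - 1) autocovariance, follows a common state space convention and may differ in detail (centering, normalization) from what hankel does internally.

```r
## Hypothetical helper (not part of ebdbNet): lag-k sample autocovariance
## of a P x nT matrix Y whose rows are variables and columns are time points.
autocov <- function(Y, k) {
  nT <- ncol(Y)
  Yc <- Y - rowMeans(Y)  # center each row (variable) over time
  Yc[, (k + 1):nT, drop = FALSE] %*% t(Yc[, 1:(nT - k), drop = FALSE]) / (nT - k)
}

## Hypothetical helper: (P*m) x (P*m) block-Hankel matrix for maximum lag m.
## Assumption: block (i, j) is the lag-(i + j - 1) autocovariance.
block_hankel <- function(Y, m) {
  P <- nrow(Y)
  H <- matrix(0, P * m, P * m)
  for (i in 1:m) {
    for (j in 1:m) {
      rows <- ((i - 1) * P + 1):(i * P)
      cols <- ((j - 1) * P + 1):(j * P)
      H[rows, cols] <- autocov(Y, i + j - 1)
    }
  }
  H
}

set.seed(1)
Y <- matrix(rnorm(4 * 10), nrow = 4)  # toy data: P = 4 variables, 10 time points
H <- block_hankel(Y, m = 2)
dim(H)  # 8 8
```

With lag = 1 the matrix reduces to a single P x P block, the lag-1 autocovariance.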

The singular value decomposition of \(H\) is performed, and the singular values are ordered by size and scaled by the largest singular value. Note that if there are T time points in the data, only the first (T - 1) singular values will be non-zero.
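
The ordering and scaling step can be sketched with base R's svd, which returns singular values in decreasing order. The matrix H_toy is a made-up stand-in for \(H\), used only for illustration.

```r
## Toy stand-in for the block-Hankel matrix (illustrative values only)
H_toy <- matrix(c(4, 0, 0,
                  0, 2, 0,
                  0, 0, 1), nrow = 3, byrow = TRUE)

d <- svd(H_toy)$d  # singular values, already sorted from largest to smallest
svs <- d / d[1]    # scale by the largest singular value, so svs[1] is 1
svs
```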

To choose the number of large singular values, we wish to find the point at which including an additional singular value does not increase the amount of explained variation enough to justify its inclusion (similar to choosing the number of components in a Principal Components Analysis). The user-supplied value of cutoff gives the desired proportion of variance explained by the first K components (for example, 0.90 for 90 percent). The algorithm returns the value of K, which may subsequently be used as the dimension of the hidden state in ebdbn.
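
A minimal sketch of this selection rule is given below. It assumes that the share of variation explained by the first K singular values is their cumulative sum divided by the total; the exact criterion used inside hankel is not spelled out here and may differ. The function name choose_dim is hypothetical.

```r
## Hypothetical helper (not part of ebdbNet): smallest K such that the first
## K singular values account for at least `cutoff` of their total.
## Assumption: "variance explained" is measured by cumsum(svs) / sum(svs).
choose_dim <- function(svs, cutoff) {
  explained <- cumsum(svs) / sum(svs)
  which(explained >= cutoff)[1]
}

svs_toy <- c(1, 0.6, 0.2, 0.2)  # scaled singular values, largest first
choose_dim(svs_toy, cutoff = 0.75)
choose_dim(svs_toy, cutoff = 0.85)
```

Raising cutoff can only keep K the same or increase it, since the cumulative share is non-decreasing in K.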

The argument 'type' takes the value of "median" or "mean", and is used to determine how results from replicated experiments are combined (i.e., median or mean of the per-replicate final hidden state dimension).
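
Combining the per-replicate dimensions can be sketched as follows. The function combine_dims is hypothetical, and the sketch leaves the result unrounded because the rounding applied to a non-integer mean or median is not stated here.

```r
## Hypothetical helper (not part of ebdbNet): combine per-replicate hidden
## state dimensions using the median or the mean.
combine_dims <- function(dims, type = c("median", "mean")) {
  type <- match.arg(type)  # rejects anything other than "median" or "mean"
  switch(type,
         median = median(dims),
         mean   = mean(dims))
}

dims_toy <- c(3, 5, 4, 5, 9)  # made-up dimensions from five replicates
combine_dims(dims_toy, type = "median")
combine_dims(dims_toy, type = "mean")
```

The median is less sensitive than the mean to a single replicate with an unusually large dimension, such as the 9 in the toy vector.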

References

Masanao Aoki and Arthur Havenner (1991). State space modeling of multiple time series. Econometric Reviews 10(1), 1-59.

Martina Bremer (2006). Identifying regulated genes through the correlation structure of time dependent microarray data. Ph.D. thesis, Purdue University.

Andrea Rau, Florence Jaffrezic, Jean-Louis Foulley, and R. W. Doerge (2010). An Empirical Bayesian Method for Estimating Biological Networks from Temporal Microarray Data. Statistical Applications in Genetics and Molecular Biology 9. Article 9.

Examples

library(ebdbNet)
tmp <- runif(1) ## Initialize random number generator
set.seed(125214) ## Save seed

## Simulate data
y <- simulateVAR(R = 5, T = 10, P = 10, v = rep(10, 10), perc = 0.10)$y

## Determine the number of hidden states to be estimated (with lag = 1)
K <- hankel(y, lag = 1, cutoff = 0.90, type = "median")$dim
## K = 5