
fromo (version 0.2.4)

running_centered: Compare data to moments computed over a sliding window.

Description

Computes moments over a sliding window, then adjusts the data accordingly: centering, scaling, z-scoring, and so on.

Usage

running_centered(
  v,
  window = NULL,
  wts = NULL,
  na_rm = FALSE,
  min_df = 0L,
  used_df = 1,
  lookahead = 0L,
  restart_period = 100L,
  check_wts = FALSE,
  normalize_wts = FALSE,
  check_negative_moments = TRUE
)

running_scaled(
  v,
  window = NULL,
  wts = NULL,
  na_rm = FALSE,
  min_df = 0L,
  used_df = 1,
  lookahead = 0L,
  restart_period = 100L,
  check_wts = FALSE,
  normalize_wts = TRUE,
  check_negative_moments = TRUE
)

running_zscored(
  v,
  window = NULL,
  wts = NULL,
  na_rm = FALSE,
  min_df = 0L,
  used_df = 1,
  lookahead = 0L,
  restart_period = 100L,
  check_wts = FALSE,
  normalize_wts = TRUE,
  check_negative_moments = TRUE
)

running_sharpe(
  v,
  window = NULL,
  wts = NULL,
  na_rm = FALSE,
  compute_se = FALSE,
  min_df = 0L,
  used_df = 1,
  restart_period = 100L,
  check_wts = FALSE,
  normalize_wts = TRUE,
  check_negative_moments = TRUE
)

running_tstat(
  v,
  window = NULL,
  wts = NULL,
  na_rm = FALSE,
  min_df = 0L,
  used_df = 1,
  restart_period = 100L,
  check_wts = FALSE,
  normalize_wts = TRUE,
  check_negative_moments = TRUE
)

Value

A vector the same size as the input, consisting of the adjusted version of the input. When there are not sufficient (non-NA) elements for the computation, NaN is returned.

Arguments

v

a vector

window

the window size. If given as a finite integer or double, it is used as given. If NULL, NA_integer_, NA_real_, or Inf is given, it is equivalent to an infinite window size. If negative, an error is thrown.
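As a quick sketch (not part of the original documentation), an infinite window makes the mean removed by running_centered a simple cumulative mean:

library(fromo)
# window = NULL acts as an ever-expanding window, so the mean removed
# at index i is the mean of x[1:i].
x <- c(1, 2, 3, 4)
all.equal(running_centered(x, window = NULL),
          x - cumsum(x) / seq_along(x))  # should be TRUE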

wts

an optional vector of weights. Weights are ‘replication’ weights, meaning a value of 2 is shorthand for having two observations with the corresponding v value. If NULL, corresponds to equal unit weights, the default. Note that weights are typically only meaningfully defined up to a multiplicative constant, so the units of weights are immaterial, with the exception that methods which check for a minimum df will, in the weighted case, check against the sum of weights. For this reason, weights less than 1 could cause NA to be returned unexpectedly due to the minimum-df condition. NA weights are treated like NA values of v: when na_rm is true, an observation with an NA weight does not contribute to the moment; when na_rm is false, an NA weight causes the output to be NA.
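As a hedged sketch of the replication interpretation (not from the original documentation), a weight of 2 should act like observing the value twice, so the mean removed under an infinite window is the weighted mean:

library(fromo)
# the centered value at the last index should equal v minus the
# weighted mean sum(w * v) / sum(w).
v <- c(1, 2, 3, 4)
w <- c(1, 2, 1, 1)
cent <- running_centered(v, window = NULL, wts = w)
all.equal(cent[4], v[4] - sum(w * v) / sum(w))  # should be TRUE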

na_rm

whether to remove NA values; FALSE by default.

min_df

the minimum degrees of freedom required to return a value; otherwise NaN is returned. This can be used to prevent, say, Z-scores from being computed on only 3 observations. Defaults to zero, meaning no restriction, which can result in infinite Z-scores during the burn-in period.
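A small sketch of the burn-in behavior (not from the original documentation); the exact number of suppressed values depends on how degrees of freedom are counted, so this is indicative only:

library(fromo)
set.seed(1234)
x <- rnorm(10)
# with the default min_df = 0, early z-scores are computed from very
# few observations and may be NaN or infinite.
running_zscored(x, window = 5L)[1:3]
# with min_df = 3L, outputs are NaN until enough observations accrue.
running_zscored(x, window = 5L, min_df = 3L)[1:3]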

used_df

the number of degrees of freedom consumed; this is subtracted from the number of observations in the denominator of the centered-moments computation.
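As a sketch (not from the original documentation), used_df only rescales the standard deviation, so two 'scaled' outputs on a full window should differ by a deterministic factor:

library(fromo)
set.seed(321)
x <- rnorm(10)
s1 <- running_scaled(x, window = 10L, used_df = 1)  # denominator n - 1
s0 <- running_scaled(x, window = 10L, used_df = 0)  # denominator n
# sigma shrinks by sqrt((n - 1) / n), so the scaled value grows by its inverse.
all.equal(s0[10] / s1[10], sqrt(10 / 9))  # should be TRUE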

lookahead

for some of the operations, the value is compared to the mean and standard deviation possibly computed from 'future' or 'past' information by means of a non-zero lookahead. Positive values mean data are taken from the future.

restart_period

the recompute period. Because subtraction of elements can cause loss of precision, the computation of moments is restarted periodically based on this parameter. Larger values mean fewer restarts and faster, though less accurate, results.

check_wts

a boolean for whether the code should check for negative weights, throwing an error when they are found. Defaults to FALSE for speed.

normalize_wts

a boolean for whether the weights should be renormalized to have a mean value of 1. This mean is computed over elements which contribute to the moments, so if na_rm is set, that means non-NA elements of wts that correspond to non-NA elements of the data vector.

check_negative_moments

a boolean flag. Normal computation of running moments can result in negative estimates of even-order moments due to loss of numerical precision. With this flag active, the computation checks for negative even-order moments and restarts the computation when one is detected. This should eliminate the possibility of negative even-order moments. The downside is the speed hit of checking on every output step. Note also that the code checks for negative moments of every even order tracked, even if they are not output; that is, if, say, the kurtosis is being computed and a negative variance is detected, the computation is restarted. Defaults to TRUE to avoid negative even moments. Set to FALSE only if you know what you are doing.

compute_se

for running_sharpe, whether to return an extra column containing the standard error, as computed via Mertens' correction.
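Judging from the example below, the Mertens form of the standard error appears to be \(\sqrt{\left(1 + \frac{2 + \kappa_i}{4}\, r_i^2 - s_i\, r_i\right) / n_i}\), where \(r_i = \mu_i / \sigma_i\) is the running Sharpe ratio, \(s_i\) the sample skewness, and \(\kappa_i\) the excess kurtosis over the window.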

Author

Steven E. Pav shabbychef@gmail.com

Details

Given the length \(n\) vector \(x\), for a given index \(i\), define \(x^{(i)}\) as the vector of \(x_{i-window+1},x_{i-window+2},...,x_{i}\), where we do not run over the 'edge' of the vector. In code, this is essentially x[(max(1,i-window+1)):i]. Then define \(\mu_i\), \(\sigma_i\) and \(n_i\) as, respectively, the sample mean, standard deviation and number of non-NA elements in \(x^{(i)}\).

We compute output vector \(m\) the same size as \(x\). For the 'centered' version of \(x\), we have \(m_i = x_i - \mu_i\). For the 'scaled' version of \(x\), we have \(m_i = x_i / \sigma_i\). For the 'z-scored' version of \(x\), we have \(m_i = (x_i - \mu_i) / \sigma_i\). For the 'sharpe' version of \(x\), we have \(m_i = \mu_i / \sigma_i\), as the examples below confirm. For the 't-scored' version of \(x\), we have \(m_i = \sqrt{n_i} \mu_i / \sigma_i\).

We also allow a 'lookahead' for some of these operations. If positive, the moments are computed using data from larger indices; if negative, from smaller indices. Letting \(j = i + lookahead\): For the 'centered' version of \(x\), we have \(m_i = x_i - \mu_j\). For the 'scaled' version of \(x\), we have \(m_i = x_i / \sigma_j\). For the 'z-scored' version of \(x\), we have \(m_i = (x_i - \mu_j) / \sigma_j\).
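The packaged examples below do not exercise lookahead, so here is a hedged sketch checking the identity above on an interior index:

library(fromo)
set.seed(42)
x <- rnorm(30)
window <- 5L
lh <- 2L
rc <- running_centered(x, window = window, lookahead = lh)
# x[i] is centered with the mean of the window ending at j = i + lookahead.
i <- 15L
j <- i + lh
all.equal(rc[i], x[i] - mean(x[(j - window + 1):j]))  # should be TRUE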

References

Terriberry, T. "Computing Higher-Order Moments Online." https://web.archive.org/web/20140423031833/http://people.xiph.org/~tterribe/notes/homs.html

J. Bennett, et al., "Numerically Stable, Single-Pass, Parallel Statistics Algorithms," Proceedings of IEEE International Conference on Cluster Computing, 2009. doi:10.1109/CLUSTR.2009.5289161

Cook, J. D. "Accurately computing running variance." https://www.johndcook.com/standard_deviation/

Cook, J. D. "Comparing three methods of computing standard deviation." https://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/

See Also

t_running_centered, scale

Examples


if (require(moments)) {
    set.seed(123)
    x <- rnorm(5e1)
    window <- 10L
    # naive trailing-window moments at each index: sd, mean, count
    rm1 <- t(sapply(seq_along(x), function(iii) {
                  xrang <- x[max(1, iii - window + 1):iii]
                  c(sd(xrang), mean(xrang), length(xrang))
                  }))
    rcent <- running_centered(x, window = window)
    rscal <- running_scaled(x, window = window)
    rzsco <- running_zscored(x, window = window)
    rshrp <- running_sharpe(x, window = window)
    rtsco <- running_tstat(x, window = window)
    rsrse <- running_sharpe(x, window = window, compute_se = TRUE)
    # each running variant should match its naive definition
    stopifnot(max(abs(rcent - (x - rm1[, 2])), na.rm = TRUE) < 1e-12)
    stopifnot(max(abs(rscal - (x / rm1[, 1])), na.rm = TRUE) < 1e-12)
    stopifnot(max(abs(rzsco - ((x - rm1[, 2]) / rm1[, 1])), na.rm = TRUE) < 1e-12)
    stopifnot(max(abs(rshrp - (rm1[, 2] / rm1[, 1])), na.rm = TRUE) < 1e-12)
    stopifnot(max(abs(rtsco - ((sqrt(rm1[, 3]) * rm1[, 2]) / rm1[, 1])), na.rm = TRUE) < 1e-12)
    stopifnot(max(abs(rsrse[, 1] - rshrp), na.rm = TRUE) < 1e-12)

    # naive trailing-window excess kurtosis and skewness at each index
    rm2 <- t(sapply(seq_along(x), function(iii) {
                  xrang <- x[max(1, iii - window + 1):iii]
                  c(kurtosis(xrang) - 3.0, skewness(xrang))
                  }))
    # Mertens' correction to the standard error of the Sharpe ratio
    mertens_se <- sqrt((1 + ((2 + rm2[, 1]) / 4) * rshrp^2 - rm2[, 2] * rshrp) / rm1[, 3])
    stopifnot(max(abs(rsrse[, 2] - mertens_se), na.rm = TRUE) < 1e-12)
}
