fromo (version 0.2.4)

t_running_sd3: Compute first K moments over a sliding time-based window

Description

Compute the (standardized) 2nd through kth moments, the mean, and the number of elements over an infinite or finite sliding time-based window, returning a matrix.

Usage

t_running_sd3(
  v,
  time = NULL,
  time_deltas = NULL,
  window = NULL,
  wts = NULL,
  lb_time = NULL,
  na_rm = FALSE,
  min_df = 0L,
  used_df = 1,
  restart_period = 100L,
  variable_win = FALSE,
  wts_as_delta = TRUE,
  check_wts = FALSE,
  normalize_wts = TRUE,
  check_negative_moments = TRUE
)

t_running_skew4(
  v,
  time = NULL,
  time_deltas = NULL,
  window = NULL,
  wts = NULL,
  lb_time = NULL,
  na_rm = FALSE,
  min_df = 0L,
  used_df = 1,
  restart_period = 100L,
  variable_win = FALSE,
  wts_as_delta = TRUE,
  check_wts = FALSE,
  normalize_wts = TRUE,
  check_negative_moments = TRUE
)

t_running_kurt5(
  v,
  time = NULL,
  time_deltas = NULL,
  window = NULL,
  wts = NULL,
  lb_time = NULL,
  na_rm = FALSE,
  min_df = 0L,
  used_df = 1,
  restart_period = 100L,
  variable_win = FALSE,
  wts_as_delta = TRUE,
  check_wts = FALSE,
  normalize_wts = TRUE,
  check_negative_moments = TRUE
)

t_running_sd(
  v,
  time = NULL,
  time_deltas = NULL,
  window = NULL,
  wts = NULL,
  lb_time = NULL,
  na_rm = FALSE,
  min_df = 0L,
  used_df = 1,
  restart_period = 100L,
  variable_win = FALSE,
  wts_as_delta = TRUE,
  check_wts = FALSE,
  normalize_wts = TRUE,
  check_negative_moments = TRUE
)

t_running_skew(
  v,
  time = NULL,
  time_deltas = NULL,
  window = NULL,
  wts = NULL,
  lb_time = NULL,
  na_rm = FALSE,
  min_df = 0L,
  used_df = 1,
  restart_period = 100L,
  variable_win = FALSE,
  wts_as_delta = TRUE,
  check_wts = FALSE,
  normalize_wts = TRUE,
  check_negative_moments = TRUE
)

t_running_kurt(
  v,
  time = NULL,
  time_deltas = NULL,
  window = NULL,
  wts = NULL,
  lb_time = NULL,
  na_rm = FALSE,
  min_df = 0L,
  used_df = 1,
  restart_period = 100L,
  variable_win = FALSE,
  wts_as_delta = TRUE,
  check_wts = FALSE,
  normalize_wts = TRUE,
  check_negative_moments = TRUE
)

t_running_cent_moments(
  v,
  time = NULL,
  time_deltas = NULL,
  window = NULL,
  wts = NULL,
  lb_time = NULL,
  max_order = 5L,
  na_rm = FALSE,
  max_order_only = FALSE,
  min_df = 0L,
  used_df = 0,
  restart_period = 100L,
  variable_win = FALSE,
  wts_as_delta = TRUE,
  check_wts = FALSE,
  normalize_wts = TRUE,
  check_negative_moments = TRUE
)

t_running_std_moments(
  v,
  time = NULL,
  time_deltas = NULL,
  window = NULL,
  wts = NULL,
  lb_time = NULL,
  max_order = 5L,
  na_rm = FALSE,
  min_df = 0L,
  used_df = 0,
  restart_period = 100L,
  variable_win = FALSE,
  wts_as_delta = TRUE,
  check_wts = FALSE,
  normalize_wts = TRUE,
  check_negative_moments = TRUE
)

t_running_cumulants(
  v,
  time = NULL,
  time_deltas = NULL,
  window = NULL,
  wts = NULL,
  lb_time = NULL,
  max_order = 5L,
  na_rm = FALSE,
  min_df = 0L,
  used_df = 0,
  restart_period = 100L,
  variable_win = FALSE,
  wts_as_delta = TRUE,
  check_wts = FALSE,
  normalize_wts = TRUE,
  check_negative_moments = TRUE
)

Value

Typically a matrix, where the first columns are the kth, k-1th through 2nd standardized, centered moments, then a column of the mean, then a column of the number of (non-nan) elements in the input, with the following exceptions:

t_running_cent_moments

Computes arbitrary order centered moments. When max_order_only is set, only a column of the maximum order centered moment is returned.

t_running_std_moments

Computes arbitrary order standardized moments, then the standard deviation, the mean, and the count. There is not yet an option for max_order_only, but probably should be.

t_running_cumulants

Computes arbitrary order cumulants, and returns the kth, k-1th, through the second (which is the variance) cumulant, then the mean, and the count.
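The relationship between centered moments and cumulants can be sketched in base R. This is an illustration only, using the standard identities up to order four; the helper name cumulants_from_central is hypothetical and is not part of fromo, and the moments here use a denominator of n (the analogue of used_df = 0).

```r
# Hypothetical helper, not part of fromo: convert central moments to cumulants.
# mu is c(mu2, mu3, mu4), the 2nd through 4th central moments.
cumulants_from_central <- function(mu) {
  c(k2 = mu[[1]],                   # second cumulant is the variance
    k3 = mu[[2]],                   # third cumulant equals the third central moment
    k4 = mu[[3]] - 3 * mu[[1]]^2)   # fourth cumulant removes the Gaussian contribution
}

x <- c(1, 2, 4, 7, 11)              # made-up data
d <- x - mean(x)
mu <- c(mean(d^2), mean(d^3), mean(d^4))
kappa <- cumulants_from_central(mu)
print(kappa)
```

Only the fourth and higher cumulants differ from the centered moments of the same order.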

Arguments

v

a vector of data.

time

an optional vector of the timestamps of v. If given, must be the same length as v. If not given, we try to infer it by summing the time_deltas.

time_deltas

an optional vector of the deltas of timestamps. If given, must be the same length as v. If not given, and wts are given and wts_as_delta is true, we take the wts as the time deltas. The deltas must be positive. We sum them to arrive at the times.

window

the window size, in time units. If given as a finite integer or double, it is passed through. If NULL, NA_integer_, NA_real_ or Inf are given, and variable_win is true, then we infer the window from the lookback times: the first window is infinite, and each remaining window is the delta between consecutive lookback times. If variable_win is false, then these undefined values are equivalent to an infinite window. If negative, an error will be thrown.

wts

an optional vector of weights. Weights are ‘replication’ weights, meaning a value of 2 is shorthand for having two observations with the corresponding v value. If NULL, corresponds to equal unit weights, the default. Note that weights are typically only meaningfully defined up to a multiplicative constant, meaning the units of weights are immaterial, with the exception that methods which check for minimum df will, in the weighted case, check against the sum of weights. For this reason, weights less than 1 could cause NA to be returned unexpectedly due to the minimum condition. When weights are NA, the same rules for checking v are applied. That is, the observation will not contribute to the moment if the weight is NA when na_rm is true. When there is no checking, an NA value will cause the output to be NA.

lb_time

a vector of the times from which lookback will be performed. The output should be the same size as this vector. If not given, defaults to time.

na_rm

whether to remove NA, false by default.

min_df

the minimum df to return a value, otherwise NaN is returned. This can be used to prevent moments from being computed on too few observations. Defaults to zero, meaning no restriction.

used_df

the number of degrees of freedom consumed, used in the denominator of the centered moments computation. These are subtracted from the number of observations.

restart_period

the recompute period. Because subtraction of elements can cause loss of precision, the computation of moments is restarted periodically based on this parameter. Larger values mean fewer restarts and faster, though less accurate, results.

variable_win

if true, and the window is not a concrete number, the computation window becomes the time between lookback times.

wts_as_delta

if true and the time and time_deltas are not given, but wts are given, we take wts as the time_deltas.

check_wts

a boolean for whether the code shall check for negative weights, and throw an error when they are found. Default false for speed.

normalize_wts

a boolean for whether the weights should be renormalized to have a mean value of 1. This mean is computed over elements which contribute to the moments, so if na_rm is set, that means non-NA elements of wts that correspond to non-NA elements of the data vector.

check_negative_moments

a boolean flag. Normal computation of running moments can result in negative estimates of even order moments due to loss of numerical precision. With this flag active, the computation checks for negative even order moments and restarts the computation when one is detected. This should eliminate the possibility of negative even order moments. The downside is the speed hit of checking on every output step. Note also the code checks for negative moments of every even order tracked, even if they are not output; that is if the kurtosis, say, is being computed, and a negative variance is detected, then the computation is restarted. Defaults to TRUE to avoid negative even moments. Set to FALSE only if you know what you are doing.

max_order

the maximum order of the centered moment to be computed.

max_order_only

for t_running_cent_moments, if this flag is set, only compute the maximum order centered moment, and return it in a vector.
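The interaction of wts, used_df and min_df described above can be illustrated with a small base R sketch. It is not the package's implementation: weighted_var is a hypothetical helper, it handles only the variance, and it assumes the weights are used as given, with no renormalization.

```r
# Hypothetical helper: weighted mean and variance under replication weights.
# The denominator is the sum of weights minus used_df.
weighted_var <- function(x, w = rep(1, length(x)), used_df = 1) {
  sw <- sum(w)
  mu <- sum(w * x) / sw
  c(var = sum(w * (x - mu)^2) / (sw - used_df), mean = mu, df = sw)
}

x <- c(1, 3, 6)
a <- weighted_var(x, w = c(2, 1, 1))   # weight 2 on the first observation
b <- weighted_var(c(1, 1, 3, 6))       # the same observation written out twice

# Rescaling the weights leaves the mean alone but shrinks the sum of weights,
# which is what a minimum df check would be compared against.
small <- weighted_var(x, w = c(0.2, 0.1, 0.1), used_df = 0)
```

Here a and b agree, which is the sense in which a weight of 2 is shorthand for two observations. The last call has a sum of weights of 0.4, so any minimum df of 1 or more would not be met.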

Time Windowing

This function supports time (or other counter) based running computation. Here the inputs are the data \(x_i\), an optional vector of weights \(w_i\), defaulting to 1, and a vector of time indices \(t_i\) of the same length as \(x\). The times must be non-decreasing: $$t_1 \le t_2 \le \ldots$$ It is assumed that \(t_0 = -\infty\). The window, \(W\), is now a time-based window. An optional set of lookback times, \(b_j\), may also be given; it may have a different length than \(x\) and \(w\). The output corresponds to the lookback times, and should be the same length. The \(j\)th output is computed over indices \(i\) such that $$b_j - W < t_i \le b_j.$$
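The window rule can be checked against a brute-force reference in base R. The helper window_stats below is hypothetical and deliberately naive (it rescans the data for every lookback time); it only illustrates which observations fall in each window, with the left endpoint excluded and the right endpoint included.

```r
# Hypothetical brute-force reference for the rule b_j - W < t_i <= b_j.
# Returns one row per lookback time: standard deviation, mean, count.
window_stats <- function(x, tm, b = tm, W = Inf) {
  t(vapply(b, function(bj) {
    xs <- x[tm > bj - W & tm <= bj]
    c(sd = if (length(xs) > 1) sd(xs) else NaN,
      mean = if (length(xs) > 0) mean(xs) else NaN,
      n = length(xs))
  }, c(sd = 0, mean = 0, n = 0)))
}

x <- c(1, 2, 3, 4, 5, 6)    # made-up data
tm <- c(1, 2, 3, 4, 5, 6)   # made-up timestamps
res <- window_stats(x, tm, b = c(3, 6), W = 2)
print(res)
```

With W = 2 and a lookback time of 3, the observation at time 1 is excluded because the inequality on the left is strict, so each window here holds two observations.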

For comparison functions (like Z-score, rescaling, centering), which compare values of \(x_i\) to local moments, the lookbacks may not be given, but a lookahead \(L\) is admitted. In this case, the \(j\)th output is computed over indices \(i\) such that $$t_j - W + L < t_i \le t_j + L.$$

If the times are not given, ‘deltas’ may be given instead. If \(\delta_i\) are the deltas, then we compute the times as $$t_i = \sum_{1 \le j \le i} \delta_j.$$ The deltas must be the same length as \(x\). If times and deltas are not given, but weights are given and the ‘weights as deltas’ flag is set true, then the weights are used as the deltas.
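The conversion from deltas to times is a cumulative sum, which can be written in base R as follows. The delta values are made up for illustration.

```r
deltas <- c(0.5, 1.0, 0.25, 2.0)   # made-up positive time increments
times <- cumsum(deltas)            # t_i = delta_1 + ... + delta_i
print(times)
```

Because the deltas are positive, the resulting times are strictly increasing, which satisfies the ordering requirement on the times.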

Sometimes it makes sense to have the computational window be the space between lookback times. That is, the \(j\)th output is to be computed over indices \(i\) such that $$b_{j-1} < t_i \le b_j,$$ with \(b_0 = -\infty\), so that the first window is infinite. This can be achieved by setting the ‘variable window’ flag true and setting the window to null. This will not make much sense if the lookback times are equal to the times, since each moment computation is then over a single index, and most moments are not well defined.
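A brute-force sketch of the variable window case follows. It assumes the window for the \(j\)th output runs from the previous lookback time (exclusive) to the current one (inclusive), with the first window unbounded on the left; between_lookbacks is a hypothetical helper, not a fromo function.

```r
# Hypothetical helper: one row per lookback time, using the interval
# (previous lookback time, current lookback time].
between_lookbacks <- function(x, tm, b) {
  lower <- c(-Inf, head(b, -1))   # the first window is unbounded on the left
  t(vapply(seq_along(b), function(j) {
    xs <- x[tm > lower[j] & tm <= b[j]]
    c(mean = if (length(xs) > 0) mean(xs) else NaN, n = length(xs))
  }, c(mean = 0, n = 0)))
}

x <- c(10, 20, 30, 40, 50, 60)   # made-up data
tm <- 1:6
res <- between_lookbacks(x, tm, b = c(2, 4, 6))

# Degenerate case: lookback times equal to the times, one observation each.
res_single <- between_lookbacks(x, tm, b = tm)
```

In the degenerate case every window holds a single observation, which is why higher moments cannot be computed there.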

Author

Steven E. Pav shabbychef@gmail.com

Details

Computes the number of elements, the mean, and the 2nd through kth centered (and typically standardized) moments, for \(k=2,3,4\). These are computed via the numerically robust one-pass method of Bennett et al.

Given the length \(n\) vector \(x\), we output matrix \(M\) where \(M_{i,j}\) is the \(order - j + 1\) moment (i.e. excess kurtosis, skewness, standard deviation, mean or number of elements) of some elements \(x_i\) defined by the sliding time window. Barring NA or NaN, this is over a window of time width window.
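To give a feel for one-pass updating over a sliding window, here is a simplified second-order sketch in base R using Welford-style add and remove steps. It is an assumption-laden illustration, not the package's code: the helpers add_obs and rem_obs are hypothetical, weights and higher orders are omitted, and no restart logic is included.

```r
# State is list(n, mean, M2), where M2 is the sum of squared deviations.
add_obs <- function(s, x) {
  n <- s$n + 1
  delta <- x - s$mean
  m <- s$mean + delta / n
  list(n = n, mean = m, M2 = s$M2 + delta * (x - m))
}

# Removing an observation reverses the update. The subtraction from M2 is
# where precision can be lost, which motivates periodic recomputation.
rem_obs <- function(s, x) {
  n <- s$n - 1
  if (n == 0) return(list(n = 0, mean = 0, M2 = 0))
  delta <- x - s$mean
  m <- s$mean - delta / n
  list(n = n, mean = m, M2 = s$M2 - delta * (x - m))
}

x <- c(2, 4, 4, 5, 7, 9)             # made-up data
s <- list(n = 0, mean = 0, M2 = 0)
for (xi in x[1:4]) s <- add_obs(s, xi)
s <- rem_obs(s, x[1])                # the window now holds x[2:4]
running_var <- s$M2 / (s$n - 1)      # one degree of freedom consumed
```

After the removal step, the state matches a direct computation on x[2:4]. Recomputing the state from the observations currently in the window is the kind of restart that clears accumulated rounding error.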

References

Terriberry, T. "Computing Higher-Order Moments Online." https://web.archive.org/web/20140423031833/http://people.xiph.org/~tterribe/notes/homs.html

J. Bennett, et al., "Numerically Stable, Single-Pass, Parallel Statistics Algorithms," Proceedings of IEEE International Conference on Cluster Computing, 2009. doi:10.1109/CLUSTR.2009.5289161

Cook, J. D. "Accurately computing running variance." https://www.johndcook.com/standard_deviation/

Cook, J. D. "Comparing three methods of computing standard deviation." https://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/

See Also

running_sd3.

Examples

library(fromo)
x <- rnorm(1e5)
xs3 <- t_running_sd3(x,time=seq_along(x),window=10)
xs4 <- t_running_skew4(x,time=seq_along(x),window=10)
# but what if you only cared about some middle values?
xs4 <- t_running_skew4(x,time=seq_along(x),lb_time=(length(x) / 2) + 0:10,window=20)
