Compute the (standardized) 2nd through kth moments, the mean, and the number of elements over an infinite or finite sliding window, returning a matrix.
running_sd3(
v,
window = NULL,
wts = NULL,
na_rm = FALSE,
min_df = 0L,
used_df = 1,
restart_period = 100L,
check_wts = FALSE,
normalize_wts = TRUE,
check_negative_moments = TRUE
)
running_skew4(
v,
window = NULL,
wts = NULL,
na_rm = FALSE,
min_df = 0L,
used_df = 1,
restart_period = 100L,
check_wts = FALSE,
normalize_wts = TRUE,
check_negative_moments = TRUE
)
running_kurt5(
v,
window = NULL,
wts = NULL,
na_rm = FALSE,
min_df = 0L,
used_df = 1,
restart_period = 100L,
check_wts = FALSE,
normalize_wts = TRUE,
check_negative_moments = TRUE
)
running_sd(
v,
window = NULL,
wts = NULL,
na_rm = FALSE,
min_df = 0L,
used_df = 1,
restart_period = 100L,
check_wts = FALSE,
normalize_wts = TRUE,
check_negative_moments = TRUE
)
running_skew(
v,
window = NULL,
wts = NULL,
na_rm = FALSE,
min_df = 0L,
used_df = 1,
restart_period = 100L,
check_wts = FALSE,
normalize_wts = TRUE,
check_negative_moments = TRUE
)
running_kurt(
v,
window = NULL,
wts = NULL,
na_rm = FALSE,
min_df = 0L,
used_df = 1,
restart_period = 100L,
check_wts = FALSE,
normalize_wts = TRUE,
check_negative_moments = TRUE
)
running_cent_moments(
v,
window = NULL,
wts = NULL,
max_order = 5L,
na_rm = FALSE,
max_order_only = FALSE,
min_df = 0L,
used_df = 0,
restart_period = 100L,
check_wts = FALSE,
normalize_wts = TRUE,
check_negative_moments = TRUE
)
running_std_moments(
v,
window = NULL,
wts = NULL,
max_order = 5L,
na_rm = FALSE,
min_df = 0L,
used_df = 0,
restart_period = 100L,
check_wts = FALSE,
normalize_wts = TRUE,
check_negative_moments = TRUE
)
running_cumulants(
v,
window = NULL,
wts = NULL,
max_order = 5L,
na_rm = FALSE,
min_df = 0L,
used_df = 0,
restart_period = 100L,
check_wts = FALSE,
normalize_wts = TRUE,
check_negative_moments = TRUE
)
Typically a matrix, where the first columns are the kth, (k-1)th, through 2nd standardized, centered moments, then a column of the mean, then a column of the number of (non-NaN) elements in the input, with the following exceptions:
running_cent_moments
Computes arbitrary order centered moments. When max_order_only is set, only a column of the maximum order centered moment is returned.
running_std_moments
Computes arbitrary order standardized moments, then the standard deviation, the mean, and the count. There is not yet an option for max_order_only, but there probably should be.
running_cumulants
Computes arbitrary order cumulants, and returns the kth, (k-1)th, through the second cumulant (which is the variance), then the mean, and the count.
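The relationship between centered moments, standardized moments and cumulants can be sketched in base R for a single window. This is only an illustration, not the package's implementation: the helper names cent_moment, std_moment and cumulants4 are hypothetical, the denominator is taken to be n (as with used_df = 0), and the package may apply other conventions, such as reporting excess kurtosis.

```r
# Hypothetical helpers for illustration; not part of the package.

# k-th centered moment with denominator n (as with used_df = 0).
cent_moment <- function(x, k) mean((x - mean(x))^k)

# k-th standardized moment: centered moment divided by sd^k.
std_moment <- function(x, k) cent_moment(x, k) / cent_moment(x, 2)^(k / 2)

# Cumulants of order 4, 3 and 2, derived from the centered moments.
cumulants4 <- function(x) {
  mu2 <- cent_moment(x, 2)
  mu3 <- cent_moment(x, 3)
  mu4 <- cent_moment(x, 4)
  c(k4 = mu4 - 3 * mu2^2, k3 = mu3, k2 = mu2)
}

x <- c(1, 2, 4, 8, 16)
cumulants4(x)      # fourth, third and second cumulant, in that order
std_moment(x, 3)   # skewness under this convention
```

The second cumulant equals the second centered moment, which is why the description above calls it the variance.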
v
a vector.
window
the window size. If given as a finite integer or double, it is passed through. If NULL, NA_integer_, NA_real_ or Inf is given, this is equivalent to an infinite window size. If negative, an error is thrown.
wts
an optional vector of weights. Weights are 'replication' weights, meaning a value of 2 is shorthand for having two observations with the corresponding v value. If NULL, corresponds to equal unit weights, the default. Note that weights are typically only meaningfully defined up to a multiplicative constant, meaning the units of weights are immaterial, with the exception that methods which check for minimum df will, in the weighted case, check against the sum of weights. For this reason, weights less than 1 could cause NA to be returned unexpectedly due to the minimum condition. When weights are NA, the same rules as for checking v are applied. That is, the observation will not contribute to the moment if the weight is NA when na_rm is true. When there is no checking, an NA value will cause the output to be NA.
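The 'replication' interpretation of weights can be checked with a small base-R sketch. The helpers wtd_mean and wtd_var are hypothetical and not part of the package, and the sketch assumes the weights are used as given, without the renormalization that normalize_wts performs.

```r
# Hypothetical helpers; weights are treated as replication counts.
wtd_mean <- function(v, wts) sum(wts * v) / sum(wts)

wtd_var <- function(v, wts, used_df = 1) {
  mu <- wtd_mean(v, wts)
  # the sum of weights plays the role of the number of observations
  sum(wts * (v - mu)^2) / (sum(wts) - used_df)
}

v   <- c(1.5, 2.0, 4.0)
wts <- c(1, 2, 1)

# A weight of 2 on the second value is the same as repeating it twice.
v_rep <- rep(v, times = wts)   # 1.5, 2.0, 2.0, 4.0

all.equal(wtd_mean(v, wts), mean(v_rep))
all.equal(wtd_var(v, wts), var(v_rep))
```

Because the sum of weights stands in for the observation count, scaling all weights below 1 shrinks that sum, which is how a minimum df check can fail unexpectedly.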
na_rm
whether to remove NA values; FALSE by default.
min_df
the minimum df required to return a value; otherwise NaN is returned. This can be used to prevent moments from being computed on too few observations. Defaults to zero, meaning no restriction.
used_df
the number of degrees of freedom consumed, used in the denominator of the centered moments computation. These are subtracted from the number of observations.
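As a rough illustration of how min_df and used_df enter the computation, the following base-R sketch computes the variance of a single window. The helper window_var is hypothetical, and the exact comparison used for min_df (here, fewer than min_df observations yields NaN) is an assumption rather than something stated above.

```r
# Hypothetical helper: variance of one window with explicit df handling.
window_var <- function(x, used_df = 1, min_df = 0) {
  n <- length(x)
  # assumed rule: too few observations gives NaN
  if (n < min_df) return(NaN)
  # used_df is subtracted from the observation count in the denominator
  sum((x - mean(x))^2) / (n - used_df)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)

window_var(x, used_df = 1)       # same as var(x)
window_var(x, used_df = 0)       # divides by n instead of n - 1
window_var(x[1:2], min_df = 3)   # NaN, only two observations
```

With used_df = 1 this reproduces the usual sample variance, and with used_df = 0 the denominator is the plain observation count.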
restart_period
the recompute period. Because subtraction of elements can cause loss of precision, the computation of moments is restarted periodically based on this parameter. Larger values mean fewer restarts and faster, though potentially less accurate, results.
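The idea of a periodic restart can be sketched in base R for the variance alone. This is a simplified illustration under stated assumptions, not the package's code: the function running_var_sketch is hypothetical, it uses Welford-style add and remove updates, and it counts removals since the last restart, which may differ from how the package counts.

```r
# Hypothetical sketch of a sliding-window variance with periodic restarts.
running_var_sketch <- function(x, window, restart_period = 100L) {
  n <- 0; mu <- 0; m2 <- 0     # count, mean, sum of squared deviations
  removals <- 0L               # removals since the last restart
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    # add the newest element
    n <- n + 1
    delta <- x[i] - mu
    mu <- mu + delta / n
    m2 <- m2 + delta * (x[i] - mu)
    if (i > window) {
      if (removals >= restart_period) {
        # restart: recompute the sums directly from the current window
        win <- x[(i - window + 1):i]
        n <- length(win); mu <- mean(win); m2 <- sum((win - mu)^2)
        removals <- 0L
      } else {
        # remove the oldest element; this subtraction is where
        # rounding error can accumulate
        old <- x[i - window]
        delta <- old - mu
        mu <- mu - delta / (n - 1)
        m2 <- m2 - delta * (old - mu)
        n <- n - 1
        removals <- removals + 1L
      }
    }
    if (n > 1) out[i] <- m2 / (n - 1)
  }
  out
}

set.seed(1)
x <- rnorm(500)
rv <- running_var_sketch(x, window = 10, restart_period = 50L)
```

A smaller restart_period recomputes from scratch more often, trading speed for a tighter bound on accumulated rounding error.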
check_wts
a boolean for whether the code should check for negative weights and throw an error when they are found. Defaults to FALSE for speed.
normalize_wts
a boolean for whether the weights should be renormalized to have a mean value of 1. This mean is computed over elements which contribute to the moments, so if na_rm is set, that means non-NA elements of wts that correspond to non-NA elements of the data vector.
check_negative_moments
a boolean flag. Normal computation of running moments can result in negative estimates of even order moments due to loss of numerical precision. With this flag active, the computation checks for negative even order moments and restarts the computation when one is detected. This should eliminate the possibility of negative even order moments. The downside is the speed hit of checking on every output step. Note also that the code checks for negative moments of every even order tracked, even if they are not output; that is, if the kurtosis, say, is being computed, and a negative variance is detected, then the computation is restarted. Defaults to TRUE to avoid negative even moments. Set to FALSE only if you know what you are doing.
max_order
the maximum order of the centered moment to be computed.
max_order_only
for running_cent_moments, if this flag is set, only the maximum order centered moment is computed and returned.
Steven E. Pav shabbychef@gmail.com
Computes the number of elements, the mean, and the 2nd through kth centered (and typically standardized) moments, for \(k=2,3,4\). These are computed via the numerically robust one-pass method of Bennett et al.
Given the length \(n\) vector \(x\), we output matrix \(M\) where \(M_{i,j}\) is the \(order - j + 1\) moment (i.e. excess kurtosis, skewness, standard deviation, mean or number of elements) of \(x_{i-window+1},x_{i-window+2},...,x_{i}\). Barring NA or NaN, this is over a window of size window. During the 'burn-in' phase, we take fewer elements.
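The indexing of the output matrix can be illustrated with a naive base-R reference that recomputes each window from scratch. The function naive_sd3 is hypothetical; it mirrors the column order described for running_sd3 (standard deviation, mean, count), and it is not the one-pass method the package uses. Its handling of a one-element window follows base R's sd() and may differ from the package's.

```r
# Hypothetical reference: row i holds (sd, mean, count) of the window
# ending at element i.
naive_sd3 <- function(x, window) {
  t(sapply(seq_along(x), function(i) {
    # during burn-in the window is clipped at the first element
    win <- x[max(1, i - window + 1):i]
    c(sd(win), mean(win), length(win))
  }))
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6)
m <- naive_sd3(x, window = 3)
m[, 3]   # counts grow to the window size during burn-in, then stay there
```

The first rows use one and two elements respectively, which is the burn-in behaviour described above.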
Terriberry, T. "Computing Higher-Order Moments Online." https://web.archive.org/web/20140423031833/http://people.xiph.org/~tterribe/notes/homs.html
J. Bennett, et al., "Numerically Stable, Single-Pass, Parallel Statistics Algorithms," Proceedings of IEEE International Conference on Cluster Computing, 2009. doi:10.1109/CLUSTR.2009.5289161
Cook, J. D. "Accurately computing running variance." https://www.johndcook.com/standard_deviation/
Cook, J. D. "Comparing three methods of computing standard deviation." https://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/
x <- rnorm(1e5)
xs3 <- running_sd3(x, 10)
xs4 <- running_skew4(x, 10)

# compare against a direct per-window computation using the moments package
if (require(moments)) {
  set.seed(123)
  x <- rnorm(5e1)
  window <- 10L
  kt5 <- running_kurt5(x, window = window)
  # columns: excess kurtosis, skewness, sd, mean, count
  rm1 <- t(sapply(seq_len(length(x)), function(iii) {
    xrang <- x[max(1, iii - window + 1):iii]
    c(moments::kurtosis(xrang) - 3.0, moments::skewness(xrang),
      sd(xrang), mean(xrang), length(xrang))
  }, simplify = TRUE))
  stopifnot(max(abs(kt5 - rm1), na.rm = TRUE) < 1e-12)
}

xc6 <- running_cent_moments(x, window = 100L, max_order = 6L)