These functions are specialized variants of the most common ways that slide() is generally used. Notably, slide_sum() can be used for rolling sums, and slide_mean() can be used for rolling averages.
These specialized variants are much faster and more memory efficient than an otherwise equivalent call constructed with slide_dbl() or slide_lgl(), especially with a very wide window.
slide_sum(
x,
...,
before = 0L,
after = 0L,
step = 1L,
complete = FALSE,
na_rm = FALSE
)
slide_prod(
x,
...,
before = 0L,
after = 0L,
step = 1L,
complete = FALSE,
na_rm = FALSE
)
slide_mean(
x,
...,
before = 0L,
after = 0L,
step = 1L,
complete = FALSE,
na_rm = FALSE
)
slide_min(
x,
...,
before = 0L,
after = 0L,
step = 1L,
complete = FALSE,
na_rm = FALSE
)
slide_max(
x,
...,
before = 0L,
after = 0L,
step = 1L,
complete = FALSE,
na_rm = FALSE
)
slide_all(
x,
...,
before = 0L,
after = 0L,
step = 1L,
complete = FALSE,
na_rm = FALSE
)
slide_any(
x,
...,
before = 0L,
after = 0L,
step = 1L,
complete = FALSE,
na_rm = FALSE
)
A vector the same size as x containing the result of applying the summary function over the sliding windows.
For sliding sum, mean, prod, min, and max, a double vector will be returned.
For sliding any and all, a logical vector will be returned.
x
[vector]
A vector to compute the sliding function on.
For sliding sum, mean, prod, min, and max, x will be cast to a double vector with vctrs::vec_cast().
For sliding any and all, x will be cast to a logical vector with vctrs::vec_cast().
...
These dots are for future extensions and must be empty.
before, after
[integer(1) / Inf]
The number of values before or after the current element to include in the sliding window. Set to Inf to select all elements before or after the current element. Negative values are allowed, which lets you "look forward" from the current element if used as the before value, or "look backwards" if used as the after value.
step
[positive integer(1)]
The number of elements to shift the window forward between function calls.
complete
[logical(1)]
Should the function be evaluated on complete windows only? If FALSE, the default, then partial computations will be allowed.
na_rm
[logical(1)]
Should missing values be removed from the computation?
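To make the window arguments concrete, here is a minimal base R sketch of how before, after, and complete select the elements for each position. The name naive_slide_sum() is hypothetical and is not part of slider; the sketch assumes finite, non-negative before and after, ignores step and na_rm, and recomputes every window from scratch.

```r
# Hypothetical reference implementation, for illustration only.
# Assumes `before` and `after` are finite and non-negative.
naive_slide_sum <- function(x, before = 0L, after = 0L, complete = FALSE) {
  n <- length(x)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    first <- i - before
    last <- i + after
    # With complete = TRUE, windows that run off either end stay NA
    if (complete && (first < 1L || last > n)) next
    # Otherwise the window is clipped to the bounds of `x`
    out[i] <- sum(x[max(first, 1L):min(last, n)])
  }
  out
}

x <- c(1, 5, 3, 2, 6, 10)
partial <- naive_slide_sum(x, before = 2)
full_only <- naive_slide_sum(x, before = 2, after = 1, complete = TRUE)
```

Here partial holds a sum for every position, with the first two computed on clipped windows, while full_only is NA wherever the four-element window does not fit inside x.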
These variants are implemented using a data structure known as a segment tree, which allows for extremely fast repeated range queries without loss of precision.
One alternative to segment trees is to directly recompute the summary function on each full window. This is what is done by using, for example, slide_dbl(x, sum). This is extremely slow with large window sizes and wastes a lot of effort recomputing nearly the same information on each window. It can be made slightly faster by moving the sum to C to avoid intermediate allocations, but it is still fairly slow.
A second alternative is to use an online algorithm, which uses information from the previous window to compute the next window. These are extremely fast, only requiring a single pass through the data, but often suffer from numerical instability issues.
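The numerical issue with online algorithms can be seen in a small base R sketch. It compares a direct rolling sum with an online one that adds the entering value and subtracts the leaving value. The data and the window size of 2 are chosen for illustration only; 1e20 is large enough that adding 1 to it is lost to rounding.

```r
x <- c(1e20, 1, 1, 1)
k <- 2  # window size: the current element plus one before it

# Direct approach: recompute the sum of each window from scratch
direct <- vapply(
  seq_along(x),
  function(i) sum(x[max(1, i - k + 1):i]),
  numeric(1)
)

# Online approach: update a running total as the window moves
online <- numeric(length(x))
running <- 0
for (i in seq_along(x)) {
  running <- running + x[i]                  # value entering the window
  if (i > k) running <- running - x[i - k]   # value leaving the window
  online[i] <- running
}

direct[3:4]  # 2 2
online[3:4]  # 0 0
```

Once the large value has passed through the running total, the small values added alongside it are gone, and the error carries into every later window.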
Segment trees are an attempt to reconcile the performance issues of the direct approach with the numerical issues of the online approach. The performance of segment trees isn't quite as fast as online algorithms, but is close enough that it should be usable on most large data sets without any issues. Unlike online algorithms, segment trees don't suffer from any extra numerical instability issues.
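As a rough illustration of the idea, the following base R sketch builds a sum segment tree and answers range queries from it. It is not the implementation used by these functions; the names build_sum_tree() and range_sum() are hypothetical, and the layout (an array-based, bottom-up tree) is just one common way to write one.

```r
# Build a sum segment tree over `x` (assumes length(x) >= 1).
# Leaves are stored at positions n..(2n - 1); node i is the sum of
# its children at positions 2i and 2i + 1.
build_sum_tree <- function(x) {
  n <- length(x)
  tree <- numeric(2 * n - 1)
  tree[n:(2 * n - 1)] <- x
  for (i in rev(seq_len(n - 1))) {
    tree[i] <- tree[2 * i] + tree[2 * i + 1]
  }
  tree
}

# Sum of x[from:to] (1-based, inclusive), touching O(log n) nodes
range_sum <- function(tree, n, from, to) {
  l <- n + from - 1  # leftmost leaf in the range
  r <- n + to        # one past the rightmost leaf
  total <- 0
  while (l < r) {
    if (l %% 2 == 1) { total <- total + tree[l]; l <- l + 1 }
    if (r %% 2 == 1) { r <- r - 1; total <- total + tree[r] }
    l <- l %/% 2
    r <- r %/% 2
  }
  total
}

x <- c(1, 5, 3, 2, 6, 10)
n <- length(x)
tree <- build_sum_tree(x)

# Rolling sum with two elements before the current one
rolling <- vapply(
  seq_len(n),
  function(i) range_sum(tree, n, max(1, i - 2), i),
  numeric(1)
)
```

Each window is assembled from a handful of precomputed partial sums rather than from every element, and no value is ever subtracted back out, which is why this approach avoids the drift of an online update.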
Note that these functions are not generic and do not respect method dispatch of the corresponding summary function (e.g. base::sum(), base::mean()). Input will always be cast to a double or logical vector using vctrs::vec_cast(), and an internal method for computing the summary function will be used.
Due to the structure of segment trees, slide_mean() does not perform the same "two pass" mean that mean() does (the intention of the second pass is to perform a floating point error correction). Because of this, there may be small differences between slide_mean(x) and slide_dbl(x, mean) in some cases.
Leis, Kundhikanjana, Kemper, and Neumann (2015). "Efficient Processing of Window Functions in Analytical SQL Queries". https://dl.acm.org/doi/10.14778/2794367.2794375
slide_index_sum()
library(slider)

x <- c(1, 5, 3, 2, 6, 10)
# `slide_sum()` can be used for rolling sums.
# The following are equivalent, but `slide_sum()` is much faster.
slide_sum(x, before = 2)
slide_dbl(x, sum, .before = 2)
# `slide_mean()` can be used for rolling averages
slide_mean(x, before = 2)
# Only evaluate the sum on complete windows
slide_sum(x, before = 2, after = 1, complete = TRUE)
# Skip every other calculation
slide_sum(x, before = 2, step = 2)