Fast rolling functions to calculate aggregates on a sliding window. Function names and arguments are experimental.
frollmean(x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE)
frollsum(x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE)
frollapply(x, n, FUN, ..., fill=NA, align=c("right", "left", "center"))
x: vector, list, data.frame or data.table of numeric or logical columns.
n: integer vector of rolling window sizes; for an adaptive rolling function, a list of integer vectors.
fill: numeric, value to pad by. Defaults to NA.
algo: character, default "fast". When set to "exact", a slower algorithm is used. It suffers less from floating point rounding error, performs an extra pass to adjust rounding error correction, and carefully handles all non-finite values. If available, it will use multiple cores. See Details for more information.
align: character, defines whether the rolling window covers preceding rows ("right"), following rows ("left") or is centered ("center"). Defaults to "right".
na.rm: logical. Should missing values be removed when calculating the window? Defaults to FALSE. For details on handling other non-finite values, see Details below.
hasNA: logical. If it is known that x contains NA, setting this to TRUE will speed up computation. Defaults to NA.
adaptive: logical, should an adaptive rolling function be calculated? Defaults to FALSE. See Details below.
FUN: the function to be applied in rolling fashion; see Details for restrictions.
...: extra arguments passed to FUN in frollapply.
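To make the three alignments concrete, here is a minimal base-R sketch. The helper roll_mean_align() is hypothetical (it is not part of data.table) and naively recomputes each window; it only mimics the documented meaning of align and fill for odd window widths. How data.table positions a centered window of even width is not stated on this page, so that case should not be read from this sketch.

```r
# Hypothetical helper, NOT data.table code: naive rolling mean that mimics
# the documented meaning of align= for a window of width n.
roll_mean_align <- function(x, n, align = c("right", "left", "center"), fill = NA) {
  align <- match.arg(align)
  nx <- length(x)
  ans <- rep(fill, nx)
  # offset of the window start relative to the current index i
  start_off <- switch(align,
    right  = -(n - 1L),          # window ends at i: preceding rows
    left   = 0L,                 # window starts at i: following rows
    center = -((n - 1L) %/% 2L)  # window centered on i (odd n)
  )
  for (i in seq_len(nx)) {
    from <- i + start_off
    to <- from + n - 1L
    # only complete windows get a value; the rest keep `fill`
    if (from >= 1L && to <= nx) ans[i] <- mean(x[from:to])
  }
  ans
}

roll_mean_align(1:5, 3)                    # NA NA 2 3 4
roll_mean_align(1:5, 3, align = "left")    # 2 3 4 NA NA
roll_mean_align(1:5, 3, align = "center")  # NA 2 3 4 NA
```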
A list, except when the input is a vector and length(n)==1, in which case a vector is returned.
froll* functions accept vectors, lists, data.frames or data.tables. They always return a list, except when the input is a vector and length(n)==1, in which case a vector is returned for convenience. Thus rolling functions can be used conveniently within data.table syntax.
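The return-shape rule can be sketched in base R. Both helpers below are hypothetical illustrations, not data.table code, and the ordering of list elements (all window sizes for the first column, then the next column) is an assumption of this sketch rather than something stated on this page.

```r
# Hypothetical naive right-aligned rolling sum, padded with NA (not data.table).
naive_rollsum <- function(v, w) {
  out <- rep(NA_real_, length(v))
  for (i in seq_along(v)) if (i >= w) out[i] <- sum(v[(i - w + 1L):i])
  out
}

# Hypothetical wrapper mimicking the documented return shape: a list with one
# result per column and window size, except that a single vector combined
# with a single window size yields a plain vector.
roll_shape_sketch <- function(x, n, roll_fun = naive_rollsum) {
  single_vector <- is.atomic(x)
  cols <- if (single_vector) list(x) else as.list(x)
  ans <- list()
  # assumption: window sizes vary fastest within each column
  for (col in cols) for (w in n) ans[[length(ans) + 1L]] <- roll_fun(col, w)
  if (single_vector && length(n) == 1L) ans[[1L]] else ans
}

roll_shape_sketch(1:4, 2)                                   # plain vector
length(roll_shape_sketch(1:4, c(2, 3)))                     # list of 2
length(roll_shape_sketch(data.frame(a = 1:4, b = 4:1), 2))  # list of 2
```

Returning a list is what lets a call such as d[, c("ma3", "ma4") := frollmean(V1, c(3, 4))] assign one new column per window size.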
Argument n accepts multiple values, so that rolling functions can be applied with multiple window sizes at once. If adaptive=TRUE, then it expects a list. Each list element must be an integer vector of window sizes, one for every single observation in each column.
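A minimal base-R sketch of an adaptive, right-aligned rolling mean, assuming the semantics described above: observation i is averaged over its own window of n[i] rows ending at i. adaptive_mean_sketch() is a hypothetical helper, not data.table's implementation; the an() helper is the one used in the examples below to build "partial" windows at the start of a series.

```r
# Hypothetical sketch, NOT data.table code: adaptive right-aligned rolling
# mean where observation i uses its own window width n[i].
adaptive_mean_sketch <- function(x, n, fill = NA) {
  stopifnot(length(n) == length(x))  # one window width per observation
  ans <- rep(fill, length(x))
  for (i in seq_along(x)) {
    w <- n[i]
    # a window of width w ending at i needs w <= i observations
    if (w >= 1L && w <= i) ans[i] <- mean(x[(i - w + 1L):i])
  }
  ans
}

# widths 1, 2, 3, 3, ...: growing windows until the full width is reached
an <- function(n, len) c(seq.int(n), rep(n, len - n))
x <- 1:6 / 2
adaptive_mean_sketch(x, an(3, length(x)))  # 0.5 0.75 1 1.5 2 2.5
```

With a constant width vector such as rep(3, length(x)) the sketch reduces to an ordinary right-aligned rolling mean.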
When algo="fast", an on-line algorithm is used, and any NaN, +Inf or -Inf is treated as NA.
Setting algo="exact" makes the rolling functions use a compute-intensive algorithm that suffers less from floating point rounding error. It also handles NaN, +Inf and -Inf consistently with base R. For some functions (like mean), it additionally makes an extra pass to perform floating point error correction. Error corrections might not be truly exact on some platforms (like Windows) when using multiple threads.
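The trade-off between the two algorithms can be illustrated in base R. Both helpers are hypothetical sketches, not data.table's C code: online_rollmean() keeps a running sum (one common on-line approach), and exact_rollmean() recomputes every window with mean(). The input is chosen so that adding 1 to 1e20 is lost to double-precision rounding.

```r
# Hypothetical on-line sketch: add the incoming value, subtract the outgoing
# one. O(1) work per row, but rounding errors are carried forward.
online_rollmean <- function(x, n) {
  ans <- rep(NA_real_, length(x))
  s <- 0
  for (i in seq_along(x)) {
    s <- s + x[i]
    if (i > n) s <- s - x[i - n]
    if (i >= n) ans[i] <- s / n
  }
  ans
}

# Hypothetical exact-style sketch: recompute each window from scratch.
# O(n) work per row, but no error is inherited from earlier windows.
exact_rollmean <- function(x, n) {
  ans <- rep(NA_real_, length(x))
  for (i in seq_along(x)) if (i >= n) ans[i] <- mean(x[(i - n + 1L):i])
  ans
}

x <- c(1e20, 1, 1, 1)
online_rollmean(x, 2)  # windows 3 and 4 come out as 0
exact_rollmean(x, 2)   # windows 3 and 4 come out as 1
```

On well-scaled input such as 1:6/2 the two sketches agree; the difference only appears once a large value has passed through the running sum.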
Adaptive rolling functions are a special case in which each single observation has its own rolling window width. Due to the logic of adaptive rolling functions, the following restrictions apply:
- align: only "right" is supported.
- if a list of vectors is passed to x, then all list vectors must have equal length.
When multiple columns or multiple window widths are provided, they are computed in parallel. The exception is algo="exact", which is already parallelized internally.
frollapply computes a rolling aggregate with an arbitrary R function. The input x (first argument) to the function FUN is coerced to numeric beforehand, and FUN has to return a scalar numeric value. Checks for that are made only during the first iteration in which FUN is evaluated. Edge cases can be found in the examples below. Any R function is supported, but it is not optimized using our own C implementation; hence, for example, using frollapply to compute a rolling average is inefficient. It is also always single-threaded, because there is no thread-safe API to R's C eval. Nevertheless, we have seen the computation speed up vis-a-vis versions implemented in base R.
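The documented behaviour of frollapply can be approximated in base R. rollapply_sketch() is a hypothetical helper that handles right alignment only: it coerces x to numeric, calls FUN on every complete window, and validates the result only on the first call. Accepting integer and logical results (coerced to double) is an assumption based on the examples below; the sketch does not reproduce what data.table's C code does with type-unstable FUN.

```r
# Hypothetical sketch, NOT data.table code: right-aligned rolling apply.
rollapply_sketch <- function(x, n, FUN, ..., fill = NA) {
  x <- as.numeric(x)                        # input coerced to numeric first
  ans <- rep(as.numeric(fill), length(x))
  for (i in seq_along(x)) {
    if (i < n) next                         # incomplete window: keep fill
    res <- FUN(x[(i - n + 1L):i], ...)      # extra arguments forwarded to FUN
    # validated on the first evaluation only, as the text describes
    if (i == n && (length(res) != 1L || !(is.numeric(res) || is.logical(res))))
      stop("FUN must return a scalar numeric value")
    ans[i] <- as.numeric(res[1L])           # later extra elements are ignored
  }
  ans
}

rollapply_sketch(1:5, 3, sum)                            # NA NA 6 9 12
rollapply_sketch(c(1, NA, 3, 4), 2, mean, na.rm = TRUE)  # NA 1 3 3.5
try(rollapply_sketch(1:5, 3, function(x) "big"))         # error on first call
```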
library(data.table)
d = as.data.table(list(1:6/2, 3:8/4))
# rollmean of single vector and single window
frollmean(d[, V1], 3)
# multiple columns at once
frollmean(d, 3)
# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))
# multiple columns and multiple windows at once
frollmean(d, c(3, 4))
## three calls above will use multiple cores when available
# partial window using adaptive rolling function
an = function(n, len) c(seq.int(n), rep(n, len-n))
n = an(3, nrow(d))
frollmean(d, n, adaptive=TRUE)
# frollsum
frollsum(d, 3:4)
# frollapply
frollapply(d, 3:4, sum)
f = function(x, ...) if (sum(x, ...)>5) min(x, ...) else max(x, ...)
frollapply(d, 3:4, f, na.rm=TRUE)
# performance vs exactness
set.seed(108)
x = sample(c(rnorm(1e3, 1e6, 5e5), 5e9, 5e-9))
n = 15
ma = function(x, n, na.rm=FALSE) {
  # naive rolling mean: call mean() on every complete window
  ans = rep(NA_real_, nx<-length(x))
  for (i in n:nx) ans[i] = mean(x[(i-n+1):i], na.rm=na.rm)
  ans
}
fastma = function(x, n, na.rm) {
  if (!missing(na.rm)) stop("NAs are unsupported, wrongly propagated by cumsum")
  # rolling mean from differences of the cumulative sum
  cs = cumsum(x)
  scs = shift(cs, n)
  scs[n] = 0
  as.double((cs-scs)/n)
}
system.time(ans1<-ma(x, n))
system.time(ans2<-fastma(x, n))
system.time(ans3<-frollmean(x, n))
system.time(ans4<-frollmean(x, n, algo="exact"))
system.time(ans5<-frollapply(x, n, mean))
anserr = list(
fastma = ans2-ans1,
froll_fast = ans3-ans1,
froll_exact = ans4-ans1,
frollapply = ans5-ans1
)
errs = sapply(lapply(anserr, abs), sum, na.rm=TRUE)
sapply(errs, format, scientific=FALSE) # roundoff
# frollapply corner cases
f = function(x) head(x, 2) ## FUN returns non length 1
try(frollapply(1:5, 3, f))
f = function(x) { ## FUN sometimes returns non length 1
  n = length(x)
  # length 1 will be returned only for first iteration where we check length
  if (n==x[n]) x[1L] else range(x) # range(x)[2L] is silently ignored!
}
frollapply(1:5, 3, f)
options(datatable.verbose=TRUE)
x = c(1,2,1,1,1,2,3,2)
frollapply(x, 3, uniqueN) ## FUN returns integer
numUniqueN = function(x) as.numeric(uniqueN(x))
frollapply(x, 3, numUniqueN)
x = c(1,2,1,1,NA,2,NA,2)
frollapply(x, 3, anyNA) ## FUN returns logical
as.logical(frollapply(x, 3, anyNA))
options(datatable.verbose=FALSE)
f = function(x) { ## FUN returns character
  if (sum(x)>5) "big" else "small"
}
try(frollapply(1:5, 3, f))
f = function(x) { ## FUN is not type-stable
  n = length(x)
  # double type will be returned only for first iteration where we check type
  if (n==x[n]) 1 else NA # NA logical turns into garbage without coercion to double
}
try(frollapply(1:5, 3, f))