bit64 (version 4.5.2)

benchmark64: Function for measuring algorithmic performance
of high-level and low-level integer64 functions

Description

benchmark64 compares high-level integer64 functions against the corresponding integer functions from Base R.
For each high-level integer64 function, optimizer64 compares the Base R integer function with several low-level integer64 functions, with and without caching.

Usage

benchmark64(nsmall = 2^16, nbig = 2^25, timefun = repeat.time
)
optimizer64(nsmall = 2^16, nbig = 2^25, timefun = repeat.time
, what = c("match", "%in%", "duplicated", "unique", "unipos", "table", "rank", "quantile")
, uniorder = c("original", "values", "any")
, taborder = c("values", "counts")
, plot = TRUE
)

Value

benchmark64 returns a matrix of elapsed seconds, with the different high-level tasks in rows and the different scenarios for solving each task in columns. The last row, named 'SESSION', contains the elapsed seconds of the exemplary session.


optimizer64 returns a dimensioned list with one row for each high-level function timed and two columns named after the values of the nsmall and nbig sample sizes. Each list cell contains a matrix of timings, with low-level methods in rows and three measurements c("prep","both","use") in columns. If they can be measured separately, prep contains the timing of preparatory work such as sorting and hashing, and use contains the timing of using the prepared work. If the function timed does both preparation and use, the timing is in both.
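
The shape of such a dimensioned list can be sketched in Base R without running any benchmark. Everything below is a hypothetical stand-in: the object res, the helper make_cell, the method names and the NA timings are invented for illustration and are not output of optimizer64.

```r
# Hypothetical stand-in for a result shaped like the one described above:
# a list with dim and dimnames, each cell holding a timing matrix.
make_cell <- function() {
  matrix(NA_real_, nrow = 2, ncol = 3,
         dimnames = list(c("methodA", "methodB"),   # made-up method names
                         c("prep", "both", "use")))
}
res <- list(make_cell(), make_cell(), make_cell(), make_cell())
dim(res) <- c(2, 2)
dimnames(res) <- list(c("match", "%in%"),           # high-level functions
                      as.character(c(2^16, 2^25)))  # nsmall and nbig

# Double brackets extract the matrix stored in a single cell
tim <- res[["match", as.character(2^16)]]
colnames(tim)
```

The same double-bracket indexing with row and column positions is what the example code on this page uses to walk through optimizer64.data.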

Arguments

nsmall

size of smaller vector

nbig

size of the larger vector

timefun

a function for timing such as repeat.time or system.time

what

a vector of names of high-level functions

uniorder

one of the order parameters that are allowed in unique.integer64 and unipos.integer64

taborder

one of the order parameters that are allowed in table.integer64

plot

set to FALSE to suppress plotting
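
As a minimal sketch of the timefun argument: the wrapper below mirrors the one used in the examples on this page, a function that takes an expression and hands it to system.time. The name quick_time is hypothetical, and it is an assumption here that a system.time-style return value with an elapsed entry is all that is needed.

```r
# Hypothetical timing wrapper; skips the garbage collection that
# system.time() would otherwise trigger before timing.
quick_time <- function(expr) system.time(expr, gcFirst = FALSE)

# The expression is evaluated lazily inside system.time(), so it is timed
tm <- quick_time(sort(runif(1e4)))
tm[["elapsed"]]   # elapsed seconds of the timed expression
```

A wrapper like this would then be passed as timefun = quick_time. Such single-shot timings are noisy for small inputs, which is why the defaults use repeat.time.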

Author

Jens Oehlschlägel <Jens.Oehlschlaegel@truecluster.com>

Details

benchmark64 compares the following scenarios for the following use cases:

scenario name   explanation
32-bit          applying Base R function to 32-bit integer data
64-bit          applying bit64 function to 64-bit integer data (with no cache)
hashcache       ditto when cache contains hashmap, see hashcache
sortordercache  ditto when cache contains sorting and ordering, see sortordercache
ordercache      ditto when cache contains ordering only, see ordercache
allcache        ditto when cache contains sorting, ordering and hashing

use case name   explanation
cache           filling the cache according to scenario
match(s,b)      match small in big vector
s %in% b        small %in% big vector
match(b,s)      match big in small vector
b %in% s        big %in% small vector
match(b,b)      match big in (different) big vector
b %in% b        big %in% (different) big vector
duplicated(b)   duplicated of big vector
unique(b)       unique of big vector
table(b)        table of big vector
sort(b)         sorting of big vector
order(b)        ordering of big vector
rank(b)         ranking of big vector
quantile(b)     quantiles of big vector
summary(b)      summary of big vector
SESSION         exemplary session involving multiple calls (including cache filling costs)

Note that the timings for the cached variants do not contain the time costs of building the cache, except for the timing of the exemplary user session, where the cache costs are included in order to evaluate amortization.
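
The amortization question raised in the note above comes down to simple arithmetic: a cache pays off once the time saved per call has covered the one-off cost of building it. The helper breakeven_calls below is hypothetical, and the numbers passed to it are invented purely to show the calculation; they are not measurements.

```r
# Hypothetical helper: number of calls after which building a cache pays off.
#   cache_cost : one-off seconds spent filling the cache
#   uncached   : seconds per call without cache
#   cached     : seconds per call with cache
breakeven_calls <- function(cache_cost, uncached, cached) {
  saving <- uncached - cached
  if (saving <= 0) return(Inf)   # the cache never pays off
  ceiling(cache_cost / saving)
}

# Invented numbers, for illustration only
breakeven_calls(cache_cost = 2, uncached = 1, cached = 0.5)
```

With real measurements, the cache cost would come from the cache row and the per-call timings from the task rows of the matrix returned by benchmark64.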

See Also

integer64

Examples

message("this small example using system.time does not give serious timings\n
we do this only to run regression tests")
benchmark64(nsmall=2^7, nbig=2^13, timefun=function(expr)system.time(expr, gcFirst=FALSE))
optimizer64(nsmall=2^7, nbig=2^13, timefun=function(expr)system.time(expr, gcFirst=FALSE)
, plot=FALSE
)
if (FALSE) {
message("for real measurement of sufficiently large datasets run this on your machine")
benchmark64()
optimizer64()
}
message("let's look at the performance results on Core i7 Lenovo T410 with 8 GB RAM")
data(benchmark64.data)
print(benchmark64.data)

matplot(log2(benchmark64.data[-1,1]/benchmark64.data[-1,])
, pch=c("3", "6", "h", "s", "o", "a") 
, xlab="tasks [last=session]"
, ylab="log2(relative speed) [bigger is better]"
)
matplot(t(log2(benchmark64.data[-1,1]/benchmark64.data[-1,]))
, type="b", axes=FALSE 
, lwd=c(rep(1, 14), 3)
, xlab="context"
, ylab="log2(relative speed) [bigger is better]"
)
axis(1
, labels=c("32-bit", "64-bit", "hash", "sortorder", "order", "hash+sortorder")
, at=1:6
)
axis(2)
data(optimizer64.data)
print(optimizer64.data)
oldpar <- par(no.readonly = TRUE)
par(mfrow=c(2,1))
par(cex=0.7)
# one barplot per high-level function (rows) and sample size (columns)
for (i in seq_len(nrow(optimizer64.data))){
  for (j in 1:2){
    tim <- optimizer64.data[[i,j]]
    barplot(t(tim))
    if (rownames(optimizer64.data)[i]=="match")
      title(paste("match", colnames(optimizer64.data)[j], "in", colnames(optimizer64.data)[3-j]))
    else if (rownames(optimizer64.data)[i]=="%in%")
      title(paste(colnames(optimizer64.data)[j], "%in%", colnames(optimizer64.data)[3-j]))
    else
      title(paste(rownames(optimizer64.data)[i], colnames(optimizer64.data)[j]))
  }
}
# restore the graphics parameters saved in oldpar
par(oldpar)