sortnut: Searching and other uses of sorting for 64bit integers

Description

This is roughly an implementation of hash functionality, but based on sorting instead of on a hashmap. Because a sorted vector is more informative than a hashmap (it also reveals the relative order of values), sorting supports some additional operations, such as ranks and quantiles.

Usage

sortnut(sorted, ...)
# S3 method for integer64
sortnut(sorted, ...)
ordernut(table, order, ...)
# S3 method for integer64
ordernut(table, order, ...)
sortfin(sorted, x, ...)
# S3 method for integer64
sortfin(sorted, x, method = NULL, ...)
orderfin(table, order, x, ...)
# S3 method for integer64
orderfin(table, order, x, method = NULL, ...)
orderpos(table, order, x, ...)
# S3 method for integer64
orderpos(table, order, x, nomatch = NA, method = NULL, ...)
sortorderpos(sorted, order, x, ...)
# S3 method for integer64
sortorderpos(sorted, order, x, nomatch = NA, method = NULL, ...)
orderdup(table, order, ...)
# S3 method for integer64
orderdup(table, order, method = NULL, ...)
sortorderdup(sorted, order, ...)
# S3 method for integer64
sortorderdup(sorted, order, method = NULL, ...)
sortuni(sorted, nunique, ...)
# S3 method for integer64
sortuni(sorted, nunique, ...)
orderuni(table, order, nunique, ...)
# S3 method for integer64
orderuni(table, order, nunique, keep.order = FALSE, ...)
sortorderuni(table, sorted, order, nunique, ...)
# S3 method for integer64
sortorderuni(table, sorted, order, nunique, ...)
orderupo(table, order, nunique, ...)
# S3 method for integer64
orderupo(table, order, nunique, keep.order = FALSE, ...)
sortorderupo(sorted, order, nunique, keep.order = FALSE, ...)
# S3 method for integer64
sortorderupo(sorted, order, nunique, keep.order = FALSE, ...)
ordertie(table, order, nties, ...)
# S3 method for integer64
ordertie(table, order, nties, ...)
sortordertie(sorted, order, nties, ...)
# S3 method for integer64
sortordertie(sorted, order, nties, ...)
sorttab(sorted, nunique, ...)
# S3 method for integer64
sorttab(sorted, nunique, ...)
ordertab(table, order, nunique, ...)
# S3 method for integer64
ordertab(table, order, nunique, denormalize = FALSE, keep.order = FALSE, ...)
sortordertab(sorted, order, ...)
# S3 method for integer64
sortordertab(sorted, order, denormalize = FALSE, ...)
orderkey(table, order, na.skip.num = 0L, ...)
# S3 method for integer64
orderkey(table, order, na.skip.num = 0L, ...)
sortorderkey(sorted, order, na.skip.num = 0L, ...)
# S3 method for integer64
sortorderkey(sorted, order, na.skip.num = 0L, ...)
orderrnk(table, order, na.count, ...)
# S3 method for integer64
orderrnk(table, order, na.count, ...)
sortorderrnk(sorted, order, na.count, ...)
# S3 method for integer64
sortorderrnk(sorted, order, na.count, ...)
sortqtl(sorted, na.count, probs, ...)
# S3 method for integer64
sortqtl(sorted, na.count, probs, ...)
orderqtl(table, order, na.count, probs, ...)
# S3 method for integer64
orderqtl(table, order, na.count, probs, ...)

Value

See Details; the table there summarizes what each function returns.

Arguments

sorted: a sorted integer64 vector
...: further arguments, passed from generics, ignored in methods
table: the original data in their original order, from which the sorted vector is derived
order: an integer order vector that turns 'table' into 'sorted' (i.e. sorted equals table[order])
x: an integer64 vector
method: see Details
nomatch: the value to be returned if an element of 'x' is not found in 'table'
nunique: the number of unique elements, usually obtained from the cache or by calling sortnut or ordernut
keep.order: determines the order of the results and the speed: FALSE (the default) is faster and returns results in sorted order; TRUE returns them in the order of first appearance in the original data, which requires extra work
nties: the number of tied values, usually obtained from the cache or by calling sortnut or ordernut
denormalize: FALSE returns the counts of the unique values; TRUE returns each value together with its count
na.skip.num: either 0 or the number of NAs. With 0, NAs are coded as 1L; with the number of NAs, they are coded as NA
na.count: the number of NAs, required by these low-level algorithms
probs: a vector of probabilities in [0, 1] for which quantiles are sought
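The relationship between 'table', 'order' and 'sorted' can be illustrated with a short base R sketch. It uses ordinary numeric vectors instead of integer64 (an assumption made so it runs without bit64) and base R's order(). How bit64 itself produces the order vector, and where it places NAs, is not shown here.

```r
# Hypothetical example data (plain doubles standing in for integer64)
table <- c(30, 10, 20, 10, 30)

# 'order' is the permutation that turns 'table' into 'sorted';
# base R's order() is stable, so tied values keep their original relative order
order <- order(table)    # 2 4 3 1 5
sorted <- table[order]   # 10 10 20 30 30
```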

Details

sortfun	orderfun	sortorderfun	see also	description
`sortnut`	`ordernut`			return the number of tied and of unique values
`sortfin`	`orderfin`		`%in%.integer64`	return logical whether `x` is in `table`
	`orderpos`	`sortorderpos`	`match()`	return positions of `x` in `table`
	`orderdup`	`sortorderdup`	`duplicated()`	return logical whether values are duplicated
`sortuni`	`orderuni`	`sortorderuni`	`unique()`	return unique values (i.e. the dimension table)
	`orderupo`	`sortorderupo`	`unique()`	return positions of unique values
	`ordertie`	`sortordertie`		return positions of tied values
	`orderkey`	`sortorderkey`		return positions of values in the vector of unique values (i.e. match in the dimension table)
`sorttab`	`ordertab`	`sortordertab`	`table()`	tabulate frequency of values
	`orderrnk`	`sortorderrnk`		return ranks, averaging ties
`sortqtl`	`orderqtl`			return quantiles given probabilities
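Several entries in this table rest on one observation: in a sorted vector, equal values form contiguous runs. The base R sketch below (plain doubles instead of integer64, not the bit64 C code) derives the number of unique values, the unique values, their frequencies and tie-averaged ranks from those runs. It then scatters the ranks back to the original positions through the order vector. The "cf." comments mark the functions each step loosely resembles.

```r
# Hypothetical example data (plain doubles standing in for integer64)
table <- c(30, 20, 10, 30, 10, 30)
order <- order(table)            # stable permutation: sorted == table[order]
sorted <- table[order]           # 10 10 20 30 30 30

# Equal values are contiguous in 'sorted', so one run-length pass suffices
runs <- rle(sorted)
nunique <- length(runs$values)   # number of distinct values
uni <- runs$values               # unique values, in sorted order (cf. sortuni)
counts <- runs$lengths           # frequency of each unique value (cf. sorttab)

# A run covering sorted positions a..b shares the average rank (a + b) / 2
ends <- cumsum(counts)
starts <- ends - counts + 1
rank_sorted <- rep((starts + ends) / 2, counts)

# Scatter the ranks back to the original positions via 'order' (cf. orderrnk)
rnk <- numeric(length(table))
rnk[order] <- rank_sorted
```

The same pass also gives the ties: any run with a length greater than 1 is a group of tied values.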

The functions sortfin, orderfin, orderpos and sortorderpos each offer three algorithms for finding x in table.

With method=1L each value of x is searched independently using binary search; this is the fastest method for small tables.
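The following base R code is a sketch of the idea behind method=1L, not bit64's C implementation. It runs an independent lower-bound binary search for each element of x. The search returns the first matching position in 'sorted', and base R's order() is stable. Mapping that position through 'order' therefore gives the first match in 'table', the same answer as match(). The helper name bsearch_pos is hypothetical, and plain doubles stand in for integer64.

```r
# Hypothetical helper: lower-bound binary search in an ascending vector.
# Returns the first index i with sorted[i] == v, or NA if v is absent.
bsearch_pos <- function(sorted, v) {
  lo <- 1L
  hi <- length(sorted)
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2L
    if (sorted[mid] < v) lo <- mid + 1L else hi <- mid - 1L
  }
  # lo is now the first index with sorted[lo] >= v (or length + 1)
  if (lo <= length(sorted) && sorted[lo] == v) lo else NA_integer_
}

table <- c(30, 10, 20, 10, 30)
order <- order(table)
sorted <- table[order]
x <- c(30, 15, 10, 40)

# Each element of x is searched on its own (cf. sortfin / sortorderpos)
pos_sorted <- vapply(x, function(v) bsearch_pos(sorted, v), integer(1))
found <- !is.na(pos_sorted)
# Map sorted positions back to positions in 'table' (cf. orderpos)
pos_table <- order[pos_sorted]
```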

With method=2L the values of x are first sorted and then searched using doubly exponential search; this is the best all-round method.
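The sketch below illustrates the idea behind method=2L. It sorts x once, then searches each value starting from where the previous one was found. It uses exponential (galloping) search: the step doubles until the target is bracketed, and a binary search finishes the job. This related technique was chosen for clarity and is not claimed to be the exact doubly exponential scheme that bit64 uses. The function names are hypothetical, and plain doubles stand in for integer64.

```r
# Hypothetical helper: first index in a..b with sorted[idx] >= v (b + 1 if none)
lower_bound <- function(sorted, v, a, b) {
  while (a <= b) {
    mid <- (a + b) %/% 2L
    if (sorted[mid] < v) a <- mid + 1L else b <- mid - 1L
  }
  a
}

# Hypothetical helper: positions of ascending x_sorted in 'sorted' (NA if absent)
gallop_match_sorted <- function(sorted, x_sorted) {
  n <- length(sorted)
  res <- rep(NA_integer_, length(x_sorted))
  lo <- 1L  # invariant: every sorted[1:(lo - 1)] is smaller than the current value
  for (i in seq_along(x_sorted)) {
    v <- x_sorted[i]
    step <- 1L
    hi <- lo
    # Gallop forward, doubling the step, until sorted[hi] >= v or hi > n
    while (hi <= n && sorted[hi] < v) {
      lo <- hi + 1L
      step <- step * 2L
      hi <- lo + step - 1L
    }
    # Finish with a binary search inside the bracketed window
    pos <- lower_bound(sorted, v, lo, min(hi, n))
    if (pos <= n && sorted[pos] == v) res[i] <- pos
    lo <- pos  # the next (not smaller) value cannot lie before pos
  }
  res
}

sorted <- c(10, 10, 20, 30, 30, 50, 60, 70)
x <- c(70, 15, 10, 30, 99, 10)

# Sort x once, search in ascending order, then restore the original order of x
ox <- order(x)
pos <- integer(length(x))
pos[ox] <- gallop_match_sorted(sorted, x[ox])
```

Because the search window never moves backwards, each lookup costs roughly the logarithm of the distance from the previous hit, rather than the logarithm of the whole table.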

With method=3L the values of x are first sorted and then searched using simple merging; this is the fastest method if table is huge and x has a similar size and distribution of values.
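Here is a sketch of simple merging as in method=3L. After x is sorted, a single pointer walks forward through 'sorted' and never moves back, so the search itself takes time proportional to length(sorted) + length(x). This is base R with a hypothetical helper name and plain doubles instead of integer64, not bit64's code.

```r
# Hypothetical helper: positions of ascending x_sorted in 'sorted' by merging
merge_match_sorted <- function(sorted, x_sorted) {
  n <- length(sorted)
  res <- rep(NA_integer_, length(x_sorted))
  j <- 1L
  for (i in seq_along(x_sorted)) {
    v <- x_sorted[i]
    # Advance the pointer past all elements of 'sorted' smaller than v
    while (j <= n && sorted[j] < v) j <- j + 1L
    if (j <= n && sorted[j] == v) res[i] <- j
  }
  res
}

sorted <- c(10, 10, 20, 30, 30, 50, 60, 70)
x <- c(70, 15, 10, 30, 99, 10)

# Sort x once, merge, then restore the original order of x
ox <- order(x)
pos <- integer(length(x))
pos[ox] <- merge_match_sorted(sorted, x[ox])
```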

With method=NULL the functions use a heuristic to determine the fastest algorithm.

The functions orderdup and sortorderdup each offer two algorithms for setting the truth values in the return vector.

With method=1L the return values are set directly which causes random write access on a possibly large return vector.

With method=2L the return values are first set in a smaller bit-vector, so random access is limited to a smaller memory region, and are finally written sequentially to the logical output vector.

With method=NULL the functions use a heuristic to determine the fastest algorithm.
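The duplicate logic behind orderdup can be sketched in base R (not bit64's code) with an order vector. In sorted order, every element equal to its predecessor is a duplicate. Because base R's order() is stable, the first element of each run is the earliest occurrence in 'table'. Writing TRUE through 'order' scatters writes across the output, which corresponds to the random write access of method=1L. Base R has no compact bit-vector type, so the buffered method=2L is not reproduced. The last lines also show why keep.order=TRUE costs extra work: positions of unique values come out in sorted-value order and must be sorted again to reach the order of first appearance.

```r
# Hypothetical example data (plain doubles standing in for integer64)
table <- c(30, 10, 20, 10, 30, 30)
order <- order(table)     # stable: 2 4 3 1 5 6
sorted <- table[order]    # 10 10 20 30 30 30

# TRUE where a sorted element equals its predecessor
is_repeat <- c(FALSE, sorted[-1L] == sorted[-length(sorted)])

# Write through 'order' into the original positions (random write access)
dup <- logical(length(table))
dup[order[is_repeat]] <- TRUE

# Positions of unique values: sorted-value order first (cf. keep.order = FALSE),
# then re-sorted into order of first appearance (cf. keep.order = TRUE)
upo_sorted <- order[!is_repeat]   # 2 3 1
upo_first <- sort(upo_sorted)     # 1 2 3
```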

Examples


 library(bit64)
 message("check the code of 'optimizer64' for examples:")
 print(optimizer64)