
ff (version 4.5.0)

ramorder.default: Sorting: order R vector in-RAM and in-place

Description

Function ramorder orders its input in-place (without making a copy): the positions in i are permuted so that x[i] is sorted. It returns the number of NAs found.

Usage

# S3 method for default
ramorder(x, i, has.na = TRUE, na.last = TRUE, decreasing = FALSE
, stable = TRUE, optimize = c("time", "memory"), VERBOSE = FALSE, ...)
# S3 method for default
mergeorder(x, i, has.na = TRUE, na.last = TRUE, decreasing = FALSE, ...)
# S3 method for default
radixorder(x, i, has.na = TRUE, na.last = TRUE, decreasing = FALSE, ...)
# S3 method for default
keyorder(x, i, keyrange=range(x, na.rm=has.na), has.na = TRUE, na.last = TRUE
, decreasing = FALSE, ...)
# S3 method for default
shellorder(x, i, has.na = TRUE, na.last = TRUE, decreasing = FALSE, stabilize=FALSE, ...)

Value

integer scalar with the number of NAs. This is always 0 with has.na=FALSE
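
As a small illustration of the return value, the sketch below compares the count returned by ramorder with a direct NA count. It assumes the ff package is installed; the variable names are made up for the example.

```r
library(ff)  # assumption: the ff package is installed

set.seed(1)
x <- sample(c(NA, 1:10), 100, TRUE)
i <- seq_along(x)          # must be a permutation of the positions in x
n_na <- ramorder(x, i)     # orders i in-place and returns the NA count
n_na == sum(is.na(x))      # the returned count should match a direct count
```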

Arguments

x

an atomic R vector

i

an integer vector with a permutation of positions in x (you risk a crash if you violate this)

keyrange

an integer vector with two values giving the smallest and largest possible value in x. You should give this explicitly for best performance; relying on the default needs one pass over the data to determine the range

has.na

boolean scalar telling ramorder whether the vector might contain NAs. Note that you risk a crash if there are unexpected NAs with has.na=FALSE

na.last

boolean scalar telling ramorder whether to order NAs last or first. Note that 'boolean' means that there is no third option NA as in order

decreasing

boolean scalar telling ramorder whether to order increasing or decreasing

stable

set to FALSE if stable ordering is not needed (may enlarge the set of ordering methods considered)

optimize

by default ramorder optimizes for 'time' which requires more RAM, set to 'memory' to minimize RAM requirements and sacrifice speed

VERBOSE

set to TRUE to cat some info about the chosen method

stabilize

set to TRUE to stabilize the result of shellorder: for equal keys the order values will be sorted, which only works if i=1:n. This keeps RAM requirements minimal but sacrifices speed

...

ignored
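
The advice on keyrange can be sketched as follows. This is a hedged example that assumes ff is installed and that the keys are known in advance to lie in 1..26; has.na = FALSE is only safe here because the sample contains no NAs.

```r
library(ff)  # assumption: the ff package is installed

set.seed(1)
x <- sample(1:26, 1000, TRUE)   # integer keys known to lie in 1..26
i <- seq_along(x)               # identity permutation of positions in x

# Passing the known range avoids the extra pass over x that the
# default keyrange = range(x, na.rm = has.na) would need.
keyorder(x, i, keyrange = c(1L, 26L), has.na = FALSE)
identical(x[i], sort(x))        # x[i] should now be in increasing order
```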

Author

Jens Oehlschlägel

Details

Function ramorder is a front-end to a couple of single-threaded ordering algorithms that have been carefully implemented to be fast with and without NAs.
The default is a mergeorder algorithm without copying (Sedgewick 8.4) for integer and double data, which requires 2x the RAM of its input vector (character or complex data are not supported). Mergeorder is fast and stable, with a reliable runtime.
For integer data longer than a certain length we improve on mergeorder by using a faster LSD radixorder algorithm (Sedgewick 10.5) that uses 2x the RAM of its input vector plus 65536+1 integers.
For booleans, logicals, integers at or below the resolution of smallint, and for factors below a certain number of levels, we use a key-index order instead of mergeorder or radixorder. Note that R has a (slower) key-index order in sort.list, available with the confusingly named method='radix', but the standard order does not leverage it for factors (2-11.1). If you call keyorder directly, you should provide a known 'keyrange' to obtain the full speed.
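
To make the idea of a key-index order concrete, here is a minimal sketch in plain R. It is an illustration of the general counting technique only, not ff's C implementation; the function name key_index_order is hypothetical, and it assumes integer keys without NAs.

```r
# Illustrative key-index (counting) order; hypothetical helper, not part of ff.
key_index_order <- function(x, keyrange = range(x)) {
  stopifnot(is.integer(x), !anyNA(x))
  lo <- keyrange[1]
  nkeys <- keyrange[2] - lo + 1L
  counts <- tabulate(x - lo + 1L, nbins = nkeys)  # occurrences of each key
  start <- cumsum(c(1L, counts[-nkeys]))          # first output slot per key
  ord <- integer(length(x))
  for (pos in seq_along(x)) {
    k <- x[pos] - lo + 1L
    ord[start[k]] <- pos          # place position in the next slot of its key
    start[k] <- start[k] + 1L
  }
  ord
}

x <- c(3L, 1L, 2L, 3L, 1L)
key_index_order(x, keyrange = c(1L, 3L))  # 2 5 3 1 4, same as order(x)
```

Because positions are placed in a single left-to-right pass, ties keep their original order, and the work is proportional to the length of x plus the width of the key range, which is why a known, narrow keyrange matters.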
Finally, the user can request an order method that minimizes memory use at the price of longer computation time with optimize='memory'; currently this is a shellorder.
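
A hedged usage sketch of the memory-oriented path, assuming ff is installed: VERBOSE = TRUE is used so that ramorder reports which method it actually chose, since that choice may depend on the data type and length. The second call invokes shellorder directly with stabilize = TRUE, which per the argument description requires i = 1:n.

```r
library(ff)  # assumption: the ff package is installed

set.seed(1)
x <- sample(1:5, 20, TRUE)

i <- seq_along(x)
ramorder(x, i, optimize = "memory", VERBOSE = TRUE)  # prints the chosen method
x[i]                                                 # x in increasing order

j <- seq_along(x)                   # stabilize = TRUE needs the identity permutation
shellorder(x, j, stabilize = TRUE)
x[j]                                # also increasing; ties should keep input order
```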

References

Robert Sedgewick (1997). Algorithms in C, Third edition. Addison-Wesley.

See Also

order, fforder, dforder, ramsort

Examples

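   n <- 50
   x <- sample(c(NA, NA, 1:26), n, TRUE)
   order(x)          # base R result for comparison
   i <- 1:n          # identity permutation of positions in x
   ramorder(x, i)    # reorders i in-place, returns the number of NAs
   i
   x[i]              # x in sorted order, NAs last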

   if (FALSE) {
      message("Note how the datatype influences sorting speed")
      n <- 1e7
      x <- sample(1:26, n, TRUE)

      y <- as.double(x)
      i <- 1:n
      system.time(ramorder(y, i))

      y <- as.integer(x)
      i <- 1:n
      system.time(ramorder(y, i))

      y <- as.short(x)
      i <- 1:n
      system.time(ramorder(y, i))

      y <- factor(letters)[x]
      i <- 1:n
      system.time(ramorder(y, i))
   }
