ff (version 4.5.2)

ffdfindexget: Reading and writing ffdf data.frame using ff subscripts

Description

Function ffdfindexget extracts rows from an ffdf data.frame according to positive integer subscripts stored in an ff vector.
Function ffdfindexset performs the inverse operation: it assigns to rows of an ffdf data.frame according to positive integer subscripts stored in an ff vector. These functions give more control than the method dispatch of [ and [<- when an ff integer subscript is used.

Usage

ffdfindexget(x, index, indexorder = NULL, autoindexorder = 3, FF_RETURN = NULL
  , BATCHSIZE = NULL, BATCHBYTES = getOption("ffmaxbytes"), VERBOSE = FALSE)
ffdfindexset(x, index, value, indexorder = NULL, autoindexorder = 3
  , BATCHSIZE = NULL, BATCHBYTES = getOption("ffmaxbytes"), VERBOSE = FALSE)

Value

Function ffdfindexget returns an ffdf data.frame with those rows selected by the ff index vector.

Function ffdfindexset returns x with the rows selected by index replaced by the corresponding rows of value.

Arguments

x

An ffdf data.frame containing the elements

index

An ff integer vector with integer subscripts in the range from 1 to nrow(x).

value

An ffdf data.frame like x with the rows to be assigned

indexorder

Optionally the return value of ffindexorder, see details

autoindexorder

The minimum number of columns (which need chunked indexordering) for which we switch from on-the-fly ordering to stored ffindexorder

FF_RETURN

Optionally an ffdf data.frame of the same type as x in which the returned values shall be stored, see details.

BATCHSIZE

Optional limit for the batch size (see details)

BATCHBYTES

Limit for the number of bytes per batch

VERBOSE

Logical scalar: whether to report progress verbosely

Author

Jens Oehlschlägel

Details

Accessing rows of an ffdf data.frame identified by integer positions in an ff vector is a non-trivial task, because it could easily lead to random access to disk files. We avoid random access by loading batches of the subscript values into RAM, ordering them ascending, and only then accessing the ff values on disk. Such ordering is done on-the-fly for up to autoindexorder-1 columns that need ordering. For autoindexorder or more columns we do the batched ordering upfront with ffindexorder and then re-use it in each call to ffindexget or ffindexset.
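
The batching idea described above can be illustrated with a minimal base-R sketch. It is not the code used inside ff: ordered_batch_get is a hypothetical helper, a plain in-memory vector stands in for an on-disk ff column, and the batch size is passed directly instead of being derived from BATCHBYTES. It only shows the principle of sorting each batch of subscripts, reading in ascending position order, and then restoring the requested order.

```r
# Illustrative sketch only; ordered_batch_get is a hypothetical helper,
# not a function of the ff package. 'values' stands in for an ff column.
ordered_batch_get <- function(values, index, batchsize) {
  n <- length(index)
  out <- vector(typeof(values), n)
  if (n == 0L) return(out)
  for (from in seq(1L, n, by = batchsize)) {
    to <- min(from + batchsize - 1L, n)
    batch <- index[from:to]           # one batch of subscripts held in RAM
    o <- order(batch)                 # permutation that sorts the batch ascending
    sorted_vals <- values[batch[o]]   # read positions in ascending order
    res <- vector(typeof(values), length(batch))
    res[o] <- sorted_vals             # undo the sort: back to requested order
    out[from:to] <- res
  }
  out
}

values <- (1:26) * 10L
index  <- c(9L, 2L, 26L, 5L, 2L, 17L)
res <- ordered_batch_get(values, index, batchsize = 4L)
stopifnot(identical(res, values[index]))
```

Within each batch the positions are visited in ascending order, so a disk-backed store would be read mostly sequentially; the cost is one sort per batch plus the permutation needed to put the results back in the order requested by index.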

See Also

Extract.ff, ffindexget, ffindexorder

Examples

message("ff integer subscripts with ffdf return/assign values")
x <- ff(factor(letters))
y <- ff(1:26)
d <- ffdf(x,y)
i <- ff(2:9)
di <- d[i,]
di
d[i,] <- di
message("ff integer subscripts: more control with ffdfindexget/ffdfindexset")
di <- ffdfindexget(d, i, FF_RETURN=di)
d <- ffdfindexset(d, i, di)
rm(x, y, d, i, di)
gc()
