Function 'ffdf' creates ff data.frames stored on disk, very similar to 'data.frame'.
ffdf(...
, row.names = NULL
, ff_split = NULL
, ff_join = NULL
, ff_args = NULL
, update = TRUE
, BATCHSIZE = .Machine$integer.max
, BATCHBYTES = getOption("ffbatchbytes")
, VERBOSE = FALSE)
...: ff vectors or matrices (optionally wrapped in I()) that shall be bound together into an ffdf object
row.names: A character vector. Not recommended for large objects with many rows.
ff_split: A vector of character names or integer positions identifying input components to physically split into single ff vectors. If vector elements have names, these are used as root names for the new ff files.
ff_join: A list of vectors with character names or integer positions identifying input components to physically join in the same ff matrix. If list elements have names, these are used to name the new ff files.
update: By default (TRUE), new ff files are updated with the content of the input ff objects. Setting this to FALSE prevents the update.
ff_args: A list with further arguments passed to ff in case new ff objects are created via 'ff_split' or 'ff_join'.
BATCHSIZE: passed to update.ff
BATCHBYTES: passed to update.ff
VERBOSE: passed to update.ff
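A minimal sketch of the splitting arguments in use, assuming the ff package is installed. Per the argument list above, ff_split = 1 stores input component 1 (the matrix) as single new ff vectors, and BATCHBYTES is forwarded to update.ff while those new files are filled. The value 1024 is an arbitrary choice for illustration, not a recommended setting.

```r
library(ff)

# Small example inputs converted to on-disk ff objects
m <- matrix(1:12, 3, 4,
            dimnames = list(c("r1", "r2", "r3"), c("m1", "m2", "m3", "m4")))
ffm <- as.ff(m)
ffv <- as.ff(1:3)

# ff_split = 1: physically split input component 1 (the matrix) into single
# ff vectors. BATCHBYTES (an arbitrary 1024 here) is passed on to update.ff,
# which copies the content into the new files because update = TRUE.
ffd <- ffdf(ffm, v = ffv, row.names = row.names(ffm),
            ff_split = 1, BATCHBYTES = 1024)

df <- ffd[, ]   # read all rows and columns into an in-RAM data.frame
print(df)
```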
A list with components
physical: the underlying ff vectors and matrices, to be accessed via physical
virtual: the virtual features of the ffdf, including the virtual-to-physical mapping, to be accessed via virtual
row.names: the optional row.names, see argument row.names
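The three components can be inspected with the accessor functions named above. A short sketch, assuming ff is installed; the exact printed layout of virtual() is not reproduced here.

```r
library(ff)

m <- matrix(1:12, 3, 4,
            dimnames = list(c("r1", "r2", "r3"), c("m1", "m2", "m3", "m4")))
ffd <- ffdf(as.ff(m), v = as.ff(1:3), row.names = c("r1", "r2", "r3"))

phys <- physical(ffd)   # underlying ff objects: here one matrix and one vector
virt <- virtual(ffd)    # virtual column features and virtual-to-physical mapping
rn   <- row.names(ffd)  # the optional row.names passed to ffdf()

print(phys)
print(virt)
print(rn)
```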
The following methods and functions are available for ffdf objects:
Type | Name | Assign | Comment
--- | --- | --- | ---
Basic functions | | |
function | ffdf | | constructor for ffdf objects
generic | update | | updates one ffdf object with the content of another
generic | clone | | clones an ffdf object
method | print | | print ffdf
method | str | | ffdf object structure
Class test and coercion | | |
function | is.ffdf | | check if inherits from ffdf
generic | as.ffdf | | coerce to ffdf, if not yet
generic | as.data.frame | | coerce to RAM data.frame
Virtual storage mode | | |
generic | vmode | | get virtual modes for all (virtual) columns
Physical attributes | | |
function | physical | | get physical attributes
Virtual attributes | | |
function | virtual | | get virtual attributes
method | length | | get length
method | dim | <- | get dim and set nrow
generic | dimorder | | get the dimorder (non-standard if any component is non-standard)
method | names | <- | set and get names
method | row.names | <- | set and get row.names
method | dimnames | <- | set and get dimnames
method | pattern | <- | set pattern (rename/move files)
Access functions | | |
method | [ | <- | set and get data.frame content ([,]) or get an ffdf with fewer columns ([])
method | [[ | <- | set and get single column ff object
method | $ | <- | set and get single column ff object
Opening/Closing/Deleting | | |
generic | is.open | | tri-bool is.open status of the physical ff components
method | open | | open all physical ff objects (is done automatically on access)
method | close | | close all physical ff objects
method | delete | | deletes all physical ff files
method | finalize | | call finalizer
Processing | | |
method | chunk | | create chunked index
method | sortLevels | | sort and recode levels
Other | | |
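A sketch exercising several entries of the table (dim, names, vmode, $, [, chunk, close, is.open, delete), assuming ff is installed. The chunked loop is one common way to process rows batch by batch; the column sum it computes is only an illustration.

```r
library(ff)

m <- matrix(1:12, 3, 4,
            dimnames = list(c("r1", "r2", "r3"), c("m1", "m2", "m3", "m4")))
ffd <- ffdf(as.ff(m), v = as.ff(1:3))

dims  <- dim(ffd)      # virtual dimensions: rows x columns
cols  <- names(ffd)    # virtual column names
modes <- vmode(ffd)    # virtual storage mode of every column
vals  <- ffd$v[]       # `$` returns the ff vector, `[]` reads it into RAM
sub   <- ffd[2:3, c("m1", "v")]  # `[,]` returns an in-RAM data.frame

# chunk() yields row index chunks; each chunk is read separately
total <- 0
for (i in chunk(ffd)) total <- total + sum(ffd[i, "v"])

close(ffd)              # close all physical ff files
status <- is.open(ffd)  # FALSE once every component is closed
delete(ffd)             # remove the physical files from disk
```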
By default, creating an 'ffdf' object will NOT create new ff files; instead, existing files are referenced. This differs from data.frame, which always creates copies of the input objects, most notably in data.frame(matrix()), where an input matrix is converted to single columns. ffdf, by contrast, will store an input matrix physically as the same matrix and virtually map it to columns. Physically copying a large ff matrix to single ff vectors can be expensive.
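The difference can be checked directly. A minimal sketch, assuming ff is installed: data.frame() turns the matrix into separate columns, while ffdf() keeps the ff matrix as one physical component backed by the very same file (filename() is ff's accessor for the backing file).

```r
library(ff)

m <- matrix(1:12, 3, 4,
            dimnames = list(c("r1", "r2", "r3"), c("m1", "m2", "m3", "m4")))
ffm <- as.ff(m)

# data.frame(): the matrix becomes 4 separate in-RAM columns
n_df_cols <- ncol(data.frame(m))

# ffdf(): by default, the ff matrix is referenced, not copied
ffd <- ffdf(ffm)
n_phys    <- length(physical(ffd))   # a single physical component
same_file <- identical(filename(physical(ffd)[[1]]), filename(ffm))

cat(n_df_cols, n_phys, same_file, "\n")
```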
More generally, ffdf objects have a physical and a virtual component, which allows very flexible data frame designs: a physically stored matrix can be virtually mapped to single columns, and a couple of physically stored vectors can be virtually mapped to a single matrix.
The means to configure these are I() for the virtual representation and the 'ff_split' and 'ff_join' arguments for the physical representation. An ff matrix wrapped in 'I()' will return the input matrix as a single object, whereas using 'ff_split' will store this matrix as single vectors, and thus create new ff files.
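A sketch contrasting the two options on the same inputs, assuming ff is installed. Counting the physical components shows the effect: with I() nothing is split, whereas ff_split = 1 turns the 4-column matrix into 4 new ff vectors.

```r
library(ff)

m <- matrix(1:12, 3, 4,
            dimnames = list(c("r1", "r2", "r3"), c("m1", "m2", "m3", "m4")))
ffm <- as.ff(m)
ffv <- as.ff(1:3)

# ff_split = 1: the matrix is stored as single vectors in new ff files
ffd_split <- ffdf(ffm, v = ffv, ff_split = 1)

# I(): the matrix and the vector each stay a single (AsIs) object
ffd_asis <- ffdf(m = I(ffm), v = I(ffv))

n_split <- length(physical(ffd_split))  # 4 split vectors + vector
n_asis  <- length(physical(ffd_asis))   # matrix + vector
cat(n_split, n_asis, "\n")
```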
'ff_join' will copy a couple of input vectors into a unified new ff matrix with dimorder=c(2,1), but virtually they will remain single columns. The returned ffdf object also has a dimorder attribute, which indicates whether the ffdf object contains a matrix with non-standard dimorder c(2,1); see dimorderStandard.
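A minimal sketch of 'ff_join' on two input vectors, assuming ff is installed. The names a, b and ab are arbitrary. Following the text above, the joined matrix is expected to carry the non-standard dimorder c(2,1), which the ffdf's own dimorder reports and which dimorderStandard() flags as non-standard.

```r
library(ff)

a <- as.ff(1:3)
b <- as.ff(4:6)

# Physically join components 1 and 2 into one new ff matrix; the list
# element name "ab" is used to name the new ff file
ffd <- ffdf(a = a, b = b, ff_join = list(ab = c(1, 2)))

n_phys <- length(physical(ffd))   # one joined matrix
ord    <- dimorder(ffd)           # non-standard if any component is
std    <- dimorderStandard(ord)   # FALSE for c(2,1)
df     <- ffd[, ]                 # virtually still two single columns
print(df)
```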
Currently, virtual windows are not supported for ffdf.
See also: data.frame, ff; for more examples, see physical
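library(ff)
# Example data: a 3x4 integer matrix with dimnames and a vector of length 3
m <- matrix(1:12, 3, 4, dimnames=list(c("r1","r2","r3"), c("m1","m2","m3","m4")))
v <- 1:3
ffm <- as.ff(m)
ffv <- as.ff(v)

# Default: reference the existing ff matrix and vector (no new ff files)
d <- data.frame(m, v)
ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm))
all.equal(d, ffd[,])
ffd
physical(ffd)

# ff_split: store input component 1 (the matrix) as single ff vectors (new files)
d <- data.frame(m, v)
ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm), ff_split=1)
all.equal(d, ffd[,])
ffd
physical(ffd)

# ff_join: copy input components 1 and 2 into one new ff matrix named 'newff'
d <- data.frame(m, v)
ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm), ff_join=list(newff=c(1,2)))
all.equal(d, ffd[,])
ffd
physical(ffd)

# I(): keep the matrix and the vector as single (AsIs) columns
d <- data.frame(I(m), I(v))
ffd <- ffdf(m=I(ffm), v=I(ffv), row.names=row.names(ffm))
all.equal(d, ffd[,])
ffd
physical(ffd)

# Clean up: remove the objects and run garbage collection
rm(ffm,ffv,ffd); gc()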