Create, store, access, and manipulate massive matrices. Matrices are, by
default, allocated to shared memory and may use memory-mapped files.
Packages biganalytics, synchronicity, bigalgebra, and
bigtabulate provide advanced functionality. Access to and
manipulation of a big.matrix object is exposed in an S4 class whose
interface is similar to that of a matrix. Use of
these packages in parallel environments can provide substantial speed and
memory efficiencies. bigmemory also provides a C++
framework for the development of new tools that can work both with
big.matrix and native matrix objects.
Because a big.matrix is allocated in shared memory or in a memory-mapped
file, the memory it uses is managed outside the R memory pool available to
the garbage collector, and the memory it occupies is not visible to R.
This has subtle implications:
- Memory usage is not visible via general R functions (e.g. the gc() function).
- The garbage collector is misled by the very small memory footprint of the big.matrix object (which acts merely as a pointer to the external memory structure), and so can be much less eager to collect unused big.matrix objects. After removing the last reference to a large big.matrix, the user should run gc() manually to reclaim the memory.
- Attaching the description of an already finalized big.matrix and accessing that object results in undefined behavior; in practice it will crash the current R session, with no chance to save the data in it. To prevent R from de-allocating (finalizing) the matrices, the user should keep at least one big.matrix object that refers to them in R memory, in at least one R session on the current machine.
- An R session that is closed abruptly (e.g. killed from the task manager) has no chance to finalize its big.matrix objects. This results in a memory leak: the matrices remain in memory (perhaps under obfuscated names) with no easy way to reconnect R to them.
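The implications above can be illustrated with a minimal sketch. It assumes a default (shared-memory) big.matrix and uses only object.size(), describe(), attach.big.matrix(), rm(), and gc(); the variable names are illustrative, and the exact handle size reported will vary by platform.

```r
library(bigmemory)

# Allocate 1000 x 100 doubles: roughly 800 KB of data held outside R's heap.
x <- big.matrix(1000, 100, type = "double", init = 0)

# R sees only the small handle object, not the data it points to.
handle_bytes <- as.numeric(object.size(x))
data_bytes <- 1000 * 100 * 8
handle_bytes < data_bytes

# Attaching a second handle is safe here, because `x` still references
# the matrix and therefore keeps it from being finalized.
desc <- describe(x)
y <- attach.big.matrix(desc)
y[1, 1] <- 3.14
shared_value <- x[1, 1]   # both handles address the same memory

# Drop every reference, then ask R to collect so the finalizer can run.
rm(x, y)
invisible(gc())

# Do not attach `desc` after this point: the matrix may already be finalized.
```

The explicit gc() call matters because R has no way of knowing that the tiny handle stood for a much larger external allocation.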
Michael J. Kane, John W. Emerson, Peter Haverty, and Charles Determan Jr.
Maintainers: Michael J. Kane <bigmemoryauthors@gmail.com>
Index of functions/methods (grouped in a friendly way):
big.matrix, filebacked.big.matrix, as.big.matrix
is.big.matrix, is.separated, is.filebacked
describe, attach.big.matrix, attach.resource
sub.big.matrix, is.sub.big.matrix
dim, dimnames, nrow, ncol, print, head, tail, typeof, length
read.big.matrix, write.big.matrix
mwhich
morder, mpermute
deepcopy
flush
Multi-gigabyte data sets challenge and frustrate users, even on well-equipped hardware. Use of C/C++ can provide efficiencies, but is cumbersome for interactive data analysis and lacks the flexibility and power of R's rich statistical programming environment. The package bigmemory and associated packages biganalytics, synchronicity, bigtabulate, and bigalgebra bridge this gap, implementing massive matrices and supporting their manipulation and exploration. The data structures may be allocated to shared memory, allowing separate processes on the same computer to share access to a single copy of the data set. The data structures may also be file-backed, allowing users to easily manage and analyze data sets larger than available RAM and share them across nodes of a cluster. These features of the Bigmemory Project open the door for powerful and memory-efficient parallel analyses and data mining of massive data sets.
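As a minimal sketch of the file-backed case described above, the following creates a small file-backed big.matrix in a temporary directory and re-attaches it from its descriptor file. The directory and the file names example.bin and example.desc are illustrative assumptions; a second R process with access to the same files could use the same attach.big.matrix() call.

```r
library(bigmemory)

# A fresh scratch directory, so no earlier backing file can be in the way.
backing_dir <- tempfile("bigmemory-demo-")
dir.create(backing_dir)

# Create a 4 x 2 matrix whose data lives in a memory-mapped file on disk.
fb <- filebacked.big.matrix(4, 2, type = "double", init = 0,
                            backingfile = "example.bin",
                            backingpath = backing_dir,
                            descriptorfile = "example.desc")
fb[, 1] <- c(1.5, 2.5, 3.5, 4.5)
flush(fb)   # request that pending changes be written to the backing file

# Re-attach using only the descriptor file, as another process might.
fb2 <- attach.big.matrix(file.path(backing_dir, "example.desc"))
is.filebacked(fb2)
readback <- fb2[, 1]
readback
```

Because the data resides in the backing file rather than in R's heap, the same pattern applies to matrices larger than available RAM.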
This project (bigmemory and its sister packages) is still actively developed, although the design and current features can be viewed as "stable." Please feel free to email us with any questions: bigmemoryauthors@gmail.com.
For example, big.matrix, mwhich, read.big.matrix
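# Our examples are all trivial in size, rather than burning huge amounts
# of memory.
library(bigmemory)

x <- big.matrix(5, 2, type="integer", init=0,
                dimnames=list(NULL, c("alpha", "beta")))
x                  # prints the big.matrix object itself, not its contents
x[1:2,]            # extract the first two rows as an ordinary R matrix
x[,1] <- 1:5       # assign to the first column
x[,"alpha"]        # columns can be indexed by name
colnames(x)
options(bigmemory.allow.dimnames=TRUE)  # needed before dimnames can be changed
colnames(x) <- NULL
x[,]               # extract the whole matrix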
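Of the functions referenced above, mwhich() may be the least familiar. As a hedged sketch (the matrix m and its values are illustrative), the following selects the rows that satisfy conditions on two columns at once, without first extracting those columns into R vectors:

```r
library(bigmemory)

m <- big.matrix(5, 2, type = "integer", init = 0,
                dimnames = list(NULL, c("alpha", "beta")))
m[, "alpha"] <- 1:5
m[, "beta"]  <- c(10L, 20L, 30L, 40L, 50L)

# Rows where alpha >= 2 AND beta <= 40.
rows <- mwhich(m, cols = c("alpha", "beta"),
               vals = list(2, 40),
               comps = list("ge", "le"),
               op = "AND")
m[rows, ]
```

Each entry of vals and comps pairs with the corresponding entry of cols; op controls whether the per-column tests are combined with "AND" or "OR".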