LongVectors: Long Vectors
Description
Vectors of \(2^{31}\) or more elements were added in R 3.0.0.
Matrix algebra
It is now possible to use \(m \times n\) matrices with more than 2 billion elements. Whether matrix algebra (including %*%, crossprod, svd, qr, solve and eigen) will actually work is somewhat implementation dependent: it depends, among other things, on the Fortran compiler used and on whether an external BLAS or LAPACK is used.
An efficient parallel BLAS implementation will often be important to obtain usable performance. For example, on one particular platform chol on a 47,000 by 47,000 matrix took about 5 hours with the internal BLAS, 21 minutes using an optimized BLAS on one core, and 2 minutes using an optimized BLAS on 16 cores.
Details
Prior to R 3.0.0, all vectors in R were restricted to at most
\(2^{31} - 1\) elements and could be indexed by integer
vectors. Currently all atomic (raw, logical, integer, numeric, complex,
character) vectors, lists and expressions can be much
longer on 64-bit platforms: such vectors are referred to as
‘long vectors’ and have a slightly different internal
structure. In theory they can contain up to \(2^{52}\) elements, but
the address space limits of current CPUs and OSes are much smaller.
Such objects will have a length that is expressed as a double,
and can be indexed by double vectors.
Arrays (including matrices) can be based on long vectors provided each
of their dimensions is at most \(2^{31} - 1\): thus there
are no 1-dimensional long arrays.
R code typically only needs minor changes to work with long vectors, maybe only checking that as.integer is not used unnecessarily for e.g. lengths. However, compiled code typically needs quite extensive changes. Note that the .C and .Fortran interfaces do not accept long vectors, so .Call (or similar) has to be used.
Because of the storage requirements (a minimum of 64 bytes per
character string), character vectors are only going to be usable if
they have a small number of distinct elements, and even then factors
will be more efficient (4 bytes per element rather than 8). So it is
expected that most of the usage of long vectors will be integer
vectors (including factors) and numeric vectors.
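The points above about lengths and storage can be illustrated without allocating a long vector. The following is a minimal sketch, not part of the original help page: the variable names (int_max, n_long, n_bad, n_elem, x_chr, x_fac) are made up for illustration, and the size comparison assumes a 64-bit platform.

```r
# Largest value representable as an R integer: 2^31 - 1.
int_max <- .Machine$integer.max

# A hypothetical length just past that limit; as a double it is exact.
n_long <- 2^31
is.double(n_long)   # TRUE
n_long > int_max    # TRUE

# Coercing such a length to integer loses it: the result is NA
# (R also issues a warning, suppressed here).
n_bad <- suppressWarnings(as.integer(n_long))
is.na(n_bad)        # TRUE

# A 47,000 by 47,000 matrix has both dimensions far below the limit,
# yet its element count exceeds it, so it must be based on a long vector.
n_elem <- 47000 * 47000   # double arithmetic, no integer overflow
n_elem > int_max          # TRUE
n_elem * 8 / 2^30         # approximate size in GiB if stored as doubles

# Storage per element: factor codes are 4-byte integers, whereas a
# character vector holds one 8-byte pointer per element (assumes 64-bit).
x_chr <- rep(c("a", "b", "c"), 1e5)
x_fac <- factor(x_chr)
as.numeric(object.size(x_fac)) < as.numeric(object.size(x_chr))
```

Keeping lengths as doubles throughout (for example, by not wrapping length(x) in as.integer) is usually all that R-level code needs to stay correct for long vectors.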