neuronlistfh objects consist of a list of neuron objects along with an optional attached dataframe containing information about the neurons. In contrast to neuronlist objects, the neurons are not present in memory but are instead dynamically loaded from disk as required. neuronlistfh objects also inherit from neuronlist, and therefore any appropriate methods, e.g. plot3d.neuronlist, can also be used on neuronlistfh objects.
neuronlistfh
constructs a neuronlistfh object from a filehash, data.frame and keyfilemap. End users will not typically use this function to make a neuronlistfh. They will usually read them using read.neuronlistfh and sometimes create them by using as.neuronlistfh on a neuronlist object.
is.neuronlistfh
tests if an object is a neuronlistfh.
as.neuronlistfh
generic function to convert an object to neuronlistfh.
as.neuronlistfh.neuronlist
converts a regular neuronlist to one backed by a filehash object with an on-disk representation.
neuronlistfh(db, df, keyfilemap, hashmap = 1000L)
is.neuronlistfh(nl)
as.neuronlistfh(x, df, ...)
# S3 method for neuronlist
as.neuronlistfh(
x,
df = attr(x, "df"),
dbdir = NULL,
dbClass = c("RDS", "RDS2"),
remote = NULL,
WriteObjects = c("yes", "no", "missing"),
...
)
a neuronlistfh object, which is a character vector with classes neuronlistfh, neuronlist and attributes db, df. See Implementation details.
db
a filehash object that manages an on-disk database of neuron objects. See Implementation details.
df
Optional dataframe, where each row describes one neuron.
keyfilemap
A named character vector in which the elements are filenames on disk (managed by the filehash object) and the names are the keys used in R to refer to the neuron objects. Note that the keyfilemap defines the order of objects in the neuronlist and will be used to reorder the dataframe if necessary.
hashmap
A logical indicating whether to add a hashed environment for rapid object lookup by name, or an integer defining a threshold number of objects at which this will happen (see Implementation details).
nl
Object to test.
x
Object to convert.
...
Additional arguments for methods, eventually passed to the neuronlistfh() constructor.
dbdir
The path to the underlying filehash database on disk. By convention this should be a path whose final element is 'data'.
dbClass
The filehash database class. Defaults to RDS.
remote
The url pointing to a remote repository containing files for each neuron.
WriteObjects
Whether to write objects to disk. "missing" implies that existing objects will not be overwritten. Default "yes".
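The reordering mentioned for keyfilemap can be pictured with a small base R sketch. It assumes that the row names of the dataframe carry the same keys as the names of the keyfilemap; the keys, filenames and the type column below are made up for illustration, and this is not a copy of the constructor's own code.

```r
# Hypothetical keys and filenames; real filenames would normally be md5 hashes.
keyfilemap <- c(n3 = "file_c", n1 = "file_a", n2 = "file_b")

# Metadata with one row per neuron, initially in a different order.
df <- data.frame(type = c("gamma", "ab", "apbp"),
                 row.names = c("n1", "n2", "n3"),
                 stringsAsFactors = FALSE)

# Reorder the rows so that they follow the key order of the keyfilemap.
df <- df[names(keyfilemap), , drop = FALSE]

rownames(df)  # "n3" "n1" "n2"
df$type       # "apbp" "gamma" "ab"
```

Indexing by name rather than by position means the metadata stays attached to the right neuron whatever order the two inputs arrive in.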
neuronlistfh objects are a hybrid between regular neuronlist objects, which organise data and metadata for collections of neurons, and a backing filehash object. Instead of being kept in memory, the neurons are always loaded from disk. Although this sounds like it might be slow, for nearly all practical purposes (e.g. plotting neurons) the time to read the neuron from disk is small compared with the time to plot the neuron; the OS will cache repeated reads of the same file. The benefits in memory and startup time (<1s vs 100s for our 16,000 neuron database) are vital for collections of 1000s of neurons, e.g. for dynamic report generation using knitr or for users with <8 GB RAM or running 32 bit R.
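As a rough illustration of the load-on-demand idea, the following base R sketch keeps only the keys and a directory path in memory and reads each object from disk when it is requested. It is not the package's implementation: the handle structure and the load_one helper are hypothetical, and plain integer vectors stand in for neurons.

```r
# Write three stand-in "neurons" to a temporary directory, one file each.
dbdir <- tempfile("lazydb")
dir.create(dbdir)
neurons <- list(a = 1:10, b = 11:20, c = 21:30)
for (key in names(neurons)) {
  saveRDS(neurons[[key]], file.path(dbdir, paste0(key, ".rds")))
}

# The in-memory handle holds only the keys and the directory, not the data.
handle <- list(keys = names(neurons), dbdir = dbdir)

# Hypothetical accessor: read one object from disk at the moment it is needed.
load_one <- function(handle, key) {
  stopifnot(key %in% handle$keys)
  readRDS(file.path(handle$dbdir, paste0(key, ".rds")))
}

b <- load_one(handle, "b")
```

The handle stays small however many objects the directory holds, which is the property that keeps startup time and memory use low.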
neuronlistfh objects include:
keyfilemap
A named character vector that determines the ordering of objects in the neuronlist and translates keys in R to filenames on disk. For objects created by as.neuronlistfh the filenames will be the md5 hash of the object as calculated using digest. This design means that the same key can be used to refer to multiple distinct objects on disk. Objects are effectively versioned by their contents. So if an updated neuronlistfh object is posted to a website and then fetched by a user, it will result in the automated download of any updated objects to which it refers.
db
The backing database, typically of class filehashRDS. This manages the loading of objects from disk.
df
The data.frame of metadata which can be used to select and plot neurons. See neuronlist for examples.
hashmap
(Optional) a hashed environment which can be used for rapid lookup using key names (rather than numeric/logical indices). There is a potential space cost to pay for this redundant lookup method, but it is normally worthwhile given that the dataframe object is typically considerably larger. To give some numbers, the additional environment might occupy ~1% of the size of the neuronlistfh object while reducing lookup time from 0.5 ms to 1us. Having located the object, on my machine it can take as little as 0.1ms to load from disk, so these savings are relevant.
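Two of the ideas above, content-derived filenames and a hashed environment for lookup by key, can be sketched in base R. This is an illustration under stated assumptions rather than the package's code: store_by_hash is a hypothetical helper, tools::md5sum() on a serialised file stands in for digest, and the keys and filenames in the second half are invented.

```r
# Hypothetical helper: write obj into dbdir under a filename equal to the md5
# of its serialised bytes. tools::md5sum() is a base R stand-in for digest.
store_by_hash <- function(obj, dbdir) {
  tmp <- tempfile(tmpdir = dbdir)
  saveRDS(obj, tmp, compress = FALSE)
  hash <- unname(tools::md5sum(tmp))
  file.rename(tmp, file.path(dbdir, hash))
  hash
}

dbdir <- tempfile("hashdb")
dir.create(dbdir)

# Two versions of the "same" neuron, published under one key.
old_map <- c(neuronA = store_by_hash(list(points = 1:3), dbdir))
new_map <- c(neuronA = store_by_hash(list(points = 1:4), dbdir))

# Same key, different file: comparing the maps reveals what has changed.
changed <- names(new_map)[new_map != old_map[names(new_map)]]

# Optional hashed environment for lookup by key name (invented keys/files).
keyfilemap <- setNames(sprintf("file%05d", 1:20000),
                       sprintf("key%05d", 1:20000))
hashmap <- list2env(as.list(keyfilemap), hash = TRUE)

# Both routes return the same filename for a given key.
keyfilemap[["key12345"]]
hashmap[["key12345"]]
```

Because the filename depends only on the contents, an unchanged object keeps its filename between releases, so only the entries listed in changed would need to be fetched again.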
Presently only backing objects which extend the filehash class are supported (although in theory other backing objects could be added). These include:
filehash RDS
filehash RDS2 (experimental)
We have also implemented a simple remote access protocol (currently only for the RDS format). This allows a neuronlistfh object to be read from a url and downloaded to a local path. Subsequent attempts to access neurons stored in this list will result in automated download of the requested neuron to the local cache.
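The download-on-first-access behaviour can be approximated with a small cache helper. The sketch below is hypothetical and does not reproduce the package's remote protocol: fetch_if_missing and its arguments are invented names, and the demonstration swaps utils::download.file for a local file copy so that it runs without a network connection.

```r
# Hypothetical helper: return a local path for `file`, fetching it from
# `remote` into `cache_dir` only if it is not already present.
# `fetch` defaults to utils::download.file but can be swapped out.
fetch_if_missing <- function(file, remote, cache_dir,
                             fetch = utils::download.file) {
  dest <- file.path(cache_dir, file)
  if (!file.exists(dest)) {
    dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
    fetch(paste0(remote, "/", file), dest, mode = "wb")
  }
  dest
}

# Offline demonstration: a local directory plays the part of the remote.
remote_dir <- tempfile("remote")
dir.create(remote_dir)
saveRDS(1:5, file.path(remote_dir, "abc123"))

calls <- 0
copy_fetch <- function(url, destfile, ...) {
  calls <<- calls + 1          # count how often the "remote" is contacted
  file.copy(url, destfile)
}

cache_dir <- tempfile("cache")
p1 <- fetch_if_missing("abc123", remote_dir, cache_dir, fetch = copy_fetch)
p2 <- fetch_if_missing("abc123", remote_dir, cache_dir, fetch = copy_fetch)

readRDS(p1)
calls
```

The second call finds the file already in the cache and returns its path without contacting the remote again.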
An alternative backend, the experimental RDS2 format, is supported (available at https://github.com/jefferis/filehash). This is likely to be the most effective for large (5,000-500,000) collections of neurons, especially when using network filesystems (nfs, afp), which are typically very slow at listing large directories.
Note that objects are stored in a filehash, which by definition does not have any ordering of its elements. However neuronlist objects (like lists) do have an ordering. Therefore the names of a neuronlistfh object are not necessarily the same as the result of calling names() on the underlying filehash object.
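To make the distinction concrete, the following base R sketch compares the order of keys held in a keyfilemap with the order in which a directory listing happens to return the stored files. The keys and filenames are hypothetical, and list.files() merely stands in for asking the backing store for its contents.

```r
# Hypothetical keyfilemap: the list order is zeta, alpha, mid.
keyfilemap <- c(zeta = "0a1", alpha = "ff2", mid = "7b3")

# Create one (empty) stored object per filename.
dbdir <- tempfile("orderdb")
dir.create(dbdir)
for (f in keyfilemap) saveRDS(NULL, file.path(dbdir, f))

# Order seen by the user of the list.
list_order <- names(keyfilemap)

# Order in which the storage reports its files, translated back to keys.
stored_files <- list.files(dbdir)
storage_order <- names(keyfilemap)[match(stored_files, keyfilemap)]

list_order
storage_order
```

Both vectors contain the same keys, but only list_order is guaranteed to reflect the order of the neuronlist.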
filehash-class
Other neuronlistfh: [.neuronlistfh(), read.neuronlistfh(), remotesync(), write.neuronlistfh()
Other neuronlist: *.neuronlist(), is.neuronlist(), neuronlist-dataframe-methods, neuronlist(), nlapply(), read.neurons(), write.neurons()
if (FALSE) {
kcnl=read.neuronlistfh('http://jefferislab.org/si/nblast/flycircuit/kcs20.rds',
'path/to/my/project/folder')
# this will automatically download the neurons from the web the first time
# it is run
plot3d(kcnl)
}
if (FALSE) {
# create neuronlistfh object backed by filehash with one file per neuron
# by convention we create a subfolder called data in which the objects live
kcs20fh=as.neuronlistfh(kcs20, dbdir='/path/to/my/kcdb/data')
plot3d(subset(kcs20fh,type=='gamma'))
# ... and, again by convention, save the neuronlistfh object next to the
# filehash backing database
write.neuronlistfh(kcs20fh, file='/path/to/my/kcdb/kcdb.rds')
# in a new session
kcs20fh=read.neuronlistfh("/path/to/my/kcdb/kcdb.rds")
plot3d(subset(kcs20fh, type=='gamma'))
}