neuronlistfh: List of neurons loaded on demand from disk or remote website

Description

neuronlistfh objects consist of a list of neuron objects along with an optional attached dataframe containing information about the neurons. In contrast to neuronlist objects, the neurons are not held in memory but are instead loaded dynamically from disk as required. neuronlistfh objects also inherit from neuronlist, so any appropriate method, e.g. plot3d.neuronlist, can also be used on neuronlistfh objects.

neuronlistfh constructs a neuronlistfh object from a filehash, data.frame and keyfilemap. End users will not typically use this function to make a neuronlistfh. They will usually read them using read.neuronlistfh and sometimes create them by using as.neuronlistfh on a neuronlist object.

is.neuronlistfh tests whether an object is a neuronlistfh

as.neuronlistfh is a generic function to convert an object to a neuronlistfh

as.neuronlistfh.neuronlist converts a regular neuronlist to one backed by a filehash object with an on-disk representation

Usage

neuronlistfh(db, df, keyfilemap, hashmap = 1000L)
is.neuronlistfh(nl)
as.neuronlistfh(x, df, ...)
# S3 method for neuronlist
as.neuronlistfh(
  x,
  df = attr(x, "df"),
  dbdir = NULL,
  dbClass = c("RDS", "RDS2"),
  remote = NULL,
  WriteObjects = c("yes", "no", "missing"),
  ...
)

Value

a neuronlistfh object which is a character vector with classes neuronlistfh, neuronlist and attributes db, df. See Implementation details.

Arguments

db: a filehash object that manages an on-disk database of neuron objects. See Implementation details.
df: Optional dataframe, where each row describes one neuron
keyfilemap: A named character vector in which the elements are filenames on disk (managed by the filehash object) and the names are the keys used in R to refer to the neuron objects. Note that the keyfilemap defines the order of objects in the neuronlist and will be used to reorder the dataframe if necessary.
hashmap: A logical indicating whether to add a hashed environment for rapid object lookup by name, or an integer defining the threshold number of objects at which this will happen (see Implementation details).
nl: Object to test
x: Object to convert
...: Additional arguments for methods, eventually passed to neuronlistfh() constructor.
dbdir: The path to the underlying filehash database on disk. By convention this should be a path whose final element is 'data'
dbClass: The filehash database class. Defaults to RDS.
remote: The URL pointing to a remote repository containing files for each neuron.
WriteObjects: Whether to write objects to disk. "missing" writes only objects that are not already on disk, so existing objects are not overwritten. Default "yes".
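To make the roles of keyfilemap and hashmap concrete, here is a minimal base R sketch of the idea: the keyfilemap fixes the order of the list, the dataframe is reordered to match it, and a hashed environment maps each key to its position. It does not call nat; the keys, filenames and the type column are hypothetical, and nat's own constructor may differ in detail.

```r
# Hypothetical keyfilemap: names are the keys used in R, values are filenames on disk
keyfilemap <- c(kc1 = "a1f3", kc2 = "9b2c", kc3 = "04de")

# Hypothetical metadata whose rows arrive in a different order
df <- data.frame(type = c("gamma", "ab", "apbp"),
                 row.names = c("kc3", "kc1", "kc2"),
                 stringsAsFactors = FALSE)

# The keyfilemap defines the order, so reorder the data.frame rows to match it
df <- df[names(keyfilemap), , drop = FALSE]

# Hashed environment mapping each key to its position in the list
hashmap <- list2env(as.list(setNames(seq_along(keyfilemap), names(keyfilemap))),
                    hash = TRUE)

# Look up a key's index without scanning names(keyfilemap)
idx <- get("kc2", envir = hashmap, inherits = FALSE)
keyfilemap[[idx]]  # filename on disk for key "kc2"
```

The environment is redundant with names(keyfilemap), which is why it is optional: it trades a little memory for constant-time lookup by name.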

Implementation details

neuronlistfh objects are a hybrid between regular neuronlist objects, which organise data and metadata for collections of neurons, and a backing filehash object. Instead of being kept in memory, neurons are always loaded from disk. Although this sounds like it might be slow, for nearly all practical purposes (e.g. plotting neurons) the time to read a neuron from disk is small compared with the time to plot it, and the OS will cache repeated reads of the same file. The savings in memory and startup time (<1 s vs 100 s for our 16,000 neuron database) are vital for collections of 1000s of neurons, e.g. for dynamic report generation using knitr, or for users with <8 GB RAM or running 32-bit R.
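The core idea, reading each object from disk only when it is accessed, can be sketched in base R with one RDS file per object and a `[[` method that calls readRDS. The lazylist class and make_lazylist function below are hypothetical toys for illustration; nat itself delegates storage and loading to a filehash database rather than using this code.

```r
# Toy on-demand list: only the key -> file mapping lives in memory
make_lazylist <- function(objs, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  files <- file.path(dir, paste0(names(objs), ".rds"))
  for (i in seq_along(objs)) saveRDS(objs[[i]], files[i])
  structure(list(files = setNames(files, names(objs))), class = "lazylist")
}

# Extracting an element reads it from disk at the moment it is requested
`[[.lazylist` <- function(x, i) readRDS(unclass(x)$files[[i]])

length.lazylist <- function(x) length(unclass(x)$files)
names.lazylist <- function(x) names(unclass(x)$files)

# Usage: write two objects, then read one back on demand
ll <- make_lazylist(list(a = 1:3, b = letters), tempfile("lazydemo"))
ll[["b"]]
```

Memory use stays proportional to the number of keys rather than the size of the objects, at the cost of a file read on every access.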

neuronlistfh objects include:

attr(x,"keyfilemap"): A named character vector that determines the ordering of objects in the neuronlist and translates keys in R to filenames on disk. For objects created by as.neuronlistfh the filenames will be the md5 hash of the object as calculated using digest. This design means that the same key can be used to refer to multiple distinct objects on disk. Objects are effectively versioned by their contents, so if an updated neuronlistfh object is posted to a website and then fetched by a user, it will result in the automated download of any updated objects to which it refers.
attr(x,"db"): The backing database - typically of class filehashRDS. This manages the loading of objects from disk.
attr(x,"df"): The data.frame of metadata which can be used to select and plot neurons. See neuronlist for examples.
attr(x,"hashmap"): (Optional) a hashed environment which can be used for rapid lookup using key names (rather than numeric/logical indices). There is a potential space cost to pay for this redundant lookup method, but it is normally worthwhile given that the dataframe object is typically considerably larger. To give some numbers, the additional environment can reduce the time to look up an object by name from 0.5 ms to 1 µs. Having located the object, on my machine it can take as little as 0.1 ms to load it from disk, so these savings are relevant.
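Content-addressed storage, where the filename is a hash of the object, can be sketched in base R as follows. This is an illustrative assumption, not nat's code: nat computes the hash with digest, whereas this sketch hashes the uncompressed RDS file with tools::md5sum, so the hash values will differ. The store_by_content function is hypothetical.

```r
# Store `obj` in `dir` under the md5 hash of its serialized form; return the hash
store_by_content <- function(obj, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  tmp <- tempfile(tmpdir = dir, fileext = ".tmp")
  # compress = FALSE so identical objects serialize to identical bytes
  saveRDS(obj, tmp, compress = FALSE)
  hash <- unname(tools::md5sum(tmp))
  dest <- file.path(dir, hash)
  # An identical object is already stored: keep the existing file
  if (file.exists(dest)) file.remove(tmp) else file.rename(tmp, dest)
  hash
}

d <- tempfile("cas")
k1 <- store_by_content(c(1.5, 2.5), d)
k2 <- store_by_content(c(1.5, 2.5), d)  # same contents, same filename
k3 <- store_by_content(c(1.5, 3.5), d)  # changed contents, new filename
keyfilemap <- c(neuronA = k3)           # the key now refers to the new version
```

Because an updated object gets a new filename, an old and a new keyfilemap can point the same key at different files, and a client only needs to fetch filenames it has not seen before.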

Presently only backing objects which extend the filehash class are supported (although in theory other backing objects could be added). These include:

filehash RDS
filehash RDS2 (experimental)

We have also implemented a simple remote access protocol (currently only for the RDS format). This allows a neuronlistfh object to be read from a URL and downloaded to a local path. Subsequent attempts to access neurons stored in this list will result in automated download of the requested neuron to the local cache.
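The fetch-on-first-access pattern behind this remote protocol can be sketched in base R as below. The functions cached_path and read_cached, and the assumption that each file lives at remote/filename, are hypothetical illustrations rather than nat's actual protocol. The fetch argument defaults to utils::download.file, and the usage simulates the remote site with a local directory so the sketch runs offline.

```r
# Return a local path for `filename`, fetching it from `remote` on first access only
cached_path <- function(filename, cache_dir, remote,
                        fetch = function(src, dest)
                          utils::download.file(src, dest, mode = "wb", quiet = TRUE)) {
  dest <- file.path(cache_dir, filename)
  if (!file.exists(dest)) {
    dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
    fetch(paste(remote, filename, sep = "/"), dest)
  }
  dest
}

# Read an object, downloading it into the cache if needed
read_cached <- function(filename, cache_dir, remote, ...)
  readRDS(cached_path(filename, cache_dir, remote, ...))

# Usage: simulate the remote site with a local directory and count fetches
remote_dir <- tempfile("remote")
dir.create(remote_dir)
saveRDS(letters, file.path(remote_dir, "abc123"))
n_fetch <- 0
copy_fetch <- function(src, dest) {
  n_fetch <<- n_fetch + 1
  file.copy(src, dest)
}
cache <- tempfile("cache")
x1 <- read_cached("abc123", cache, remote_dir, fetch = copy_fetch)  # copied from "remote"
x2 <- read_cached("abc123", cache, remote_dir, fetch = copy_fetch)  # served from cache
```

Combined with content-hashed filenames, a cached file never goes stale: a changed object arrives under a new name and is fetched once.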

An alternative backend, the experimental RDS2 format, is supported (available at https://github.com/jefferis/filehash). This is likely to be the most effective for large collections (5,000-500,000) of neurons, especially when using network filesystems (nfs, afp), which are typically very slow at listing large directories.

Note that objects are stored in a filehash, which by definition does not have any ordering of its elements. However neuronlist objects (like lists) do have an ordering. Therefore the names of a neuronlistfh object are not necessarily the same as the result of calling names() on the underlying filehash object.

Examples

if (FALSE) {
kcnl=read.neuronlistfh('http://jefferislab.org/si/nblast/flycircuit/kcs20.rds',
'path/to/my/project/folder')
# this will automatically download the neurons from the web the first time
# it is run
plot3d(kcnl)
}
if (FALSE) {
# create neuronlistfh object backed by filehash with one file per neuron
# by convention we create a subfolder called data in which the objects live
kcs20fh=as.neuronlistfh(kcs20, dbdir='/path/to/my/kcdb/data')
plot3d(subset(kcs20fh,type=='gamma'))
# ... and, again by convention, save the neuronlistfh object next to the
# filehash backing database
write.neuronlistfh(kcs20fh, file='/path/to/my/kcdb/kcdb.rds')

# in a new session, read the object back before using it
kcs20fh=read.neuronlistfh("/path/to/my/kcdb/kcdb.rds")
plot3d(subset(kcs20fh, type=='gamma'))
}