priorityFn, with the compound_id numbers of the group. This function should
then assign priorities to each compound-descriptor pair however
it wishes. Priorities are integer values, and lower values are
used in preference to higher values. It is important that setPriorities be called only after all data has been loaded. A compound loaded at the beginning of a data set may share a descriptor with a compound loaded near the end of it. If the priorities were set at some point in between, the priority function would not see all of the compounds for that descriptor.
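The rule that the lowest priority value wins within a descriptor group can be illustrated with a minimal base-R sketch. It assumes the priorities have already been collected into a data.frame with a hypothetical descriptor column next to compound_id and priority, and it picks the preferred compound per group. It is not ChemmineR's internal code.

```r
# Hypothetical priority table: two descriptor groups with their compounds' priorities.
prio <- data.frame(
  descriptor  = c("A", "A", "B", "B"),
  compound_id = c(1L, 5L, 3L, 4L),
  priority    = c(2L, 1L, 1L, 1L)
)

# Within each descriptor group, keep the row with the lowest priority value.
# which.min() returns the first minimum, so ties go to the earlier row.
preferred <- do.call(rbind, lapply(split(prio, prio$descriptor), function(g) {
  g[which.min(g$priority), ]
}))
```

Because a group is only correct when it contains every compound sharing that descriptor, a table like this is only meaningful once all compounds are loaded.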
If a SNOW cluster (cl) and a connection source function (connSource) are given, the priorities are computed in parallel.
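The parallel mode can be pictured with the sketch below, which uses R's parallel package (it incorporates the snow cluster interface) to apply a stand-in priority function to each descriptor group on a worker process. The groups and the stand-in function are hypothetical, and passing NULL as the connection is an assumption made so the sketch runs without a database. The real setPriorities takes a connSource function instead, presumably so each worker can open its own connection, since a live database connection cannot be shipped to another R process.

```r
library(parallel)

# Hypothetical descriptor groups: each element lists the compound_ids sharing one descriptor.
groups <- list(d1 = c(10L, 11L), d2 = 12L, d3 = c(13L, 14L, 15L))

# Stand-in priority function with the (conn, compIds) shape; conn is ignored here.
# It simply prefers compounds in the order they are listed.
inOrderPriorities <- function(conn, compIds) {
  data.frame(compound_id = compIds, priority = seq_along(compIds))
}

cl <- makeCluster(2)
# Each worker runs the priority function on one group at a time.
perGroup <- parLapply(cl, groups, function(ids, priorityFn) priorityFn(NULL, ids),
                      priorityFn = inOrderPriorities)
stopCluster(cl)

# Combine the per-group results into a single table.
priorities <- do.call(rbind, perGroup)
```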
Some predefined functions that can be used for priorityFn are:
randomPriorities: Set the priorities of compounds within a descriptor group randomly.
forestSizePriorities: Set the priority based on the number of disconnected components (trees) within the compound. Compounds with fewer trees get a higher priority (lower numerical value) than compounds with more trees.
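The two predefined strategies can be imitated in a few lines of base R. This is a sketch, not ChemmineR's implementation: mockRandomPriorities and mockForestSizePriorities are hypothetical names, and the forest-size version assumes each compound is available as a SMILES string (passed in directly instead of read from the database). It relies on the fact that in SMILES a "." separates disconnected components.

```r
# Hypothetical stand-in for randomPriorities: a random permutation of 1..n.
mockRandomPriorities <- function(conn, compIds) {
  data.frame(compound_id = compIds, priority = sample.int(length(compIds)))
}

# Hypothetical stand-in for forestSizePriorities, assuming one SMILES string per compound.
# Counting "."-separated fragments counts the trees, so fewer trees give a lower (preferred) value.
mockForestSizePriorities <- function(compIds, smiles) {
  trees <- lengths(strsplit(smiles, ".", fixed = TRUE))
  data.frame(compound_id = compIds, priority = as.integer(trees))
}
```

For example, ethanol ("CCO") gets priority 1, while sodium chloride written as "[Na+].[Cl-]" gets priority 2.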
setPriorities(conn,priorityFn,descriptorIds=c(),cl=NULL,connSource=NULL)
forestSizePriorities(conn,compIds)
randomPriorities(conn,compIds)
The priorityFn function should return a data.frame with the columns "compound_id" and "priority". The order of the rows is not important.
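A hypothetical helper, isValidPriorityFrame, can check a priorityFn result against this contract before it is used. The requirement that the result cover exactly the compound_ids passed in is an assumption of this sketch; the documentation only specifies the two columns and that row order does not matter.

```r
# Hypothetical helper: checks the shape of a priorityFn result.
isValidPriorityFrame <- function(df, compIds) {
  is.data.frame(df) &&
    all(c("compound_id", "priority") %in% names(df)) &&
    is.numeric(df$priority) &&
    !anyNA(df$priority) &&
    all(df$priority == round(df$priority)) &&  # priorities must be integer-valued
    setequal(df$compound_id, compIds)          # assumption: same ids; row order is irrelevant
}
```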
setPriorities returns no value. randomPriorities and forestSizePriorities return a data.frame with columns "compound_id" and "priority".
## Not run:
library(ChemmineR)
data(sdfsample)
conn <- initDb("sample.db")
sdfLoad(conn, sdfsample)
# Set priorities only after all compounds have been loaded.
setPriorities(conn, forestSizePriorities)
## End(Not run)
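Any function with the same (conn, compIds) shape can be passed as priorityFn. As a hypothetical example, lowestIdPriorities below prefers the compound with the smallest compound_id in each group; it accepts conn only to match the signature and never uses it.

```r
# Hypothetical custom priority function: the smallest compound_id gets priority 1.
lowestIdPriorities <- function(conn, compIds) {
  data.frame(compound_id = compIds,
             priority = as.integer(rank(compIds, ties.method = "first")))
}
```

With a loaded database connection conn from initDb, it would be used as setPriorities(conn, lowestIdPriorities).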