pr_DB represents the registry of all available proximity measures.
For each measure, it holds meta-information that can be queried and
extended, and new measures can be added. This is done using the
following accessor functions of the pr_DB object:
get_field_names() returns a character vector with all field names.
get_field() returns the information for a specific field as a list
with components named as described above. get_fields() returns a list
with all field entries. set_field() is used to create new fields in
the repository (the default value will be set in all entries).
get_entry_names() returns a character vector with (the first alias of)
all entries. entry_exists() is a predicate checking if an entry with
the specified alias exists in the registry. get_entry() returns the
specified entry if it exists (and, by default, gives an error if it
does not). get_entries() is used to query more than one entry: either
those matching name exactly, or those where the regular expression in
pattern matches any character field in an entry. By default, all
values are returned. delete_entry() removes an existing entry from the
registry (note that only user-provided entries can be deleted).
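
The query accessors above can be tried out as in the following minimal
sketch. It assumes that the proxy package is installed and that an
entry with the alias "Jaccard" is among the measures shipped with it;
the variable names are chosen for illustration only.

```r
library(proxy)

# Names of all registry fields and of all entries (first alias only)
field_names <- pr_DB$get_field_names()
entry_names <- pr_DB$get_entry_names()

# Predicate: is there an entry with this alias?
has_jaccard <- pr_DB$entry_exists("Jaccard")

# Fetch a single entry (errors by default if the alias is unknown)
jaccard <- pr_DB$get_entry("Jaccard")

# Query several entries: the regular expression is matched against
# the character fields of each entry
hits <- pr_DB$get_entries(pattern = "Jaccard")
```
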
set_entry() and modify_entry() require a named list of arguments used
as field entries. At least the names index field is required.
set_entry() will check for all other mandatory fields. If specified in
the field metadata, each field entry and the entry as a whole are
checked for validity. Note that only user-specified fields and/or
entries can be modified; the data shipped with the package are
read-only.
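
A minimal sketch of the life cycle of a user-provided entry, assuming
the proxy package is installed. The measure my_measure and the aliases
"mytest" and "mytest_alias" are hypothetical and serve only as an
illustration.

```r
library(proxy)

# Hypothetical measure, used only for illustration
my_measure <- function(x, y) sum(abs(x - y))

# Register it under two aliases; 'names' is the required index field
pr_DB$set_entry(FUN = my_measure, names = c("mytest", "mytest_alias"))

# Modify a field of the user-provided entry
pr_DB$modify_entry(names = "mytest",
                   description = "sum of absolute differences")

exists_before <- pr_DB$entry_exists("mytest")
desc <- pr_DB$get_entry("mytest")$description

# Remove the entry again (only possible for user-provided entries)
pr_DB$delete_entry("mytest")
exists_after <- pr_DB$entry_exists("mytest")
```

Deleting the entry at the end keeps the registry unchanged, so the
sketch can be rerun in the same session.
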
The registry fields currently available are as follows:
A function specified as the FUN parameter has mandatory arguments x
and y (if abcd is FALSE), and a, b, c, d, n otherwise. Additionally,
it gets all optional parameters specified by the user in the ...
argument of the dist and simil functions, possibly changed and/or
complemented by the corresponding (optional) PREFUN function. It must
return the (dis)similarity value computed from the arguments.
x and y are two vectors from the data matrix (matrices) supplied. If
abcd is TRUE, it is assumed that binary measures will be used, and the
counts of concordant and discordant pairs $(x_k, y_k)$ are precomputed
and supplied instead of x and y: n is the number of all pairs, and a,
b, c, and d are the counts of all (TRUE, TRUE), (TRUE, FALSE),
(FALSE, TRUE), and (FALSE, FALSE) pairs, respectively.
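
The two calling conventions for FUN can be sketched as follows. Both
functions are hypothetical illustrations, not the definitions used by
the package, and the pair counts are computed by hand here only to
mimic what would be precomputed when abcd is TRUE.

```r
# Vector variant: receives two data vectors
manhattan_fun <- function(x, y, ...) sum(abs(x - y))

# Count variant: receives the precomputed pair counts
jaccard_fun <- function(a, b, c, d, n, ...) a / (a + b + c)

manhattan_value <- manhattan_fun(c(1, 2, 3), c(2, 4, 6))

# Two logical vectors and their pair counts
x <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
y <- c(TRUE, FALSE, TRUE, FALSE, TRUE)

jaccard_value <- jaccard_fun(
  a = sum(x & y),     # (TRUE, TRUE) pairs
  b = sum(x & !y),    # (TRUE, FALSE) pairs
  c = sum(!x & y),    # (FALSE, TRUE) pairs
  d = sum(!x & !y),   # (FALSE, FALSE) pairs
  n = length(x)       # all pairs
)
```

Here manhattan_value is 6 and jaccard_value is 2 / (2 + 1 + 1) = 0.5.
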
A function specified as the PREFUN parameter has mandatory arguments
x, y, p, and reg_entry, with y and p possibly being NULL depending on
the task at hand. x and y are the data objects, p is a (possibly
empty) list with all specified proximity parameters, and reg_entry is
the registry entry (a named list containing all information specified
in reg_add). The preprocessing function is allowed to change any of
this information and, if it does, is required to return *all*
arguments as a named list in the same order.
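
A minimal sketch of a preprocessing function that follows this
contract. The function my_prefun and the parameter power are
hypothetical; the point is only that all four arguments are returned
as a named list in the original order.

```r
# Hypothetical preprocessing function: coerces the data to matrices and
# supplies a default for a made-up proximity parameter 'power'
my_prefun <- function(x, y, p, reg_entry) {
  x <- as.matrix(x)
  if (!is.null(y)) y <- as.matrix(y)
  if (is.null(p)) p <- list()
  if (is.null(p$power)) p$power <- 2
  # Return *all* arguments, named, in the same order as received
  list(x = x, y = y, p = p, reg_entry = reg_entry)
}

# Direct call for illustration; y is NULL as in the auto-distance case
out <- my_prefun(x = data.frame(u = 1:3, v = 4:6), y = NULL,
                 p = list(), reg_entry = list(names = "mytest"))
```
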
A function specified as the POSTFUN parameter has two mandatory
arguments: result and p. result will contain the computed raw data,
i.e., a vector of length $n * (n - 1) / 2$ for auto-distances (see
dist for details on dist objects), or a matrix for cross-distances.
p contains the specified proximity parameters. Post-processing
functions need to return the result object (even if unmodified).
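
A minimal sketch of a post-processing function. The function
my_postfun and the parameter sqrt are hypothetical; the raw vector
stands in for the $n * (n - 1) / 2$ values of an auto-distance with
n = 3.

```r
# Hypothetical post-processing function: takes the square root of the
# raw values if the made-up parameter 'sqrt' is TRUE
my_postfun <- function(result, p) {
  if (isTRUE(p$sqrt)) result <- sqrt(result)
  # The result object is always returned, even if unmodified
  result
}

raw <- c(4, 9, 16)
transformed <- my_postfun(raw, list(sqrt = TRUE))
unchanged <- my_postfun(raw, list())
```
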
A function specified as the convert parameter should preserve the
type of its argument.