'cmp.similarity' uses the descriptor information generated by
'cmp.parse' and 'cmp.parse1'. A descriptor is a vector of numbers. The
vector represents the set of descriptors of the compound's structural
fragments. Similarity is measured with the Tanimoto coefficient.
'cmp.similarity' supports 3 different modes. In mode 1, the standard Tanimoto
coefficient is used. Mode 2 divides the size of the descriptor
intersection by the size of the smaller descriptor, which mainly helps
with compounds that differ greatly in size. Mode 3 is like mode 2,
except that it raises the similarity to the power of 3 to penalize
small values. When mode is 0, 'cmp.similarity' selects mode 1 or
mode 3 based on the size difference between the two descriptors.
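
The three explicit modes can be illustrated with a minimal Python
sketch. This is not the package's own implementation: the function name
'similarity_by_mode' is hypothetical, descriptors are assumed to behave
as sets of integers, and mode 0 is left out because the rule for
choosing between mode 1 and mode 3 is not spelled out here.

```python
def similarity_by_mode(a, b, mode=1):
    """Illustrative similarity between two descriptor collections.

    Assumption: each descriptor is treated as a set of integers.
    Hypothetical helper, not the package's implementation.
    """
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0  # assumption: an empty descriptor matches nothing
    common = len(a & b)
    if mode == 1:
        # Standard Tanimoto: |intersection| / |union|
        return common / (len(a) + len(b) - common)
    if mode == 2:
        # Intersection over the size of the smaller descriptor
        return common / min(len(a), len(b))
    if mode == 3:
        # Mode 2 raised to the power of 3 to penalize small values
        return (common / min(len(a), len(b))) ** 3
    raise ValueError("this sketch only covers modes 1, 2 and 3")


small = {1, 2, 3, 4}
large = {3, 4, 5, 6, 7, 8}
print(similarity_by_mode(small, large, mode=1))  # 2 shared / 8 in union
print(similarity_by_mode(small, large, mode=2))  # 2 shared / 4 in smaller
print(similarity_by_mode(small, large, mode=3))  # mode 2 value cubed
```

With these two toy descriptors, mode 2 scores higher than mode 1
because the denominator ignores the extra elements of the larger
descriptor, and mode 3 pulls that score back down.
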
When 'cmp.similarity' is used to search for compounds with a threshold
similarity value, or in clustering with a cutoff distance, the
threshold similarity or cutoff distance can be used to derive a
'worst' value. 'cmp.similarity' can compute an upper bound on the
similarity more cheaply than the similarity itself. By comparing this
upper bound with the 'worst' value, it can skip the full computation
when the similarity is certain to fall below the 'worst' value and
would therefore be useless to the caller.
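
One way such a shortcut can work for the standard Tanimoto coefficient
is sketched below in Python. The bound used here is an assumption, not
something stated above: the intersection cannot be larger than the
smaller descriptor and the union cannot be smaller than the larger one,
so the ratio of the two sizes bounds the similarity from above. The
function name 'tanimoto_with_worst' and the choice to return 0 for a
skipped pair are likewise hypothetical.

```python
def tanimoto_with_worst(a, b, worst=0.0):
    """Tanimoto similarity with an early exit based on a 'worst' value.

    Assumptions: descriptors are sets of integers, and a skipped pair
    is reported as 0.0. Hypothetical helper for illustration only.
    """
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    smaller, larger = sorted((len(a), len(b)))
    # Upper bound: |intersection| <= smaller and |union| >= larger,
    # so the Tanimoto coefficient can never exceed smaller / larger.
    upper_bound = smaller / larger
    if upper_bound < worst:
        return 0.0  # cannot reach 'worst', skip the set intersection
    common = len(a & b)
    return common / (len(a) + len(b) - common)


query = {1, 2}
candidate = set(range(1, 11))  # ten elements, two shared with query
print(tanimoto_with_worst(query, candidate))             # full computation
print(tanimoto_with_worst(query, candidate, worst=0.5))  # skipped early
```

The bound needs only the two descriptor sizes, so a search over many
candidates can discard badly size-mismatched pairs without computing
any intersection.
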