ls_fit_sum_of_ultrametrics: Least Squares Fit of Sums of Ultrametrics to Dissimilarities

Description

Find a sequence of ultrametrics whose sum minimizes the (weighted) squared Euclidean distance to a given dissimilarity object.

Usage

ls_fit_sum_of_ultrametrics(x, nterms = 1, weights = 1,
                           control = list())

Value

A list of objects of class "cl_ultrametric" containing the fitted ultrametric distances.

Arguments

x: a dissimilarity object inheriting from or coercible to class "dist".
nterms: an integer giving the number of ultrametrics to be fitted.
weights: a numeric vector or matrix with non-negative weights for obtaining a weighted least squares fit. If a matrix, its numbers of rows and columns must both equal the number of objects in x, and only the part below the diagonal is used. Otherwise, it is recycled to the number of elements in x.
control: a list of control parameters. See Details.

Details

The problem to be solved is minimizing the criterion function $$L(u(1), \ldots, u(n)) = \sum_{i,j} w_{ij} \left(x_{ij} - \sum_{k=1}^n u_{ij}(k)\right)^2$$ over all $u(1), \ldots, u(n)$ satisfying the ultrametric constraints, where $n$ is the number of terms given by nterms and the $w_{ij}$ are the weights.
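The criterion can be evaluated directly in base R. The following is a minimal sketch, not the package's internal code: the helper name ls_criterion is hypothetical, the terms are passed as plain numeric vectors in the element order of a "dist" object, and each unordered pair is counted once (for symmetric weights this differs from a sum over all ordered pairs only by a factor of 2).

```r
# Weighted least squares criterion for a sum of terms (illustrative sketch).
# x: a "dist" object; u: list of numeric vectors with length(x) elements each;
# w: non-negative weights, recycled to the number of elements in x.
ls_criterion <- function(x, u, w = 1) {
  xv <- as.vector(x)                              # drop "dist" attributes
  total <- Reduce(`+`, u, numeric(length(xv)))    # sum of all terms
  w <- rep_len(w, length(xv))
  sum(w * (xv - total)^2)
}

# Toy data: 4 objects on a line, and two constant terms
# (a constant dissimilarity trivially satisfies the ultrametric inequality).
x  <- dist(c(0, 1, 3, 7))
u1 <- rep(1, length(x))
u2 <- rep(2, length(x))
value <- ls_criterion(x, list(u1, u2))
```

Here the dissimilarities are 1, 3, 7, 2, 6, 4 and the two terms sum to 3 everywhere, so the unweighted criterion is 4 + 0 + 16 + 1 + 9 + 1 = 31.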

The function implements the iterative heuristic suggested by Carroll & Pruzansky (1980): in each step $t$, every $u(k)$ is refitted in turn as the least squares ultrametric fit to the “residuals” $x - \sum_{l \ne k} u(l)$ using ls_fit_ultrametric.
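The structure of this cyclic refitting can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: backfit_sum and constant_fit are hypothetical names, the fitter is passed in as a function on plain numeric vectors, only a relative-change stopping rule is shown, and the toy fitter merely returns the best constant dissimilarity rather than a general least squares ultrametric. In practice the fitter's role is played by ls_fit_ultrametric.

```r
# Backfitting sketch: cyclically refit each term to the residuals of the others.
# fit: hypothetical function(residuals, weights) returning a fitted vector.
backfit_sum <- function(x, nterms, fit, w = 1, maxiter = 100, reltol = 1e-6) {
  xv <- as.vector(x)
  w <- rep_len(w, length(xv))
  zero <- numeric(length(xv))
  u <- replicate(nterms, zero, simplify = FALSE)   # start from all-zero terms
  crit <- function(u) sum(w * (xv - Reduce(`+`, u, zero))^2)
  value <- crit(u)
  iterations <- 0
  for (iter in seq_len(maxiter)) {
    iterations <- iter
    for (k in seq_len(nterms)) {
      others <- Reduce(`+`, u[-k], zero)           # sum of all terms except k
      u[[k]] <- fit(xv - others, w)                # refit term k to residuals
    }
    new_value <- crit(u)
    # Assumed stopping rule: small relative change in the criterion.
    converged <- abs(value - new_value) <= reltol * (abs(value) + reltol)
    value <- new_value
    if (converged) break
  }
  list(u = u, value = value, iterations = iterations)
}

# Toy fitter (assumption, illustration only): best constant in the weighted
# least squares sense; a constant dissimilarity is trivially ultrametric.
constant_fit <- function(r, w) rep(weighted.mean(r, w), length(r))

res <- backfit_sum(dist(c(0, 1, 3, 7)), nterms = 2, fit = constant_fit)
```

With the toy fitter the first term absorbs the mean dissimilarity and the second stays at zero, so the loop stops as soon as the criterion no longer changes. Passing the fitter as an argument keeps the loop independent of how each individual ultrametric is obtained.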

Available control parameters include

maxiter: an integer giving the maximal number of iteration steps to be performed. Defaults to 100.
eps: a nonnegative number controlling the iteration, which stops when the maximal change in all $u(k)$ is less than eps. Defaults to $10^{-6}$.
reltol: the relative convergence tolerance. Iteration stops when the relative change in the criterion function is less than reltol. Defaults to $10^{-6}$.
method: a character string indicating the fitting method to be employed by the individual least squares fits.
control: a list of control parameters to be used by the method of ls_fit_ultrametric employed. By default, if the SUMT method is used, 10 inner SUMT runs are performed for each refitting.
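As a usage sketch, the control parameters listed above are passed as a named list. The example assumes the clue package is installed and uses simulated data chosen only for illustration; the specific values of maxiter and reltol are arbitrary.

```r
# Control parameters named as documented above; the values are arbitrary.
ctrl <- list(maxiter = 20, reltol = 1e-8)

# Run only if the clue package is available (assumption: it provides
# ls_fit_sum_of_ultrametrics with the signature shown under Usage).
if (requireNamespace("clue", quietly = TRUE)) {
  set.seed(1)
  x <- dist(matrix(rnorm(12), nrow = 6))   # simulated dissimilarities, 6 objects
  fits <- clue::ls_fit_sum_of_ultrametrics(x, nterms = 2, control = ctrl)
}
```

The result fits is the list of fitted ultrametrics described under Value.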

Note that the method used is a heuristic and cannot be guaranteed to find the global minimum.

References

J. D. Carroll and S. Pruzansky (1980). Discrete and hybrid scaling models. In E. D. Lantermann and H. Feger (eds.), Similarity and Choice. Bern (Switzerland): Huber.