
qlcMatrix (version 0.9.8)

distSparse: Sparse distance matrix calculations

Description

Sparse alternative to base dist function. WARNING: the result is not a distance metric, see details! Also: distances are calculated between columns (not between rows, as in the base dist function).

Usage

distSparse(M, method = "euclidean", diag = FALSE)

Value

A symmetric matrix of type dsCMatrix, consisting of similarity(!) values instead of distances (viz. max(dist)-dist).

Arguments

M

a sparse matrix in a format of the Matrix package, typically dMatrix. Any other matrix will be converted to such a sparse Matrix. The distances will be calculated between the columns of this matrix (different from the base dist function!)

method

method to calculate distances. Currently only "euclidean" is supported.

diag

should the diagonal be included in the results?

Author

Michael Cysouw <cysouw@mac.com>

Details

A sparse distance matrix is a slightly awkward concept, because distances of zero are rare in most data. Further, it is mostly the small distances that are of interest, and not the large distances (which are mostly also less trustworthy). Note that for random data, this assumption is not necessarily true.

To obtain sparse results, the current implementation takes a special approach. First, distances are calculated only for pairs of columns in which both columns have at least some non-zero data. The assumption is that the remaining, non-calculated distances will be uninteresting (and relatively large anyway).

Second, to differentiate the non-calculated distances from real zero distances, the distances are converted into similarities by subtracting them from the maximum distance. In this way, all non-calculated distances are zero, and the real zero distances have the value max(dist).
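The two steps above can be sketched on a small example. This is an illustration under stated assumptions, not the package's implementation: it relies on the Matrix package, it treats a pair of columns as "calculated" when both columns are non-zero in at least one common row (one possible reading of the condition above), and the names `B`, `calculated`, `D` and `S` are made up for the example.

```r
library(Matrix)

set.seed(1)
# small sparse input: 12 rows (features), 5 columns (objects to compare)
M <- rsparsematrix(nrow = 12, ncol = 5, density = 0.15)

# ASSUMPTION: a pair of columns counts as "calculated" when both columns
# have a non-zero value in at least one common row
B <- as.matrix(M != 0) * 1
calculated <- crossprod(B) > 0

# plain Euclidean distances between columns (base dist works on rows)
D <- as.matrix(dist(t(as.matrix(M))))

# similarity = maximum calculated distance minus distance;
# non-calculated pairs stay at zero, so they vanish from sparse storage
S <- matrix(0, ncol(M), ncol(M))
S[calculated] <- max(D[calculated]) - D[calculated]
S <- Matrix(S, sparse = TRUE)

S
```

In this sketch a true distance of zero (for instance a non-empty column compared with itself) ends up with the largest similarity, while pairs that were never compared are stored as structural zeros.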

Euclidean distances are calculated using the following trick: $$colSums(M^2) + rowSums(M^2) - 2 * M'M$$
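The identity behind this trick can be checked on a small example. The sketch below is not the package's code; it assumes the Matrix package and uses illustrative names (`sq`, `D2`, `D_ref`). Written out per pair of columns, the expression gives squared distances, so the sketch takes a square root before comparing with base `dist`.

```r
library(Matrix)

set.seed(42)
# small sparse test matrix: 10 rows (features), 4 columns (objects)
M <- rsparsematrix(nrow = 10, ncol = 4, density = 0.4)

# squared norm of every column
sq <- colSums(M^2)

# squared distance between columns i and j:
#   sq[i] + sq[j] - 2 * (M'M)[i, j]
D2 <- outer(sq, sq, "+") - 2 * as.matrix(crossprod(M))

# rounding can leave tiny negative values, so clip before the square root
D <- sqrt(pmax(D2, 0))

# base dist works on rows, so transpose to compare columns
D_ref <- as.matrix(dist(t(as.matrix(M))))

all.equal(D, D_ref, check.attributes = FALSE)
```

The advantage of this formulation is that the only expensive step is `crossprod(M)`, which stays sparse when few columns share non-zero rows.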

See Also

See also dist.

Examples

# to be done
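Until an official example is provided, a minimal call might look like the following. This is a hedged sketch rather than an example from the package author: the random test matrix is made up, the call is skipped when qlcMatrix is not installed, and no particular output is claimed.

```r
library(Matrix)

set.seed(1)
# hypothetical input: 20 rows (features), 6 columns (objects to compare)
M <- rsparsematrix(nrow = 20, ncol = 6, density = 0.2)

if (requireNamespace("qlcMatrix", quietly = TRUE)) {
  # similarities between the columns of M, using the signature from Usage
  S <- qlcMatrix::distSparse(M, method = "euclidean", diag = FALSE)
  print(dim(S))
}

# for comparison: ordinary distances between the same columns with base dist,
# which works on rows and therefore needs the transposed matrix
D <- as.matrix(dist(t(as.matrix(M))))
dim(D)
```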
