proxyC (version 0.1.0)

simil: Compute similarity/distance between rows or columns of large matrices

Description

Fast similarity/distance computation functions for large sparse matrices. To save computation time and storage space, you can floor small similarity values using an arbitrary threshold (min_simil) or rank (rank). For better performance, increase the number of threads using setThreadOptions.
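The effect of min_simil and rank can be sketched as follows. This is a minimal illustration, not a benchmark: the matrix size, the density, the threshold of 0.3, and the rank of 3 are arbitrary values chosen here, and setThreadOptions is assumed to be the function exported by the RcppParallel package.

```r
library(proxyC)
library(Matrix)

set.seed(42)
# Dense enough that an all-zero row is very unlikely (an assumption of this sketch)
mt <- rsparsematrix(50, 50, 0.2)

# Optionally allow more threads; setThreadOptions() is assumed to come from RcppParallel
RcppParallel::setThreadOptions(numThreads = 2)

full   <- simil(mt, method = "cosine")                   # no flooring
thresh <- simil(mt, method = "cosine", min_simil = 0.3)  # floor values below 0.3
top3   <- simil(mt, method = "cosine", rank = 3)         # record only top-3 values

# The three results have the same dimensions; they differ in how many
# non-zero values are recorded
dim(full); dim(thresh); dim(top3)
sum(as.matrix(thresh) != 0)
sum(as.matrix(top3) != 0)
```

Comparing the counts of non-zero values gives a rough sense of how much storage each setting saves on a given matrix.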

Usage

simil(x, y = NULL, margin = 1, method = c("cosine", "correlation",
  "jaccard", "ejaccard", "dice", "edice", "hamman", "simple matching",
  "faith"), min_simil = NULL, rank = NULL)

dist(x, y = NULL, margin = 1, method = c("euclidean", "chisquared",
  "hamming", "kullback", "manhattan", "maximum", "canberra", "minkowski"),
  p = 2)

Arguments

x

a matrix or Matrix object

y

if a matrix or Matrix object is provided, proximity between the rows or columns of x and those of y is computed.

margin

integer indicating the margin of similarity/distance computation: 1 indicates rows and 2 indicates columns.

method

method to compute similarity or distance

min_simil

the minimum similarity value to be recorded.

rank

an integer value specifying the top-n highest similarity values to be recorded.

p

weight (the exponent p) for Minkowski distance
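As a sanity check on the p argument, the sketch below assumes the conventional Minkowski definition, (sum |x - y|^p)^(1/p), under which p = 2 should reproduce the Euclidean distance and p = 1 the Manhattan distance. The matrix size and density are arbitrary, and the calls are written as proxyC::dist to avoid confusion with stats::dist.

```r
library(proxyC)
library(Matrix)

set.seed(1)
mt <- rsparsematrix(20, 30, 0.3)

# Minkowski distance with p = 2 and p = 1
d_mink2 <- proxyC::dist(mt, method = "minkowski", p = 2)
d_mink1 <- proxyC::dist(mt, method = "minkowski", p = 1)

# The distances they are expected to match under the conventional definition
d_eucl <- proxyC::dist(mt, method = "euclidean")
d_manh <- proxyC::dist(mt, method = "manhattan")

# Largest absolute differences; both should be close to zero
diff2 <- max(abs(as.matrix(d_mink2) - as.matrix(d_eucl)))
diff1 <- max(abs(as.matrix(d_mink1) - as.matrix(d_manh)))
diff2
diff1
```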

Examples

# NOT RUN {
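library(proxyC)
# Cosine similarity between the rows of a random sparse matrix
mt <- Matrix::rsparsematrix(100, 100, 0.01)
simil(mt, method = "cosine")[1:5, 1:5]

# Euclidean distance between the rows of a random sparse matrix
mt <- Matrix::rsparsematrix(100, 100, 0.01)
dist(mt, method = "euclidean")[1:5, 1:5]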
# }
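The y and margin arguments can be illustrated with a further sketch. The matrix names and sizes below are made up for the example; the only requirement assumed is that x and y share the dimension that is not being compared (columns for margin = 1, rows for margin = 2).

```r
library(proxyC)
library(Matrix)

set.seed(7)

# margin = 1: compare rows of x with rows of y (both have 6 columns)
x_rows <- rsparsematrix(10, 6, 0.5)
y_rows <- rsparsematrix(4, 6, 0.5)
s_rows <- simil(x_rows, y_rows, margin = 1, method = "cosine")
dim(s_rows)  # one row per row of x, one column per row of y

# margin = 2: compare columns of x with columns of y (both have 10 rows)
x_cols <- rsparsematrix(10, 6, 0.5)
y_cols <- rsparsematrix(10, 3, 0.5)
s_cols <- simil(x_cols, y_cols, margin = 2, method = "cosine")
dim(s_cols)  # one row per column of x, one column per column of y
```

When y is left as NULL, as in the examples above, the rows or columns of x are compared with each other.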
