recommenderlab (version 1.0.6)

dissimilarity: Dissimilarity and Similarity Calculation Between Rating Data

Description

Calculate dissimilarities/similarities between ratings by users and for items.

Usage

# S4 method for binaryRatingMatrix
dissimilarity(x, y = NULL, method = NULL, args = NULL, which = "users")
# S4 method for realRatingMatrix
dissimilarity(x, y = NULL, method = NULL, args = NULL, which = "users")

similarity(x, y = NULL, method = NULL, args = NULL, ...)
# S4 method for ratingMatrix
similarity(x, y = NULL, method = NULL, args = NULL, which = "users",
  min_matching = 0, min_predictive = 0)

Value

Returns an object of class "dist" or "simil", or an appropriate object (e.g., a matrix with class "crossdist" or "crosssimil") to represent a cross-(dis)similarity.

Arguments

x

a ratingMatrix.

y

NULL or a second ratingMatrix to calculate cross-(dis)similarities.

method

(dis)similarity measure to use. Available measures are typically "cosine", "pearson", "jaccard", etc. See dissimilarity for class itemMatrix in arules for details about measures for binaryRatingMatrix and dist in proxy for realRatingMatrix. Default for realRatingMatrix is cosine and for binaryRatingMatrix is jaccard.

args

a list of additional arguments for the methods.

which

a character string indicating if the (dis)similarity should be calculated between "users" (rows) or "items" (columns).

min_matching, min_predictive

Thresholds on the minimum number of ratings used to calculate the similarity and the minimum number of ratings that can be used for prediction.

...

further arguments.

Details

Most dissimilarities and similarities are calculated using the proxy package. Dissimilarities \(d\) and similarities \(s\) are typically converted into each other using \(s = 1 / (1 + d)\) or \(s = 1 - d\) (used for Jaccard, Cosine and Pearson correlation), depending on the measure.
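As a small illustration of the two conversions named above, the following base R sketch applies each formula to a few made-up dissimilarity values. The helper names sim_inverse and sim_complement are hypothetical and are not part of recommenderlab or proxy; which conversion a given measure actually uses depends on the measure.

```r
# Hypothetical helpers for illustration; not part of recommenderlab or proxy.
sim_inverse    <- function(d) 1 / (1 + d)  # s = 1 / (1 + d), for unbounded d
sim_complement <- function(d) 1 - d        # s = 1 - d, for d in [0, 1]

d_unbounded <- c(0, 1, 3)      # made-up unbounded, distance-like values
d_bounded   <- c(0, 0.25, 1)   # made-up values in [0, 1]

s_inverse    <- sim_inverse(d_unbounded)    # 1.00 0.50 0.25
s_complement <- sim_complement(d_bounded)   # 1.00 0.75 0.00
```

In both cases a dissimilarity of 0 maps to the maximum similarity of 1.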

Similarities are usually defined in the range \([0, 1]\); however, Cosine similarity and Pearson correlation are defined in the interval \([-1, 1]\). We rescale these measures to the interval \([0, 1]\) with \(s' = \frac{1}{2}(s + 1)\).
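The rescaling can be checked by hand with a minimal base R sketch. The functions cosine and rescale_unit below are hypothetical helpers written for this illustration, and the vectors are made-up data chosen so that the raw cosine similarities are exactly 1, 0 and -1.

```r
# Hypothetical helpers for illustration; not recommenderlab functions.
cosine       <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
rescale_unit <- function(s) (s + 1) / 2   # s' = (s + 1) / 2

u <- c(1, 2, 3)
v <- c(-1, -2, -3)   # opposite direction to u
w <- c(3, -3, 1)     # orthogonal to u: 1*3 + 2*(-3) + 3*1 = 0

s_raw    <- c(cosine(u, u), cosine(u, w), cosine(u, v))   #  1  0 -1
s_scaled <- rescale_unit(s_raw)                           #  1.0 0.5 0.0
```

After rescaling, uncorrelated (orthogonal) vectors sit at 0.5 rather than 0, which is worth keeping in mind when interpreting the values.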

Similarities are calculated using only the ratings that are available for both users/items. This can lead to calculating the measure from a very small number of ratings (possibly only one). min_matching is the required number of shared ratings to calculate similarities. To predict ratings, there need to be additional ratings in argument y. min_predictive is the required number of additional ratings to calculate similarities. If the min_matching or min_predictive threshold is not met, then NA is reported instead of the calculated similarity.
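The effect of a min_matching threshold can be sketched in base R as follows. This is only an illustration of the idea, assuming a cosine measure over co-rated items and an "at least min_matching shared ratings" rule; cosine_min_matching is a hypothetical helper, not recommenderlab's implementation, and min_predictive is not covered here.

```r
# Hypothetical helper; a sketch of the thresholding idea only,
# not recommenderlab's implementation.
cosine_min_matching <- function(a, b, min_matching = 0) {
  shared <- !is.na(a) & !is.na(b)               # items rated by both users
  if (sum(shared) < min_matching) return(NA_real_)
  a <- a[shared]
  b <- b[shared]
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Made-up ratings for two users over five items; NA means "not rated".
user1 <- c(5, NA,  3, NA, 1)
user2 <- c(4,  2, NA, NA, 1)

# Items 1 and 5 are the only shared ratings.
s_ok <- cosine_min_matching(user1, user2, min_matching = 2)  # computed from 2 ratings
s_na <- cosine_min_matching(user1, user2, min_matching = 3)  # NA: fewer than 3 shared
```

Raising the threshold trades coverage for reliability: fewer pairs receive a similarity, but those that do are based on more evidence.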

See Also

ratingMatrix, dissimilarity in arules, and dist in proxy.

Examples

library("recommenderlab")
data(MSWeb)

## between 5 users
dissimilarity(MSWeb[1:5,], method = "jaccard")
similarity(MSWeb[1:5,], method = "jaccard")

## between first 3 items
dissimilarity(MSWeb[,1:3], method = "jaccard", which = "items")
similarity(MSWeb[,1:3], method = "jaccard", which = "items")

## cross-similarity between first 2 users and users 10-20
similarity(MSWeb[1:2,], MSWeb[10:20,], method="jaccard")