recommenderlab (version 0.2-6)

dissimilarity: Dissimilarity and Similarity Calculation Between Rating Data

Description

Calculate dissimilarities/similarities between ratings by users and for items.

Usage

# S4 method for binaryRatingMatrix
dissimilarity(x, y = NULL, method = NULL, args = NULL, which="users")
# S4 method for realRatingMatrix
dissimilarity(x, y = NULL, method = NULL, args = NULL, which="users")

similarity(x, y = NULL, method = NULL, args = NULL, ...)
# S4 method for ratingMatrix
similarity(x, y = NULL, method = NULL, args = NULL, which="users", min_matching = 0, min_predictive = 0)

Arguments

x

a ratingMatrix.

y

NULL or a second ratingMatrix to calculate cross-(dis)similarities.

method

(dis)similarity measure to use. Available measures are typically "cosine", "pearson", "jaccard", etc. See dissimilarity for class itemMatrix in arules for details about measures for binaryRatingMatrix and dist in proxy for realRatingMatrix. Default for realRatingMatrix is cosine and for binaryRatingMatrix is jaccard.

args

a list of additional arguments for the methods.

which

a character string indicating if the (dis)similarity should be calculated between "users" (rows) or "items" (columns).

min_matching, min_predictive

Thresholds on the minimum number of ratings used to calculate the similarity and the minimum number of ratings that can be used for prediction.

...

further arguments.

Value

returns an object of class dist, simil or an appropriate object (e.g., a matrix) to represent a cross-(dis)similarity.

Details

Similarities are computed from dissimilarities using \(s=1/(1+d)\) or \(s=1-d\) depending on the measure. For Pearson we use 1 - positive correlation.
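As a rough illustration of the two conversions, the following base R sketch applies both formulas to the same made-up dissimilarity values. The variable names are hypothetical and the values are not taken from any data set. Which formula the package applies to which measure is not spelled out here, so this only shows the arithmetic.

```r
# Illustration only: made-up dissimilarities, not package output.
d <- c(0, 0.25, 1)

# s = 1 - d
sim_linear <- 1 - d

# s = 1 / (1 + d)
sim_inverse <- 1 / (1 + d)

print(sim_linear)
print(sim_inverse)
```

Both conversions map a dissimilarity of 0 to a similarity of 1. They differ for larger values: 1 - d reaches 0 at d = 1, while 1/(1 + d) stays positive for any finite d.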

Similarities are calculated using only the ratings that are available for both users/items. As a result, a measure may be computed from a very small number of ratings (possibly only one). min_matching is the minimum number of shared ratings required to calculate a similarity. To predict ratings, argument y must contain additional ratings; min_predictive is the minimum number of such additional ratings required. If either the min_matching or the min_predictive threshold is not met, NA is reported instead of the calculated similarity.
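The effect of a minimum number of shared ratings can be sketched in base R. The helper cosine_corated below is hypothetical and is not the package's implementation. It assumes ratings are stored as numeric vectors with NA for missing entries, and it only mimics the min_matching idea for a single pair of users.

```r
# Illustration only: a hypothetical helper, not recommenderlab code.
# Cosine similarity over the items rated by both users; returns NA
# when fewer than min_matching items are shared.
cosine_corated <- function(u, v, min_matching = 0) {
  shared <- !is.na(u) & !is.na(v)          # items rated by both users
  if (sum(shared) < min_matching) return(NA_real_)
  sum(u[shared] * v[shared]) /
    (sqrt(sum(u[shared]^2)) * sqrt(sum(v[shared]^2)))
}

u <- c(5, NA, 3, NA)   # made-up ratings for user 1
v <- c(4, 2, NA, NA)   # made-up ratings for user 2

sim_any  <- cosine_corated(u, v)                    # only one shared item
sim_min2 <- cosine_corated(u, v, min_matching = 2)  # threshold not met
```

With a single shared item the cosine is trivially 1 regardless of the two rating values, which is the kind of unreliable result a threshold on shared ratings is meant to suppress.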

See Also

ratingMatrix and dissimilarity in arules.

Examples

# NOT RUN {
data(MSWeb)

## between first 5 users
dissimilarity(MSWeb[1:5,], method = "jaccard")
similarity(MSWeb[1:5,], method = "jaccard")

## between first 3 items
dissimilarity(MSWeb[,1:3], method = "jaccard", which = "items")
similarity(MSWeb[,1:3], method = "jaccard", which = "items")

## cross-similarity between first 2 users and users 10-20
similarity(MSWeb[1:2,], MSWeb[10:20,], method="jaccard")
# }
