superml (version 0.5.7)

bm25: Best Matching (BM25) - Deprecated

Description

Computes the BM25 distance between sentences/documents.

Arguments

Public fields

corpus

a list containing sentences

use_parallel

enables parallel computation, defaults to FALSE

Methods


Method new()

Usage

bm25$new(corpus, use_parallel)

Arguments

corpus

list, a list containing sentences

use_parallel

logical, enables parallel computation, defaults to FALSE. If TRUE, uses n - 1 cores.

Details

Create a new `bm25` object.

Returns

A `bm25` object.

example <- c('white audi 2.5 car', 'black shoes from office',
             'new mobile iphone 7', 'audi tyres audi a3',
             'nice audi bmw toyota corolla')
obj <- bm25$new(example, use_parallel = FALSE)


Method most_similar()

Usage

bm25$most_similar(document, topn = 1)

Arguments

document

character, for this value we find most similar sentences.

topn

integer, top n sentences to retrieve

Details

Returns the topn sentences from the corpus that are most similar to the given document.

Returns

a vector of most similar documents

example <- c('white audi 2.5 car', 'black shoes from office',
             'new mobile iphone 7', 'audi tyres audi a3',
             'nice audi bmw toyota corolla')
get_bm <- bm25$new(example, use_parallel = FALSE)
input_document <- c('white toyota corolla')
get_bm$most_similar(document = input_document, topn = 2)


Method clone()

The objects of this class are cloneable with this method.

Usage

bm25$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Details

BM25 stands for Best Matching 25. It is widely used for ranking documents and is often preferred over TF*IDF scores. Given a new document, it is used to find similar documents in a corpus. It is popular in information retrieval systems. This implementation can use multiple cores (use_parallel = TRUE) for faster, parallel computation.
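
To make the ranking idea concrete, here is a minimal base R sketch of the standard Okapi BM25 score. It is not superml's internal code: the helper names `tokenize` and `bm25_scores`, the whitespace tokenizer, and the parameter values `k1 = 1.5` and `b = 0.75` are all assumptions chosen for illustration, so the scores may differ from what `bm25$most_similar()` computes.

```r
# Illustrative Okapi BM25 in base R (hypothetical helpers, not superml's code).
# Assumption: lowercase, whitespace-separated tokens.
tokenize <- function(x) strsplit(tolower(x), "\\s+")[[1]]

# Score every document in `corpus` against `query`.
# k1 and b are conventional defaults, assumed here for illustration.
bm25_scores <- function(corpus, query, k1 = 1.5, b = 0.75) {
  docs    <- lapply(corpus, tokenize)
  n_docs  <- length(docs)
  doc_len <- lengths(docs)
  avg_len <- mean(doc_len)
  terms   <- unique(tokenize(query))

  vapply(seq_len(n_docs), function(i) {
    score <- 0
    for (term in terms) {
      # Term frequency in document i
      tf <- sum(docs[[i]] == term)
      # Number of documents that contain the term
      df <- sum(vapply(docs, function(d) term %in% d, logical(1)))
      # Smoothed inverse document frequency (always non-negative)
      idf <- log((n_docs - df + 0.5) / (df + 0.5) + 1)
      # Length normalisation: longer-than-average documents are penalised
      norm <- tf + k1 * (1 - b + b * doc_len[i] / avg_len)
      score <- score + idf * tf * (k1 + 1) / norm
    }
    score
  }, numeric(1))
}

corpus <- c("white audi 2.5 car", "black shoes from office",
            "new mobile iphone 7", "audi tyres audi a3",
            "nice audi bmw toyota corolla")
scores <- bm25_scores(corpus, "white toyota corolla")

# Two highest-scoring sentences, best first
top2 <- corpus[order(scores, decreasing = TRUE)[1:2]]
print(top2)
```

Under these assumed defaults, the sentence that matches two query terms ("toyota" and "corolla") ranks first and the one that matches only "white" ranks second; sentences that share no terms with the query score zero.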