Compute BM25 distance between sentences/documents.
corpus
a list containing sentences
use_parallel
enables parallel computation, defaults to FALSE
new()
bm25$new(corpus, use_parallel)
corpus
list, a list containing sentences
use_parallel
logical, enables parallel computation, defaults to FALSE. If TRUE, uses n - 1 cores.
Create a new `bm25` object.
A `bm25` object.
example <- c('white audi 2.5 car', 'black shoes from office', 'new mobile iphone 7', 'audi tyres audi a3', 'nice audi bmw toyota corolla')
obj <- bm25$new(example, use_parallel = FALSE)
most_similar()
bm25$most_similar(document, topn = 1)
document
character, the document for which the most similar sentences are found.
topn
integer, top n sentences to retrieve
Returns the sentences most similar to the given document.
A vector of the most similar documents.
example <- c('white audi 2.5 car', 'black shoes from office', 'new mobile iphone 7', 'audi tyres audi a3', 'nice audi bmw toyota corolla')
get_bm <- bm25$new(example, use_parallel = FALSE)
input_document <- c('white toyota corolla')
get_bm$most_similar(document = input_document, topn = 2)
clone()
The objects of this class are cloneable with this method.
bm25$clone(deep = FALSE)
deep
Whether to make a deep clone.
BM25 stands for Best Matching 25. It is widely used for ranking documents and is often preferred over TF*IDF scores. Given a new document, it finds the most similar documents in a corpus, which makes it popular in information retrieval systems. This implementation can use multiple cores for faster, parallel computation.
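To make the ranking idea concrete, the following is a minimal base R sketch of the standard Okapi BM25 score. It is not this package's implementation: the function name `bm25_scores`, the whitespace tokenisation, the parameter values `k1 = 1.5` and `b = 0.75`, and the non-negative IDF variant are all assumptions chosen for illustration, so the scores need not match what `bm25$most_similar()` computes internally.

```r
# Illustrative Okapi BM25 scoring in base R (hypothetical helper, not the
# package's code). Assumptions: whitespace tokenisation, k1 = 1.5, b = 0.75,
# and the IDF variant log(1 + (N - n + 0.5) / (n + 0.5)).
bm25_scores <- function(corpus, query, k1 = 1.5, b = 0.75) {
  tokenise <- function(x) strsplit(tolower(x), "\\s+")[[1]]

  docs <- lapply(corpus, tokenise)
  doc_len <- lengths(docs)      # number of tokens per document
  avg_len <- mean(doc_len)      # average document length
  n_docs <- length(docs)

  scores <- numeric(n_docs)
  for (term in unique(tokenise(query))) {
    # Frequency of the query term in each document.
    tf <- vapply(docs, function(d) as.numeric(sum(d == term)), numeric(1))
    n_containing <- sum(tf > 0)
    # Rare terms get a larger weight than common ones.
    idf <- log(1 + (n_docs - n_containing + 0.5) / (n_containing + 0.5))
    # Saturating term-frequency component, normalised by document length.
    scores <- scores + idf * tf * (k1 + 1) /
      (tf + k1 * (1 - b + b * doc_len / avg_len))
  }
  scores
}

example <- c('white audi 2.5 car', 'black shoes from office',
             'new mobile iphone 7', 'audi tyres audi a3',
             'nice audi bmw toyota corolla')
scores <- bm25_scores(example, 'white toyota corolla')

# Rank documents from most to least similar to the query.
ranked <- example[order(scores, decreasing = TRUE)]
head(ranked, 2)
```

Under these assumptions, the document sharing two query terms ('toyota' and 'corolla') ranks first and the one sharing only 'white' ranks second; documents with no query terms in common score zero.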