superml (version 0.4.0)

bm25: Best Matching (BM25)

Description

BM25 stands for Best Matching 25. It is a widely used function for ranking documents and is often preferred over TF-IDF scores. Given a new document, it finds the most similar documents in a corpus, which makes it popular in information retrieval systems. This implementation can use multiple cores for faster, parallel computation.
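
To make the ranking idea concrete, here is a minimal base R sketch of the standard Okapi BM25 formula. It is not superml's internal code: the function name `bm25_scores`, the whitespace tokeniser, and the parameter defaults `k1 = 1.2` and `b = 0.75` are assumptions chosen for illustration.

```r
# Minimal Okapi BM25 sketch in base R (illustrative, not superml's implementation).
# k1 and b are the usual free parameters; the defaults are common choices.
bm25_scores <- function(corpus, query, k1 = 1.2, b = 0.75) {
  # Lower-case and split on whitespace.
  tokenise <- function(x) strsplit(tolower(x), "\\s+")[[1]]
  docs <- lapply(corpus, tokenise)
  q_terms <- unique(tokenise(query))

  n_docs <- length(docs)
  doc_len <- lengths(docs)
  avg_len <- mean(doc_len)

  vapply(seq_len(n_docs), function(i) {
    score <- 0
    for (term in q_terms) {
      # Term frequency in document i.
      tf <- sum(docs[[i]] == term)
      # Number of documents that contain the term.
      df <- sum(vapply(docs, function(d) term %in% d, logical(1)))
      # Smoothed inverse document frequency (never negative).
      idf <- log((n_docs - df + 0.5) / (df + 0.5) + 1)
      # Length-normalised contribution of this term.
      score <- score + idf * tf * (k1 + 1) /
        (tf + k1 * (1 - b + b * doc_len[i] / avg_len))
    }
    score
  }, numeric(1))
}

corpus <- c("white audi 2.5 car", "black shoes from office",
            "new mobile iphone 7", "audi tyres audi a3",
            "nice audi bmw toyota corolla")
scores <- bm25_scores(corpus, "white toyota corolla")
```

Documents that share no term with the query score zero, and the document sharing the most (and rarest) terms scores highest.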

Usage

bm25

Format

R6Class object.

Usage

For usage details see Methods, Arguments and Examples sections.

bm <- bm25$new(corpus, use_parallel)
bm$most_similar(document, topn)
bm$compute(document)

Methods

$new()

Initialises an instance of the class. Pass the complete corpus of documents here.

$most_similar()

Returns the topn documents from the corpus that are most similar to the input document.

$compute()

Returns a similarity score for every document in the corpus, given an input sentence.
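
The two query methods are related: picking the top documents amounts to ranking per-document scores. The sketch below shows that idea in base R under stated assumptions. The score values and the helper `top_n_documents` are hypothetical, and the actual return format of `$compute()` and `$most_similar()` may differ.

```r
# Hypothetical similarity scores, one per corpus document (illustrative values only).
scores <- c(doc1 = 1.4, doc2 = 0, doc3 = 0, doc4 = 0, doc5 = 2.6)

# Return the names of the topn highest-scoring documents.
top_n_documents <- function(scores, topn = 1) {
  ranked <- order(scores, decreasing = TRUE)  # indices from highest to lowest score
  names(scores)[head(ranked, topn)]
}

top2 <- top_n_documents(scores, topn = 2)
```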

Arguments

corpus

a list or character vector of sentences, one document per element

use_parallel

boolean value used to activate parallel computation, defaults to FALSE
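
As a rough picture of what a parallel switch can do, the sketch below spreads per-document work across cores with base R's `parallel` package. How superml parallelises internally is not stated here, so the flag handling, the worker count of 2, and the helper `count_tokens` are all assumptions.

```r
library(parallel)

corpus <- c("white audi 2.5 car", "black shoes from office",
            "new mobile iphone 7", "audi tyres audi a3",
            "nice audi bmw toyota corolla")

# Per-document work: count whitespace-separated tokens.
count_tokens <- function(doc) length(strsplit(doc, "\\s+")[[1]])

# Hypothetical flag mirroring the documented argument and its default.
use_parallel <- FALSE

# mclapply relies on forking, which is unavailable on Windows,
# so fall back to a single worker there or when the flag is off.
n_workers <- if (use_parallel && .Platform$OS.type != "windows") 2L else 1L

token_counts <- unlist(mclapply(corpus, count_tokens, mc.cores = n_workers))
```

For a corpus this small the serial path is usually faster; parallel execution tends to pay off only when the corpus is large.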

Examples

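library(superml)

# Corpus: a character vector with one document per element
example <- c('white audi 2.5 car', 'black shoes from office',
             'new mobile iphone 7', 'audi tyres audi a3',
             'nice audi bmw toyota corolla')

# Build the BM25 model without parallel computation
get_bm <- bm25$new(example, use_parallel = FALSE)

# Retrieve the two documents most similar to the query
input_document <- c('white toyota corolla')
get_bm$most_similar(document = input_document, topn = 2)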