bleu_corpus_ids: Computes the corpus-level BLEU score (Papineni et al., 2002).

Description

'bleu_corpus_ids' computes the BLEU score for a corpus of candidate sentences and their respective reference sentences. The sentences must be tokenized beforehand so that each one is represented as an integer vector. Like 'sacrebleu' ('Python'), the function supports different smoothing methods: epsilon-smoothing and add-k-smoothing are available. Epsilon-smoothing is equivalent to 'floor' smoothing in the sacreBLEU implementation. The smoothing techniques are described in Chen et al., 2014 (https://aclanthology.org/W14-3346/).
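
For reference, the standard corpus-level definition (Papineni et al., 2002) and the two smoothing variants can be written as below. Here M_m is the number of clipped m-gram matches summed over all candidates (each m-gram counted at most as often as it occurs in any single reference), T_m is the total number of candidate m-grams, c is the total candidate length, r is the summed reference length (assumed to be the closest reference length per candidate), and w_m are the weights (1/n by default). The smoothing formulas follow sacreBLEU's 'floor' and 'add-k' conventions; that bleu_corpus_ids applies them in exactly this way (e.g. add-k only for m >= 2) is an assumption.

```latex
\mathrm{BLEU} = \mathrm{BP}\cdot\exp\Big(\sum_{m=1}^{n} w_m \log p_m\Big),
\qquad
\mathrm{BP} = \begin{cases} 1 & c > r \\ e^{\,1 - r/c} & c \le r \end{cases}

p_m = \frac{M_m}{T_m}

\text{floor (epsilon): } p_m = \frac{\epsilon}{T_m} \ \text{ when } M_m = 0,
\qquad
\text{add-k: } p_m = \frac{M_m + k}{T_m + k} \ \text{ for } m \ge 2
```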

Usage

bleu_corpus_ids(
  references,
  candidates,
  n = 4,
  weights = NULL,
  smoothing = NULL,
  epsilon = 0.1,
  k = 1
)

Value

The corpus-level BLEU score for the candidate sentences.

Arguments

references: A list with one element per candidate sentence; each element is a list of reference sentences for that candidate (`list(list(c(1,2,...)), list(c(3,5,...)))`).
candidates: A list of candidate sentences (`list(c(1,2,...), c(3,5,...))`).
n: Maximum n-gram order used for the BLEU score (default is 4).
weights: Weights for the n-gram precisions (default is 1/n for each order).
smoothing: Smoothing method for the BLEU score: 'standard', 'floor', or 'add-k'. The default (`NULL`) corresponds to 'standard'.
epsilon: Epsilon value for epsilon-smoothing, used when smoothing = 'floor' (default is 0.1).
k: K value for add-k-smoothing, used when smoothing = 'add-k' (default is 1).
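
The arguments above can be illustrated with a small base-R sketch of corpus-level BLEU. It is not the package's implementation: the helper names (`ngram_counts`, `clipped_matches`, `closest_ref_len`, `corpus_bleu_sketch`) are hypothetical, and the smoothing rules follow sacreBLEU's 'floor' and 'add-k' conventions as an assumption (add-k is applied only to orders m >= 2, and the reference length is the closest one per candidate, ties going to the shorter). Results may therefore differ from bleu_corpus_ids.

```r
# Hypothetical helper: named counts of all m-grams in an integer vector.
ngram_counts <- function(tokens, m) {
  if (length(tokens) < m) return(setNames(integer(0), character(0)))
  keys <- vapply(seq_len(length(tokens) - m + 1),
                 function(i) paste(tokens[i:(i + m - 1)], collapse = " "),
                 character(1))
  tab <- table(keys)
  setNames(as.integer(tab), names(tab))
}

# Clipped m-gram matches of one candidate against its references:
# each m-gram counts at most as often as it occurs in any single reference.
clipped_matches <- function(cand, refs, m) {
  cand_counts <- ngram_counts(cand, m)
  if (length(cand_counts) == 0) return(c(matches = 0, total = 0))
  max_ref <- integer(length(cand_counts))
  for (ref in refs) {
    hits <- ngram_counts(ref, m)[names(cand_counts)]
    hits[is.na(hits)] <- 0L
    max_ref <- pmax(max_ref, hits)
  }
  c(matches = sum(pmin(cand_counts, max_ref)), total = sum(cand_counts))
}

# Closest reference length; ties go to the shorter reference (assumption).
closest_ref_len <- function(cand_len, refs) {
  ref_lens <- vapply(refs, length, integer(1))
  diffs <- abs(ref_lens - cand_len)
  min(ref_lens[diffs == min(diffs)])
}

# Hypothetical sketch mirroring the documented arguments of bleu_corpus_ids.
corpus_bleu_sketch <- function(references, candidates, n = 4, weights = NULL,
                               smoothing = NULL, epsilon = 0.1, k = 1) {
  if (is.null(weights)) weights <- rep(1 / n, n)
  if (is.null(smoothing)) smoothing <- "standard"
  matches <- numeric(n)
  totals <- numeric(n)
  cand_len <- 0
  ref_len <- 0
  # Accumulate statistics over the whole corpus before combining them.
  for (i in seq_along(candidates)) {
    cand <- candidates[[i]]
    refs <- references[[i]]
    cand_len <- cand_len + length(cand)
    ref_len <- ref_len + closest_ref_len(length(cand), refs)
    for (m in seq_len(n)) {
      counts_m <- clipped_matches(cand, refs, m)
      matches[m] <- matches[m] + counts_m[["matches"]]
      totals[m] <- totals[m] + counts_m[["total"]]
    }
  }
  # Add-k (sacreBLEU convention, assumed here): smooth orders m >= 2 only.
  if (smoothing == "add-k" && n >= 2) {
    matches[-1] <- matches[-1] + k
    totals[-1] <- totals[-1] + k
  }
  # An order with no candidate m-grams at all yields a score of 0.
  if (any(totals == 0)) return(0)
  precisions <- matches / totals
  # Floor (epsilon) smoothing: replace zero-match precisions with epsilon / T_m.
  if (smoothing == "floor") {
    zero <- matches == 0
    precisions[zero] <- epsilon / totals[zero]
  }
  if (any(precisions == 0)) return(0)
  # Brevity penalty followed by the weighted geometric mean of precisions.
  bp <- if (cand_len > ref_len) 1 else exp(1 - ref_len / cand_len)
  bp * exp(sum(weights * log(precisions)))
}
```

For example, `corpus_bleu_sketch(list(list(c(1,3,2))), list(c(1,2,3)), n = 2)` returns 0 because no bigram matches, whereas `smoothing = "floor"` gives sqrt(0.05) and `smoothing = "add-k"` gives sqrt(1/3). Summing the statistics over the corpus before taking the geometric mean is what distinguishes corpus-level BLEU from an average of sentence-level scores.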

Examples

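# Two candidate sentences, already mapped to integer token IDs
cand_corpus <- list(c(1,2,3), c(1,2))
# One list of references per candidate: two for the first, three for the second
ref_corpus <- list(list(c(1,2,3), c(2,3,4)), list(c(1,2,6), c(781, 21, 9), c(7, 3)))
# Default: no smoothing ('standard')
bleu_corpus_ids_standard <- bleu_corpus_ids(ref_corpus, cand_corpus)
# Epsilon ('floor') smoothing with a custom epsilon
bleu_corpus_ids_floor <- bleu_corpus_ids(ref_corpus, cand_corpus, smoothing="floor", epsilon=0.01)
# Add-k smoothing with k = 1
bleu_corpus_ids_add_k <- bleu_corpus_ids(ref_corpus, cand_corpus, smoothing="add-k", k=1)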
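
The example above starts from integer IDs. Raw text has to be mapped to such IDs first; a minimal sketch in base R follows, assuming plain whitespace tokenization and a single vocabulary shared by candidates and references. The helper `to_ids` and the sample sentences are hypothetical, and a real pipeline would likely use a proper tokenizer instead of splitting on whitespace.

```r
# Hypothetical helper (not part of the package): map whitespace-tokenized
# strings to integer IDs using a vocabulary shared by all sentences.
to_ids <- function(sentences, vocab) {
  lapply(strsplit(sentences, "\\s+"), function(tokens) match(tokens, vocab))
}

cand_text <- c("the cat sat", "a dog")
# One character vector of references per candidate
ref_text <- list(c("the cat sat", "a cat sat"), c("a dog barked", "the dog"))

# Build the vocabulary from candidates and references together
vocab <- unique(unlist(strsplit(c(cand_text, unlist(ref_text)), "\\s+")))
cand_ids <- to_ids(cand_text, vocab)
ref_ids <- lapply(ref_text, to_ids, vocab = vocab)

# Pass the IDs to the package function, e.g.
# bleu_corpus_ids(ref_ids, cand_ids)
```

Sharing one vocabulary matters: if candidates and references were encoded separately, the same word could receive different IDs and no n-grams would match.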
