aiXplain provides a library of metrics for machine learning tasks such as translation, speech recognition, diacritization, and sentiment analysis. Metrics can be used in Benchmark and in Design (via metric nodes).

The metrics fall into three groups: reference similarity metrics, human evaluation estimation metrics, and referenceless metrics. Together they cover many tasks and modalities. Below are some examples.

Text Generation

  • BLEU (Papineni et al., 2002)
  • WER (Woodard and Nelson)
  • chrF (Popović, 2015)
  • COMET DA (Rei et al., 2020)
  • NISQA (Mittag et al., 2021)
  • COMET QE (Rei et al., 2021)

Speech Recognition

  • WIL, MER (Morris et al., 2004)
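To make the reference similarity family concrete, the sketch below computes WER, MER, and WIL from a word-level alignment between a reference and a hypothesis transcript. It is an illustration only, not aiXplain's implementation: the function names (`align_counts`, `error_rates`), the whitespace tokenization, and the lack of text normalization are all assumptions made for this example. The formulas follow the standard definitions, with H hits, S substitutions, D deletions, and I insertions.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    hits: int
    subs: int
    dels: int
    ins: int


def align_counts(ref: list[str], hyp: list[str]) -> Counts:
    """Count hits/substitutions/deletions/insertions on a minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner to recover the edit operations.
    h = s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                h += 1
            else:
                s += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return Counts(h, s, d, ins)


def error_rates(reference: str, hypothesis: str) -> dict[str, float]:
    """Return WER, MER, and WIL (assumes whitespace tokens and a non-empty reference)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    c = align_counts(ref, hyp)
    errors = c.subs + c.dels + c.ins
    n_ref = c.hits + c.subs + c.dels   # words in the reference
    n_hyp = c.hits + c.subs + c.ins    # words in the hypothesis
    wer = errors / n_ref
    mer = errors / (c.hits + errors)
    # WIL = 1 - H^2 / (N_ref * N_hyp); an empty hypothesis loses all information.
    wil = 1.0 - (c.hits ** 2) / (n_ref * n_hyp) if n_hyp else 1.0
    return {"wer": wer, "mer": mer, "wil": wil}


if __name__ == "__main__":
    print(error_rates("the cat sat on the mat", "the cat sit on mat"))
```

WER is normalized by the reference length and can exceed 1 when the hypothesis contains many insertions, whereas MER and WIL stay within [0, 1].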

Machine Translation

  • TER (Snover et al., 2006)
  • METEOR (Banerjee and Lavie, 2005)

Speech Synthesis

  • PESQ (Rix et al., 2001)
  • DNSMOS (Reddy et al., 2021)