

Benchmark is a tool for evaluating machine learning models on specific tasks. Our interactive Benchmark reports give you easy-to-interpret, granular insights into model performance across quality, latency, footprint, cost, and bias.

The benchmarking framework is modular and interoperable at its core, built around three main components: Models, Data (Datasets), and Metrics. To run a benchmark, you select one of each for the task of your choice.
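As a rough sketch, the three components can be thought of as one benchmark configuration that is only runnable once all three are chosen. The class and field names below are illustrative assumptions for this document, not the aiXplain SDK's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class BenchmarkConfig:
    # Hypothetical structure for illustration; not the real aiXplain SDK classes.
    task: str
    models: list = field(default_factory=list)    # model identifiers to compare
    datasets: list = field(default_factory=list)  # evaluation datasets for the task
    metrics: list = field(default_factory=list)   # metrics scoring each model/dataset pair

    def is_complete(self) -> bool:
        # A benchmark needs at least one of each component before it can run.
        return bool(self.models and self.datasets and self.metrics)

config = BenchmarkConfig(
    task="translation",
    models=["model-a", "model-b"],
    datasets=["en-fr-test-set"],
    metrics=["BLEU"],
)
print(config.is_complete())  # True
```

The same shape applies to any supported task: swap in the models, datasets, and metrics appropriate to it.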

aiXplain places strong emphasis on the evaluation phase of AI models, providing a wide range of evaluation metrics covering many tasks and modalities. See the Metrics page for examples.


We currently support the following tasks: text-generation, translation, and speech-recognition.