Metric Nodes

Metrics provide a quantitative way to evaluate AI model outputs, ensuring you can measure performance against a reference or desired outcome. By adding a metric node to your pipeline, you can automatically calculate scores such as BLEU, ROUGE, or accuracy.

We provide a wide range of evaluation metrics covering many tasks and modalities, including reference-based similarity metrics, human-evaluation estimation metrics, and referenceless metrics. Below are some examples.

Text Generation

  • BLEU (Papineni et al., 2002)
  • chrF (Popović, 2015)

Machine Translation

  • TER (Snover et al., 2006)
  • METEOR (Banerjee and Lavie, 2005)
  • COMET DA (Rei et al., 2020)
  • COMET QE (Rei et al., 2021)

Speech Recognition

  • WER (Woodard and Nelson)
  • WIL, MER (Morris et al., 2004)

Speech Synthesis

  • PESQ (Rix et al., 2001)
  • NISQA (Mittag et al., 2021)
  • DNSMOS (Reddy et al., 2021)

This example illustrates how to create a pipeline that:

  1. Accepts two inputs: one for the hypothesis (model output) and one for the reference.
  2. Uses two metric nodes to evaluate the same hypothesis/reference pair with different metrics (for example, BLEU and chrF).
  3. Ends with output nodes for each metric’s score.

Create the Pipeline

This example pipeline consists of:

  • 2 input nodes (for hypothesis and reference text).
  • 2 metric nodes (two different metrics).
  • 2 output nodes (one for each metric score).

from aixplain.factories import PipelineFactory
from aixplain.enums import DataType

# Initialize a pipeline
pipeline = PipelineFactory.init("Metrics Pipeline Example")

# 2 Input Nodes (hypothesis and reference)
hypothesis_input = pipeline.input(data_types=[DataType.TEXT]) # model output
reference_input = pipeline.input(data_types=[DataType.TEXT]) # ground truth

# 2 Metric Nodes (e.g., BLEU and CHRF)
metric_node_bleu = pipeline.metric(asset_id="YOUR_BLEU_ASSET_ID")
metric_node_chrf = pipeline.metric(asset_id="YOUR_CHRF_ASSET_ID")

# Link Inputs to Metrics
# Typically, metric nodes expect "hypotheses" and "references" params
hypothesis_input.link(metric_node_bleu, 'input', 'hypotheses')
reference_input.link(metric_node_bleu, 'input', 'references')

hypothesis_input.link(metric_node_chrf, 'input', 'hypotheses')
reference_input.link(metric_node_chrf, 'input', 'references')

# Output Nodes for Each Score
bleu_output = pipeline.output()
chrf_output = pipeline.output()

# Link the score param to each output node
metric_node_bleu.link(bleu_output, 'score', 'output')
metric_node_chrf.link(chrf_output, 'score', 'output')

# Save & Run
pipeline.save(save_as_asset=True)
print("Pipeline saved successfully!")

# Provide hypothesis & reference text
result = pipeline.run({
    "hypothesis_input": "Hello world",
    "reference_input": "Hello world!"
}, version="3.0")

print(result)

Guide to Building the Metrics Example Pipeline

Initialize the Pipeline

from aixplain.factories import PipelineFactory
from aixplain.enums import DataType

# Initialize a pipeline
pipeline = PipelineFactory.init("Metrics Pipeline Example")

Add Input Nodes

We define two input nodes, one for the hypothesis and one for the reference:

hypothesis_input = pipeline.input(data_types=[DataType.TEXT])
reference_input = pipeline.input(data_types=[DataType.TEXT])

  • hypothesis_input: Accepts the model output text.
  • reference_input: Accepts the ground truth text.
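
The same pattern extends to other modalities. For example, speech-quality metrics such as PESQ or NISQA compare audio rather than text. A minimal sketch, assuming the metric asset you pick accepts audio inputs:

# Hypothetical audio inputs for a speech-quality metric.
# Whether a given metric asset accepts AUDIO is an assumption - check the asset's parameters.
hypothesis_audio = pipeline.input(data_types=[DataType.AUDIO])  # synthesized or degraded audio
reference_audio = pipeline.input(data_types=[DataType.AUDIO])   # clean reference recording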

Add Metric Nodes

Each metric node uses a specific metric asset. Here we assume two different metrics:

metric_node_bleu = pipeline.metric(asset_id="YOUR_BLEU_ASSET_ID")
metric_node_chrf = pipeline.metric(asset_id="YOUR_CHRF_ASSET_ID")

Each metric node typically requires:

  • hypotheses: The model’s output or candidate translation.
  • references: The ground truth or reference text.

# Link hypothesis & reference to BLEU
hypothesis_input.link(metric_node_bleu, 'input', 'hypotheses')
reference_input.link(metric_node_bleu, 'input', 'references')

# Link hypothesis & reference to CHRF
hypothesis_input.link(metric_node_chrf, 'input', 'hypotheses')
reference_input.link(metric_node_chrf, 'input', 'references')
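
Referenceless (quality-estimation) metrics, such as COMET QE, score a hypothesis against the source text instead of a reference. The sketch below shows how such a node could be wired; the asset ID is a placeholder and the 'sources' parameter name is an assumption, so check the metric's expected parameters before linking:

# Hypothetical QE-style metric node: takes a source-text input, no reference.
source_input = pipeline.input(data_types=[DataType.TEXT])  # source sentence
metric_node_qe = pipeline.metric(asset_id="YOUR_COMET_QE_ASSET_ID")  # placeholder ID

hypothesis_input.link(metric_node_qe, 'input', 'hypotheses')
source_input.link(metric_node_qe, 'input', 'sources')  # param name assumed; may differ per asset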

Output Nodes

To capture the final scores:

bleu_output = pipeline.output()
chrf_output = pipeline.output()

metric_node_bleu.link(bleu_output, 'score', 'output')
metric_node_chrf.link(chrf_output, 'score', 'output')

This ensures both metric scores are returned when the pipeline runs.
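
If you want to score the same hypothesis/reference pair with several metrics at once, you can build the metric and output nodes in a loop instead of wiring each one by hand. A small sketch using only the calls shown above (the asset IDs are placeholders):

# Placeholder asset IDs - replace with the metric assets you want to run
metric_assets = {
    "bleu": "YOUR_BLEU_ASSET_ID",
    "chrf": "YOUR_CHRF_ASSET_ID",
    "ter": "YOUR_TER_ASSET_ID",
}

score_outputs = {}
for name, asset_id in metric_assets.items():
    metric_node = pipeline.metric(asset_id=asset_id)
    hypothesis_input.link(metric_node, 'input', 'hypotheses')
    reference_input.link(metric_node, 'input', 'references')
    score_outputs[name] = pipeline.output()
    metric_node.link(score_outputs[name], 'score', 'output')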

Validate the Pipeline

Check that:

  1. The pipeline has input nodes for hypotheses and references.
  2. Each metric node is linked correctly to these inputs.
  3. Each metric node outputs a score linked to an output node.

Then, save the pipeline. Any linking issues or param mismatches will raise an error.

pipeline.save()
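
If you prefer to surface validation problems explicitly rather than letting the exception propagate, you can wrap the save call; this sketch simply reports whatever error the SDK raises:

# Catch linking issues or param mismatches raised during save
try:
    pipeline.save()
    print("Pipeline saved successfully!")
except Exception as e:
    print(f"Pipeline validation failed: {e}")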

Run the Pipeline

Pass both inputs as a dictionary:

result = pipeline.run({
    "hypothesis_input": "Machine translation output",
    "reference_input": "Reference text"
}, version="3.0")
print(result)

You’ll see two final outputs, each containing the metric score.
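
The exact shape of the response can vary, so inspect result before relying on a particular layout. As a rough sketch, assuming the response is a dictionary containing a list of output entries, you might pull the scores out like this:

# Rough sketch only: the "data", "segments", and "response" keys are assumptions
# about the response layout - print(result) to confirm the structure you get back.
for entry in result.get("data", []):
    for segment in entry.get("segments", []):
        print(segment.get("response"))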

Using metric nodes allows you to seamlessly integrate performance evaluation into your pipelines. This provides a comprehensive, automated workflow that covers both AI generation and robust evaluation.