
aixplain.modules.benchmark_job

BenchmarkJob Objects

class BenchmarkJob()


A Benchmark Job represents a single run of an already created Benchmark.

Attributes:

  • id str - ID of the Benchmark Job.
  • status str - Status of the Benchmark Job.
  • benchmark_id str - ID of the associated parent Benchmark.
  • additional_info dict - Any additional information to be saved with the Benchmark Job.

__init__

def __init__(id: Text, status: Text, benchmark_id: Text, **additional_info) -> None


Create a Benchmark Job with the necessary information. Each Job is a run of a parent Benchmark.

Arguments:

  • id Text - ID of the Benchmark Job.
  • status Text - Status of the Benchmark Job.
  • benchmark_id Text - ID of the associated parent Benchmark.
  • **additional_info - Any additional Benchmark Job info to be saved.
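
A minimal construction sketch. The placeholder IDs and the status string below are assumptions; in practice a BenchmarkJob is usually obtained by running its parent Benchmark rather than built by hand:

from aixplain.modules.benchmark_job import BenchmarkJob

# Placeholder values for illustration; real IDs come from the aiXplain platform.
job = BenchmarkJob(
    id="<benchmark-job-id>",
    status="in_progress",  # assumed status string; actual values may differ
    benchmark_id="<parent-benchmark-id>",
)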

check_status

def check_status() -> Text


Check the current status of the benchmark job.

Fetches the latest status from the API and updates the local state.

Returns:

  • Text - The current status of the benchmark job.
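
A minimal polling sketch, assuming an existing BenchmarkJob instance named job. The terminal status strings checked here are assumptions, not documented values:

import time

status = job.check_status()
while status not in ("completed", "failed"):  # assumed terminal states
    time.sleep(30)  # avoid hammering the API
    status = job.check_status()
print(f"Job {job.id} finished with status: {status}")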

download_results_as_csv

def download_results_as_csv(save_path: Optional[Text] = None, return_dataframe: bool = False)


Get the results of the benchmark job in CSV format. The results can either be saved locally as a CSV file or returned as a pandas.DataFrame.

Arguments:

  • save_path Text, optional - Path to save the CSV if return_dataframe is False. If None, a random path is generated. Defaults to None.
  • return_dataframe bool - If True, the result is returned as a pandas.DataFrame; otherwise it is saved as a CSV file. Defaults to False.

Returns:

  • str/pandas.DataFrame - Path of the locally saved CSV file if return_dataframe is False, else the results as a pandas.DataFrame.
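
A usage sketch covering both return modes, assuming an existing BenchmarkJob instance named job (the file name is a placeholder):

# Save the results to an explicit local path.
csv_path = job.download_results_as_csv(save_path="benchmark_results.csv")
print(f"Results saved to {csv_path}")

# Or load the results directly into memory as a DataFrame.
df = job.download_results_as_csv(return_dataframe=True)
print(df.head())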

get_scores

def get_scores(return_simplified: bool = True, return_as_dataframe: bool = True) -> Union[Dict, pd.DataFrame, list]


Get the benchmark scores for all models.

Arguments:

  • return_simplified bool, optional - If True, returns a simplified version of scores. Defaults to True.
  • return_as_dataframe bool, optional - If True and return_simplified is True, returns results as a pandas DataFrame. Defaults to True.

Returns:

Union[Dict, pd.DataFrame, list]: The benchmark scores in the requested format.

  • If return_simplified=False: Returns a dictionary with detailed model scores
  • If return_simplified=True and return_as_dataframe=True: Returns a pandas DataFrame
  • If return_simplified=True and return_as_dataframe=False: Returns a list of dictionaries

Raises:

  • Exception - If there's an error fetching or processing the scores.
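
A sketch of the three return shapes, assuming an existing BenchmarkJob instance named job:

# Simplified scores as a pandas DataFrame (the defaults).
scores_df = job.get_scores()

# Simplified scores as a list of dictionaries.
scores_list = job.get_scores(return_as_dataframe=False)

# Detailed, unsimplified scores as a dictionary.
raw_scores = job.get_scores(return_simplified=False)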

get_failuire_rate

def get_failuire_rate(return_as_dataframe: bool = True) -> Union[Dict, pd.DataFrame]


Calculate the failure rate for each model in the benchmark.

Arguments:

  • return_as_dataframe bool, optional - If True, returns results as a pandas DataFrame. Defaults to True.

Returns:

Union[Dict, pd.DataFrame]: The failure rates for each model.

  • If return_as_dataframe=True: Returns a DataFrame with 'Model' and 'Failure Rate' columns
  • If return_as_dataframe=False: Returns a dictionary with model IDs as keys and failure rates as values

Raises:

  • Exception - If there's an error calculating the failure rates.
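
A usage sketch, assuming an existing BenchmarkJob instance named job. Note that the misspelled method name is the library's actual identifier:

# Failure rates as a DataFrame with 'Model' and 'Failure Rate' columns.
failure_df = job.get_failuire_rate()

# Or as a dictionary keyed by model ID.
failure_by_model = job.get_failuire_rate(return_as_dataframe=False)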

get_all_explanations

def get_all_explanations() -> Dict


Get all explanations for the benchmark results.

Returns:

  • Dict - A dictionary containing both metric-dependent and metric-independent explanations. The dictionary has two keys:
    • 'metricInDependent': List of metric-independent explanations
    • 'metricDependent': List of metric-dependent explanations

Raises:

  • Exception - If there's an error fetching the explanations.
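
A usage sketch, assuming an existing BenchmarkJob instance named job:

explanations = job.get_all_explanations()

# The two documented keys of the returned dictionary.
metric_independent = explanations["metricInDependent"]
metric_dependent = explanations["metricDependent"]
print(len(metric_independent), "metric-independent explanations")
print(len(metric_dependent), "metric-dependent explanations")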

get_localized_explanations

def get_localized_explanations(metric_dependant: bool, group_by_task: bool = False) -> Dict


Get localized explanations for the benchmark results.

Arguments:

  • metric_dependant bool - If True, returns metric-dependent explanations. If False, returns metric-independent explanations.
  • group_by_task bool, optional - If True and metric_dependant is True, groups explanations by task. Defaults to False.

Returns:

  • Dict - A dictionary containing the localized explanations. The structure depends on the input parameters:
    • If metric_dependant=False: Returns metric-independent explanations
    • If metric_dependant=True and group_by_task=False: Returns explanations grouped by score ID
    • If metric_dependant=True and group_by_task=True: Returns explanations grouped by task

Raises:

  • Exception - If there's an error fetching or processing the explanations.
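
A sketch of the three documented parameter combinations, assuming an existing BenchmarkJob instance named job:

# Metric-independent explanations.
independent = job.get_localized_explanations(metric_dependant=False)

# Metric-dependent explanations grouped by score ID.
by_score = job.get_localized_explanations(metric_dependant=True)

# Metric-dependent explanations grouped by task.
by_task = job.get_localized_explanations(metric_dependant=True, group_by_task=True)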