
aixplain.modules.benchmark_job

BenchmarkJob Objects

class BenchmarkJob()


A Benchmark Job represents a single run of an already created Benchmark.

Attributes:

  • id str - ID of the Benchmark Job.
  • status str - Status of the Benchmark Job.
  • benchmark_id str - ID of the associated parent Benchmark.
  • additional_info dict - Any additional information to be saved with the Benchmark Job.

__init__

def __init__(id: Text, status: Text, benchmark_id: Text, **additional_info) -> None


Create a Benchmark Job with the necessary information. Each Job is a run of a parent Benchmark.

Arguments:

  • id Text - ID of the Benchmark Job.
  • status Text - Status of the Benchmark Job.
  • benchmark_id Text - ID of the associated parent Benchmark.
  • **additional_info - Any additional Benchmark Job info to be saved.
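
A minimal construction sketch. The placeholder IDs and the status string below are assumptions; in practice a BenchmarkJob is usually obtained by running its parent Benchmark rather than built by hand:

from aixplain.modules.benchmark_job import BenchmarkJob

# Placeholder values for illustration; real IDs come from the aiXplain platform.
job = BenchmarkJob(
    id="<benchmark-job-id>",
    status="in_progress",  # assumed status string; actual values may differ
    benchmark_id="<parent-benchmark-id>",
)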

check_status

def check_status() -> Text


Check the current status of the benchmark job.

Fetches the latest status from the API and updates the local state.

Returns:

  • Text - The current status of the benchmark job.
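
A minimal polling sketch, assuming an existing BenchmarkJob instance named job. The terminal status strings checked here are assumptions, not documented values:

import time

status = job.check_status()
while status not in ("completed", "failed"):  # assumed terminal states
    time.sleep(30)  # avoid hammering the API
    status = job.check_status()
print(f"Job {job.id} finished with status: {status}")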

download_results_as_csv

def download_results_as_csv(save_path: Optional[Text] = None, return_dataframe: bool = False)


Get the results of the benchmark job in CSV format. The results can either be saved locally as a CSV file or returned as a pandas.DataFrame.

Arguments:

  • save_path Text, optional - Path to save the CSV if return_dataframe is False. If None, a random path is generated. Defaults to None.
  • return_dataframe bool - If True, the result is returned as a pandas.DataFrame; otherwise it is saved as a CSV file. Defaults to False.

Returns:

  • str/pandas.DataFrame - Path of the locally saved CSV file if return_dataframe is False, else the results as a pandas.DataFrame.
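
A usage sketch covering both return modes, assuming an existing BenchmarkJob instance named job (the file name is a placeholder):

# Save the results to an explicit local path.
csv_path = job.download_results_as_csv(save_path="benchmark_results.csv")
print(f"Results saved to {csv_path}")

# Or load the results directly into memory as a DataFrame.
df = job.download_results_as_csv(return_dataframe=True)
print(df.head())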

get_scores

def get_scores(return_simplified: bool = True, return_as_dataframe: bool = True) -> Union[Dict, pd.DataFrame, list]


Get the benchmark scores for all models.

Arguments:

  • return_simplified bool, optional - If True, returns a simplified version of scores. Defaults to True.
  • return_as_dataframe bool, optional - If True and return_simplified is True, returns results as a pandas DataFrame. Defaults to True.

Returns:

Union[Dict, pd.DataFrame, list]: The benchmark scores in the requested format.

  • If return_simplified=False: Returns a dictionary with detailed model scores
  • If return_simplified=True and return_as_dataframe=True: Returns a pandas DataFrame
  • If return_simplified=True and return_as_dataframe=False: Returns a list of dictionaries

Raises:

  • Exception - If there's an error fetching or processing the scores.
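
A sketch of the three return shapes, assuming an existing BenchmarkJob instance named job:

# Simplified scores as a pandas DataFrame (the defaults).
scores_df = job.get_scores()

# Simplified scores as a list of dictionaries.
scores_list = job.get_scores(return_as_dataframe=False)

# Detailed, unsimplified scores as a dictionary.
raw_scores = job.get_scores(return_simplified=False)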

get_failuire_rate

def get_failuire_rate(return_as_dataframe: bool = True) -> Union[Dict, pd.DataFrame]


Calculate the failure rate for each model in the benchmark.

Arguments:

  • return_as_dataframe bool, optional - If True, returns results as a pandas DataFrame. Defaults to True.

Returns:

Union[Dict, pd.DataFrame]: The failure rates for each model.

  • If return_as_dataframe=True: Returns a DataFrame with 'Model' and 'Failure Rate' columns
  • If return_as_dataframe=False: Returns a dictionary with model IDs as keys and failure rates as values

Raises:

  • Exception - If there's an error calculating the failure rates.
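
A usage sketch, assuming an existing BenchmarkJob instance named job. Note that the misspelled method name is the library's actual identifier:

# Failure rates as a DataFrame with 'Model' and 'Failure Rate' columns.
failure_df = job.get_failuire_rate()

# Or as a dictionary keyed by model ID.
failure_by_model = job.get_failuire_rate(return_as_dataframe=False)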

get_all_explanations

def get_all_explanations() -> Dict


Get all explanations for the benchmark results.

Returns:

  • Dict - A dictionary containing both metric-dependent and metric-independent explanations. The dictionary has two keys:
    • 'metricInDependent': List of metric-independent explanations
    • 'metricDependent': List of metric-dependent explanations

Raises:

  • Exception - If there's an error fetching the explanations.
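
A usage sketch, assuming an existing BenchmarkJob instance named job:

explanations = job.get_all_explanations()

# The two documented keys of the returned dictionary.
metric_independent = explanations["metricInDependent"]
metric_dependent = explanations["metricDependent"]
print(len(metric_independent), "metric-independent explanations")
print(len(metric_dependent), "metric-dependent explanations")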

get_localized_explanations

def get_localized_explanations(metric_dependant: bool, group_by_task: bool = False) -> Dict


Get localized explanations for the benchmark results.

Arguments:

  • metric_dependant bool - If True, returns metric-dependent explanations. If False, returns metric-independent explanations.
  • group_by_task bool, optional - If True and metric_dependant is True, groups explanations by task. Defaults to False.

Returns:

  • Dict - A dictionary containing the localized explanations. The structure depends on the input parameters:
    • If metric_dependant=False: Returns metric-independent explanations
    • If metric_dependant=True and group_by_task=False: Returns explanations grouped by score ID
    • If metric_dependant=True and group_by_task=True: Returns explanations grouped by task

Raises:

  • Exception - If there's an error fetching or processing the explanations.
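
A sketch of the three documented parameter combinations, assuming an existing BenchmarkJob instance named job:

# Metric-independent explanations.
independent = job.get_localized_explanations(metric_dependant=False)

# Metric-dependent explanations grouped by score ID.
by_score = job.get_localized_explanations(metric_dependant=True)

# Metric-dependent explanations grouped by task.
by_task = job.get_localized_explanations(metric_dependant=True, group_by_task=True)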