aixplain.modules.benchmark_job
BenchmarkJob Objects
class BenchmarkJob()
BenchmarkJob represents a single run of an already created Benchmark.
Attributes:
id
str - ID of the Benchmark Job.
status
str - Status of the Benchmark Job.
benchmark_id
str - ID of the associated parent Benchmark.
additional_info
dict - Any additional information to be saved with the Benchmark Job.
__init__
def __init__(id: Text, status: Text, benchmark_id: Text,
**additional_info) -> None
Create a Benchmark Job with the necessary information. Each Job is a run of a parent Benchmark.
Arguments:
id
Text - ID of the Benchmark Job.
status
Text - Status of the Benchmark Job.
benchmark_id
Text - ID of the associated parent Benchmark.
**additional_info
- Any additional Benchmark Job info to be saved.
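The constructor simply stores its arguments, with all extra keyword arguments collected into additional_info. A minimal local sketch, assuming only the signature shown above (the real class lives in aixplain.modules.benchmark_job):

```python
from typing import Text


class BenchmarkJobSketch:
    """Illustrative stand-in mirroring BenchmarkJob.__init__'s signature."""

    def __init__(self, id: Text, status: Text, benchmark_id: Text, **additional_info) -> None:
        self.id = id
        self.status = status
        self.benchmark_id = benchmark_id
        # extra keyword arguments are collected into a dict
        self.additional_info = additional_info
```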
check_status
def check_status() -> Text
Check the current status of the benchmark job.
Fetches the latest status from the API and updates the local state.
Returns:
Text
- The current status of the benchmark job.
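Since check_status fetches the latest status on each call, a common pattern is to poll it until the job reaches a terminal state. A minimal sketch, assuming only that the job exposes check_status() returning a status string; the terminal status names used here are assumptions, not confirmed values:

```python
import time


def wait_until_done(job, poll_interval: float = 5.0, timeout: float = 300.0) -> str:
    """Poll job.check_status() until it reports an (assumed) terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = job.check_status()
        # "completed" / "failed" are illustrative terminal statuses
        if status in ("completed", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("benchmark job did not finish in time")
```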
download_results_as_csv
def download_results_as_csv(save_path: Optional[Text] = None,
return_dataframe: bool = False)
Get the results of the benchmark job in CSV format. The results can either be downloaded locally or returned as a pandas.DataFrame.
Arguments:
save_path
Text, optional - Path to save the CSV if return_dataframe is False. If None, a random path is generated. Defaults to None.
return_dataframe
bool - If True, the result is returned as a pandas.DataFrame, else saved as a CSV file. Defaults to False.
Returns:
str/pandas.DataFrame
- Path of the locally saved file if return_dataframe is False, else a pandas.DataFrame.
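The return-type branching can be illustrated with a local DataFrame. A sketch that mirrors the documented behavior (the helper name and temp-path generation are illustrative, not the SDK's internals):

```python
import os
import tempfile

import pandas as pd


def save_or_return(df: pd.DataFrame, save_path=None, return_dataframe: bool = False):
    """Mirror the branching documented above: either return the DataFrame,
    or write it as a CSV and return the file path."""
    if return_dataframe:
        return df
    if save_path is None:
        # the real method generates a random path when save_path is None
        fd, save_path = tempfile.mkstemp(suffix=".csv")
        os.close(fd)
    df.to_csv(save_path, index=False)
    return save_path
```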
get_scores
def get_scores(
return_simplified: bool = True,
return_as_dataframe: bool = True) -> Union[Dict, pd.DataFrame, list]
Get the benchmark scores for all models.
Arguments:
return_simplified
bool, optional - If True, returns a simplified version of scores. Defaults to True.
return_as_dataframe
bool, optional - If True and return_simplified is True, returns results as a pandas DataFrame. Defaults to True.
Returns:
Union[Dict, pd.DataFrame, list]: The benchmark scores in the requested format.
- If return_simplified=False: Returns a dictionary with detailed model scores
- If return_simplified=True and return_as_dataframe=True: Returns a pandas DataFrame
- If return_simplified=True and return_as_dataframe=False: Returns a list of dictionaries
Raises:
Exception
- If there's an error fetching or processing the scores.
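The relationship between the three return shapes can be sketched locally: a detailed per-model mapping flattens into a list of row dicts, which in turn becomes a DataFrame. The input shape below is an assumption for illustration, not the API's actual payload:

```python
import pandas as pd


def simplify_scores(detailed: dict, as_dataframe: bool = True):
    """Flatten a model-ID -> metric-scores mapping (assumed shape) into
    row dicts, optionally wrapped in a DataFrame."""
    rows = [{"Model": model, **metrics} for model, metrics in detailed.items()]
    return pd.DataFrame(rows) if as_dataframe else rows
```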
get_failuire_rate
def get_failuire_rate(
return_as_dataframe: bool = True) -> Union[Dict, pd.DataFrame]
Calculate the failure rate for each model in the benchmark.
Arguments:
return_as_dataframe
bool, optional - If True, returns results as a pandas DataFrame. Defaults to True.
Returns:
Union[Dict, pd.DataFrame]: The failure rates for each model.
- If return_as_dataframe=True: Returns a DataFrame with 'Model' and 'Failure Rate' columns
- If return_as_dataframe=False: Returns a dictionary with model IDs as keys and failure rates as values
Raises:
Exception
- If there's an error calculating the failure rates.
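A failure rate is simply the fraction of failed runs per model. A local sketch of the calculation and the two documented return shapes, assuming a hypothetical input of model ID mapped to run statuses:

```python
import pandas as pd


def failure_rates(runs_by_model: dict, as_dataframe: bool = True):
    """Compute the fraction of runs marked "failed" per model
    (input shape is an assumption for illustration)."""
    rates = {
        model: sum(s == "failed" for s in statuses) / len(statuses)
        for model, statuses in runs_by_model.items()
    }
    if not as_dataframe:
        return rates
    # DataFrame form matches the documented 'Model' / 'Failure Rate' columns
    return pd.DataFrame({"Model": list(rates), "Failure Rate": list(rates.values())})
```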
get_all_explanations
def get_all_explanations() -> Dict
Get all explanations for the benchmark results.
Returns:
Dict
- A dictionary containing both metric-dependent and metric-independent explanations. The dictionary has two keys:
- 'metricInDependent': List of metric-independent explanations
- 'metricDependent': List of metric-dependent explanations
Raises:
Exception
- If there's an error fetching the explanations.
get_localized_explanations
def get_localized_explanations(metric_dependant: bool,
group_by_task: bool = False) -> Dict
Get localized explanations for the benchmark results.
Arguments:
metric_dependant
bool - If True, returns metric-dependent explanations. If False, returns metric-independent explanations.
group_by_task
bool, optional - If True and metric_dependant is True, groups explanations by task. Defaults to False.
Returns:
Dict
- A dictionary containing the localized explanations. The structure depends on the input parameters:
- If metric_dependant=False: Returns metric-independent explanations
- If metric_dependant=True and group_by_task=False: Returns explanations grouped by score ID
- If metric_dependant=True and group_by_task=True: Returns explanations grouped by task
Raises:
Exception
- If there's an error fetching or processing the explanations.
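The group_by_task option amounts to regrouping a score-ID-keyed mapping under task keys. A local sketch of that regrouping, assuming a hypothetical input where each explanation entry carries a 'task' field:

```python
def group_explanations_by_task(explanations_by_score: dict) -> dict:
    """Regroup explanations keyed by score ID into a task-keyed mapping
    (input shape is an assumption for illustration)."""
    by_task: dict = {}
    for score_id, info in explanations_by_score.items():
        by_task.setdefault(info["task"], {})[score_id] = info
    return by_task
```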