
Build a conversational agent

Agents are systems that perform specific tasks by integrating multiple model tools. In this tutorial, you'll build a conversational agent that combines speech synthesis, translation, and named entity recognition (NER), so it can understand, process, and respond to user input across modalities and languages. By the end, you'll have an agent that can handle varied queries and give contextual answers using aiXplain’s tools. Let's get started!

Step 1: Set Up Your API Key

First, generate an API key, then set it as your team API key in the TEAM_API_KEY environment variable.

import os
os.environ["TEAM_API_KEY"] = "YOUR TEAM API KEY HERE"
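A hard-coded placeholder is easy to leave in by accident. The sketch below shows one way to fail early when the key is missing or still set to the placeholder. `require_team_api_key` is a hypothetical helper written for this tutorial, not part of the aiXplain SDK, and it only checks that a value is present, not that the key is valid.

```python
import os

PLACEHOLDER = "YOUR TEAM API KEY HERE"  # the placeholder string used in this tutorial


def require_team_api_key(env=os.environ):
    """Return the team API key, or raise if it is missing or still the placeholder.

    Hypothetical helper for illustration; not part of the aiXplain SDK.
    """
    key = env.get("TEAM_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("Set TEAM_API_KEY to your real team API key first.")
    return key
```

Calling `require_team_api_key()` right after setting the variable surfaces a forgotten placeholder before any tool or agent is created.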

Step 2: Create Model Tools

Model tools are the core components that power your agent's abilities. In this tutorial, we’ll create tools for:

  • Speech Synthesis: Converts text into audio.
  • Translation: Translates text into various languages.
  • Named Entity Recognition (NER): Identifies and categorizes key information within the text.

These model tools will form the foundation for your agent's functionality.

from aixplain.factories import AgentFactory
from aixplain.modules.agent import ModelTool
from aixplain.enums import Function, Supplier

speech_synthesis_tool = ModelTool(
    function=Function.SPEECH_SYNTHESIS,
    supplier=Supplier.GOOGLE
)

translation_tool = ModelTool(
    function=Function.TRANSLATION,
)

ner_tool = ModelTool(
    function=Function.NAMED_ENTITY_RECOGNITION,
)

Step 3: Create a Pipeline Tool

Next, let's create a pipeline tool to chain multiple operations together, allowing your agent to handle more complex tasks. Here's how you can create a pipeline tool by referencing a pipeline's unique ID. You can find more information on how to build a pipeline by following this guide.

from aixplain.modules.agent import PipelineTool

pipeline_tool = PipelineTool(
    pipeline="PIPELINE ID",
    description="PIPELINE DESCRIPTION HERE"
)

Step 4: Build the Agent

Now that we've created our tools, it's time to build the agent using the AgentFactory class. In this tutorial, we'll create an agent called "Wiki Agent" designed to answer questions using Wikipedia as its knowledge base. For more information about agents, please refer to this guide.

from aixplain.factories import AgentFactory

agent = AgentFactory.create(
    name="Wiki Agent",
    tools=[
        speech_synthesis_tool,
        ner_tool,
        translation_tool,
        pipeline_tool
    ],
    description="Using Wikipedia to answer questions",
    # required llm_id
    llm_id="6646261c6eb563165658bbb1"  # GPT 4o
)
agent.id

Step 5: Retrieve an Existing Agent

To interact with an already existing agent, you can retrieve its instance using its ID. Here's how to fetch the agent and explore its attributes:

agent = AgentFactory.get(agent.id)
agent.__dict__

Step 6: Invoke the Agent

Now it's time to put your agent to work! You can invoke the agent to perform various tasks, such as answering a question and providing an audio response.

agent_response1 = agent.run("What is the name of the driver who won Formula one championship in 2023? Answer in an English audio")
print(agent_response1)
agent_response1["data"]

To play the audio response, you can use the code snippet below:

import os
import re
import requests
from IPython.display import Audio, display

def display_audio(agent_response):
    # Find the first https URL in the agent's text output
    pattern = r"https://[^\s/$.?#].[^\s]*"
    sound_file = re.findall(pattern, agent_response["data"]["output"])[0].replace(").", "").replace(")", "")
    print(sound_file)
    response = requests.get(sound_file)

    # Check if the request was successful
    if response.status_code == 200:
        # Open a file in binary write mode
        with open('downloaded_file.mp3', 'wb') as file:
            # Write the content of the response to the file
            file.write(response.content)
        display(Audio('downloaded_file.mp3', autoplay=True))
        os.remove('downloaded_file.mp3')

display_audio(agent_response1)
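The URL extraction in display_audio assumes the output contains at least one link and that any stray ")" belongs to the surrounding text. A slightly more defensive variant, shown here as a minimal sketch, returns None when no link is found and trims only trailing punctuation. `extract_first_url` and the sample string are illustrative and not part of the aiXplain SDK.

```python
import re

# Same pattern as in display_audio above
URL_PATTERN = re.compile(r"https://[^\s/$.?#].[^\s]*")


def extract_first_url(text):
    """Return the first https URL in text without trailing punctuation, or None."""
    match = URL_PATTERN.search(text)
    if match is None:
        return None
    # The pattern is greedy up to the next whitespace, so it can pick up
    # closing brackets and sentence punctuation; strip those from the end only.
    return match.group(0).rstrip(").,;]")


sample = "The audio is ready (https://example.com/audio/file.mp3)."
print(extract_first_url(sample))  # prints https://example.com/audio/file.mp3
```

Stripping from the end only leaves parentheses inside the URL itself untouched, which the chained replace calls would remove.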

Step 7: Use Agent Memory for Follow-up Queries

Your agent supports short-term memory, which allows it to keep track of context across multiple interactions. Here’s how to use the session_id for follow-up questions:

session_id = agent_response1["data"]["session_id"]
print(f"Session id: {session_id}")

agent_response2 = agent.run("Extract the personal information about that driver.", session_id=session_id)
print("\nResponse:")
print(agent_response2)
agent_response2["data"]["output"]

You can continue asking questions in the same session to maintain context.

agent_response3 = agent.run("What about in 1991? I want the answer in German text this time", session_id=session_id)
print("\nResponse:")
print(agent_response3)
agent_response4 = agent.run("get me his personal info", session_id=session_id)
print("\nResponse:")
print(agent_response4)
response = agent.run("Who is the Brazilian athlete who won the most Olympic medals for their country? Look for their personal info as well.")
print(response)
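Passing session_id by hand on every follow-up call is easy to forget. One possible way to keep it in one place is a small wrapper, sketched below. `ChatSession` is a hypothetical class, and it assumes responses have the shape used in this step, with the session ID at ["data"]["session_id"] and the answer at ["data"]["output"]. `FakeAgent` is an offline stand-in so the sketch runs without an API key; with a real agent you would pass the object returned by AgentFactory instead.

```python
import uuid


class ChatSession:
    """Carry the session_id from one agent.run call to the next.

    Hypothetical wrapper; assumes responses look like
    {"data": {"session_id": ..., "output": ...}} as in this tutorial.
    """

    def __init__(self, agent):
        self.agent = agent
        self.session_id = None  # unknown until the first response arrives

    def ask(self, query):
        # Only pass session_id once a previous response has supplied one.
        kwargs = {} if self.session_id is None else {"session_id": self.session_id}
        response = self.agent.run(query, **kwargs)
        self.session_id = response["data"]["session_id"]
        return response["data"]["output"]


class FakeAgent:
    """Offline stand-in for a real agent, used only to exercise ChatSession."""

    def __init__(self):
        self.history = {}  # session_id -> list of queries seen in that session

    def run(self, query, session_id=None):
        session_id = session_id or str(uuid.uuid4())
        turns = self.history.setdefault(session_id, [])
        turns.append(query)
        output = f"turn {len(turns)}: {query}"
        return {"data": {"session_id": session_id, "output": output}}


chat = ChatSession(FakeAgent())
print(chat.ask("Who won in 2023?"))     # prints turn 1: Who won in 2023?
print(chat.ask("What about in 1991?"))  # prints turn 2: What about in 1991?
```

Starting a new `ChatSession` gives a fresh context, which matches calling agent.run without a session_id.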
Show output

You've successfully built and interacted with a conversational agent using various tools and memory functionalities. Feel free to explore additional features and customize the agent to your needs!