Build a conversational agent
Agents are systems that perform specific tasks by combining multiple model tools. In this tutorial, you'll build a conversational agent that combines speech synthesis, translation, and named entity recognition (NER), so it can understand, process, and respond to user inputs across modalities and languages. By the end, you'll have an agent that handles diverse queries and gives contextual answers, and a better sense of how to use aiXplain's tools. Let's get started!
Step 1: Set Up Your API Key
First, generate an API key, then set it as your team API key through the TEAM_API_KEY environment variable.
import os
os.environ["TEAM_API_KEY"] = "YOUR TEAM API KEY HERE"
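Hard-coding a key in a notebook makes it easy to leak. As a minimal sketch, assuming you export TEAM_API_KEY in your shell beforehand, you could read the key from the environment and fail early when it is missing. The helper name `require_env` is hypothetical and not part of the aiXplain SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Example usage (raises RuntimeError if the key has not been exported):
# team_api_key = require_env("TEAM_API_KEY")
```

Calling the helper before creating any tools turns a missing key into an immediate, readable error rather than a failure later in the tutorial.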
Step 2: Create Model Tools
Model tools are the core components that power your agent's abilities. In this tutorial, we’ll create tools for:
- Speech Synthesis: Converts text into audio.
- Translation: Translates text into various languages.
- Named Entity Recognition (NER): Identifies and categorizes key information within the text.
These model tools will form the foundation for your agent's functionality.
from aixplain.factories import AgentFactory
from aixplain.modules.agent import ModelTool
from aixplain.enums import Function, Supplier
speech_synthesis_tool = ModelTool(
    function=Function.SPEECH_SYNTHESIS,
    supplier=Supplier.GOOGLE
)
translation_tool = ModelTool(
    function=Function.TRANSLATION,
)
ner_tool = ModelTool(
    function=Function.NAMED_ENTITY_RECOGNITION,
)
Step 3: Create a Pipeline Tool
Next, let's create a pipeline tool to chain multiple operations together, allowing your agent to handle more complex tasks. Here's how you can create a pipeline tool by referencing a pipeline's unique ID. You can find more information on how to build a pipeline by following this guide.
from aixplain.modules.agent import PipelineTool
pipeline_tool = PipelineTool(
    pipeline="PIPELINE ID",
    description="PIPELINE DESCRIPTION HERE"
)
Step 4: Build the Agent
Now that we've created our tools, it's time to build the agent using the AgentFactory class. In this tutorial, we'll create an agent called "Wiki Agent" designed to answer questions using Wikipedia as its knowledge base. For more information about agents, please refer to this guide.
from aixplain.factories import AgentFactory
agent = AgentFactory.create(
    name="Wiki Agent",
    tools=[
        speech_synthesis_tool,
        ner_tool,
        translation_tool,
        pipeline_tool
    ],
    description="Using Wikipedia to answer questions",
    # llm_id is required
    llm_id="6646261c6eb563165658bbb1"  # GPT 4o
)
agent.id
Step 5: Retrieve an Existing Agent
To interact with an already existing agent, you can retrieve its instance using its ID. Here's how to fetch the agent and explore its attributes:
agent = AgentFactory.get(agent.id)
agent.__dict__
Step 6: Invoke the Agent
Now it's time to put your agent to work! You can invoke the agent to perform various tasks, such as answering a question and providing an audio response.
agent_response1 = agent.run("What is the name of the driver who won Formula one championship in 2023? Answer in an English audio")
print(agent_response1)
agent_response1["data"]
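The snippets in this tutorial index the response like a nested dictionary (`["data"]["output"]`, `["data"]["session_id"]`). A minimal sketch of a defensive accessor, assuming the response behaves like a plain dict as it does here; the helper name `get_data_field` is hypothetical and not part of the aiXplain SDK:

```python
from typing import Any, Optional


def get_data_field(response: dict, field: str, default: Optional[Any] = None) -> Any:
    """Read response["data"][field], returning `default` if either level is missing."""
    data = response.get("data") or {}
    return data.get(field, default)


# Illustrative response with the two fields this tutorial reads.
sample_response = {"data": {"output": "Example answer", "session_id": "example-session"}}
print(get_data_field(sample_response, "output"))
print(get_data_field(sample_response, "intermediate_steps", default=[]))
```

This avoids a KeyError when a run fails and the `data` payload is missing or incomplete.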
To play the audio response, you can use the code snippet below:
import requests
import re
from IPython.display import Audio, display
def display_audio(agent_response):
    # Find the first https URL in the agent's text output
    pattern = r"https://[^\s/$.?#].[^\s]*"
    sound_file = re.findall(pattern, agent_response["data"]["output"])[0].replace(").", "").replace(")", "")
    print(sound_file)
    response = requests.get(sound_file)
    # Check if the request was successful
    if response.status_code == 200:
        # Open a file in binary write mode
        with open('downloaded_file.mp3', 'wb') as file:
            # Write the content of the response to the file
            file.write(response.content)
        # Play the audio only when the download succeeded, then clean up
        display(Audio('downloaded_file.mp3', autoplay=True))
        os.remove('downloaded_file.mp3')
    else:
        print(f"Download failed with status code {response.status_code}")
display_audio(agent_response1)
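The regular expression in display_audio matches everything from `https://` up to the next whitespace, so trailing Markdown punctuation such as `).` ends up in the match and has to be stripped. A minimal sketch of the same idea as a standalone helper follows. The name `extract_first_url` is hypothetical, and it swaps the chained `replace` calls for `rstrip`, which only trims the end of the match and returns None when no URL is present.

```python
import re
from typing import Optional

# Same pattern as in display_audio: "https://" followed by non-whitespace characters.
URL_PATTERN = re.compile(r"https://[^\s/$.?#].[^\s]*")


def extract_first_url(text: str) -> Optional[str]:
    """Return the first https URL in `text` without trailing ')', '.' or ','."""
    match = URL_PATTERN.search(text)
    if match is None:
        return None
    return match.group(0).rstrip(").,")


# Illustrative agent output containing a Markdown link (example.com is a placeholder).
sample_output = "Here is the audio: [listen](https://example.com/audio.mp3)."
print(extract_first_url(sample_output))
```

One trade-off to keep in mind: a URL that legitimately ends with `)` would lose that character, so this is only suitable for links like the audio file URLs shown in this tutorial.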
Step 7: Use Agent Memory for Follow-up Queries
Your agent supports short-term memory, which allows it to keep track of context across multiple interactions. Here's how to use the session_id for follow-up questions:
session_id = agent_response1["data"]["session_id"]
print(f"Session id: {session_id}")
agent_response2 = agent.run("Extract the personal information about that driver.", session_id=session_id)
print("\nResponse:")
print(agent_response2)
agent_response2["data"]["output"]
You can continue asking questions in the same session to maintain context.
agent_response3 = agent.run("What about in 1991? I want the answer in German text this time", session_id=session_id)
print("\nResponse:")
print(agent_response3)
agent_response4 = agent.run("get me his personal info", session_id=session_id)
print("\nResponse:")
print(agent_response4)
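The follow-up pattern above can be wrapped in a small loop. This is a sketch under the assumptions this tutorial already makes: `agent.run` accepts a `session_id` keyword and returns a dict containing `["data"]["session_id"]`. The helper name `ask_in_session` is hypothetical.

```python
def ask_in_session(agent, questions):
    """Run questions in order, reusing the session_id returned by the first call."""
    session_id = None
    responses = []
    for question in questions:
        if session_id is None:
            # First question: start a new session and remember its id.
            response = agent.run(question)
            session_id = response["data"]["session_id"]
        else:
            # Follow-up questions: pass the same session_id to keep context.
            response = agent.run(question, session_id=session_id)
        responses.append(response)
    return responses
```

For example, `ask_in_session(agent, ["Who won in 2023?", "What about in 1991?"])` would send both questions in one session, so the second can rely on the context of the first.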
response = agent.run("Who is the Brazilian athlete who most won Olympic medals for their country? Look for their personal info as well.")
print(response)
You've successfully built and interacted with a conversational agent using various tools and memory functionalities. Feel free to explore additional features and customize the agent to your needs!