Build a conversational agent

Agents are systems that carry out specific tasks by combining multiple model tools. In this tutorial, you'll build a conversational agent that combines speech synthesis, translation, and named entity recognition (NER), so it can understand, process, and respond to user inputs across different modalities and languages. By the end, you'll have an agent that handles diverse queries and gives contextual answers, and a clearer picture of how aiXplain's tools fit together. Let's get started!

Step 1: Set Up Your API Key

First, generate a team API key from the Integrations page and set it as an environment variable:

import os
os.environ["AIXPLAIN_API_KEY"] = "<ACCESS_KEY>"
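Forgetting to replace the `<ACCESS_KEY>` placeholder is an easy mistake to make at this step. The check below is a minimal sketch, not part of the aiXplain SDK: `require_api_key` is a hypothetical helper that only reads the same `AIXPLAIN_API_KEY` environment variable set above and fails early if it is empty or still the placeholder.

```python
import os


def require_api_key(var_name: str = "AIXPLAIN_API_KEY") -> str:
    """Return the key stored in `var_name`; raise if it is missing or a placeholder."""
    key = os.environ.get(var_name, "").strip()
    # "<ACCESS_KEY>" is the placeholder used in this tutorial, not a real key
    if not key or key == "<ACCESS_KEY>":
        raise RuntimeError(
            f"{var_name} is not set. Generate a key on the Integrations page first."
        )
    return key
```

Calling `require_api_key()` right after setting the variable surfaces a missing key immediately, rather than at the first request to the platform.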

Step 2: Create Model Tools

Model tools are the core components that power your agent's abilities. In this tutorial, we’ll create tools for:

Speech Synthesis: Converts text into audio.
Translation: Translates text into various languages.
Named Entity Recognition (NER): Identifies and categorizes key information within the text.

These model tools will form the foundation for your agent's functionality.

from aixplain.factories import AgentFactory
from aixplain.modules.agent import ModelTool
from aixplain.enums import Function, Supplier

speech_synthesis_tool = ModelTool(
    function=Function.SPEECH_SYNTHESIS,
    supplier=Supplier.GOOGLE
)

translation_tool = ModelTool(
    function=Function.TRANSLATION,
)

ner_tool = ModelTool(
    function=Function.NAMED_ENTITY_RECOGNITION,
)

Step 3: Create a Pipeline Tool

Next, let's create a pipeline tool, which chains multiple operations together so your agent can handle more complex tasks. You create a pipeline tool by referencing a pipeline's unique ID. You can find more information on how to build a pipeline by following this guide.

from aixplain.modules.agent import PipelineTool

pipeline_tool = PipelineTool(
    pipeline="PIPELINE ID",
    description="PIPELINE DESCRIPTION HERE"
)

Step 4: Build the Agent

Now that we've created our tools, it's time to build the agent using the AgentFactory class. In this tutorial, we'll create an agent called "Wiki Agent" designed to answer questions using Wikipedia as its knowledge base. For more information about agents, please refer to this guide.

from aixplain.factories import AgentFactory

agent = AgentFactory.create(
    name="Wiki Agent",
    description="An agent that uses Wikipedia to answer questions.",
    instructions="Answer user questions by leveraging Wikipedia as a knowledge source. Integrates speech synthesis, named entity recognition, translation, and a custom pipeline to understand and respond to queries with contextual relevance.",
    tools=[
        speech_synthesis_tool,
        ner_tool,
        translation_tool,
        pipeline_tool
    ],
    llm_id="6646261c6eb563165658bbb1"  # GPT 4o
)

agent.id
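Since the next step retrieves the agent by its ID, it can help to keep that ID somewhere that survives a notebook restart. The following is a minimal sketch using only the standard library. The helper names `save_agent_id` and `load_agent_id` and the `agent_id.json` file name are illustrative assumptions, not part of the aiXplain SDK.

```python
import json
from pathlib import Path


def save_agent_id(agent_id: str, path: str = "agent_id.json") -> None:
    """Write the agent ID to a small JSON file."""
    Path(path).write_text(json.dumps({"agent_id": agent_id}), encoding="utf-8")


def load_agent_id(path: str = "agent_id.json") -> str:
    """Read back an agent ID previously stored with save_agent_id."""
    return json.loads(Path(path).read_text(encoding="utf-8"))["agent_id"]
```

For example, `save_agent_id(agent.id)` after creating the agent, and `AgentFactory.get(load_agent_id())` in a later session.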

Step 5: Retrieve an Existing Agent

To interact with an already existing agent, you can retrieve its instance using its ID. Here's how to fetch the agent and explore its attributes:

agent = AgentFactory.get(agent.id)
agent.__dict__
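The raw `__dict__` can be noisy because it also lists private attributes. As a small sketch, the hypothetical helper below (plain Python, not an SDK function) keeps only the instance attributes whose names do not start with an underscore. It works on any object that has a `__dict__`.

```python
from typing import Any, Dict


def public_attributes(obj: Any) -> Dict[str, Any]:
    """Return instance attributes of `obj` whose names do not start with '_'."""
    return {
        name: value
        for name, value in vars(obj).items()
        if not name.startswith("_")
    }
```

`public_attributes(agent)` then gives a shorter view of the same information as `agent.__dict__`.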

Step 6: Invoke the Agent

Now it's time to put your agent to work! You can invoke the agent to perform various tasks, such as answering a question and providing an audio response.

agent_response1 = agent.run("What is the name of the driver who won the Formula One championship in 2023? Answer with English audio.")
print(agent_response1)

agent_response1["data"]
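This tutorial reads fields such as `output` and `session_id` from under the `data` key. If a run fails, those fields may be absent, so a defensive accessor can be convenient. The sketch below assumes only that the response supports the dictionary-style indexing shown above. `get_data_field` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Optional


def get_data_field(response: Any, field: str, default: Optional[Any] = None) -> Any:
    """Return response["data"][field], or `default` if either level is missing."""
    try:
        return response["data"][field]
    except (KeyError, TypeError):
        # KeyError: key absent; TypeError: response or data is not indexable
        return default
```

For example, `get_data_field(agent_response1, "output", default="")` returns an empty string instead of raising when no output is present.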

To play the audio response, you can use the code snippet below:

import os
import re

import requests
from IPython.display import Audio, display

def display_audio(agent_response):
    # Find the first HTTPS URL in the agent's text output
    pattern = r"https://[^\s/$.?#].[^\s]*"
    urls = re.findall(pattern, agent_response["data"]["output"])
    if not urls:
        print("No audio URL found in the agent output.")
        return
    # Drop a trailing ")" or ")." left over from Markdown link syntax
    sound_file = urls[0].replace(").", "").replace(")", "")
    print(sound_file)

    response = requests.get(sound_file)
    # Stop here if the download failed, since there is nothing to play
    if response.status_code != 200:
        print(f"Download failed with status code {response.status_code}")
        return

    # Save the audio to a temporary file, play it, then clean up
    with open("downloaded_file.mp3", "wb") as file:
        file.write(response.content)
    display(Audio("downloaded_file.mp3", autoplay=True))
    os.remove("downloaded_file.mp3")

display_audio(agent_response1)
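The URL extraction above removes every `)` from the matched text, which would also alter a link that has a parenthesis in the middle. As an alternative sketch, the hypothetical helper below strips only trailing punctuation. It assumes the agent embeds the audio link in free text or Markdown, as the pattern above implies, and it will still truncate the rare URL that legitimately ends in `)`.

```python
import re
from typing import Optional

# Match an HTTPS URL up to the next whitespace, quote, or angle bracket
URL_PATTERN = re.compile(r"https://[^\s<>\"']+")


def extract_first_url(text: str) -> Optional[str]:
    """Return the first HTTPS URL in `text` without trailing punctuation, or None."""
    match = URL_PATTERN.search(text)
    if match is None:
        return None
    # Remove sentence or Markdown punctuation stuck to the end of the link
    return match.group(0).rstrip(").,;:!?]")
```

Inside `display_audio`, this could replace the `re.findall(...)` and `.replace(...)` lines, with a `None` result meaning there is no audio to download.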

Step 7: Use Agent Memory for Follow-up Queries

Your agent supports short-term memory, which allows it to keep track of context across multiple interactions. Here’s how to use the session_id for follow-up questions:

session_id = agent_response1["data"]["session_id"]
print(f"Session id: {session_id}")

agent_response2 = agent.run("Extract the personal information about that driver.", session_id=session_id)
print("\nResponse:")
print(agent_response2)

agent_response2["data"]["output"]

You can continue asking questions in the same session to maintain context.

agent_response3 = agent.run("What about in 1991? I want the answer in German text this time", session_id=session_id)
print("\nResponse:")
print(agent_response3)

agent_response4 = agent.run("get me his personal info", session_id=session_id)
print("\nResponse:")
print(agent_response4)
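The follow-up calls above all repeat the same pattern: take the `session_id` from the first response and pass it to every later `run` call. That pattern can be sketched as a small loop. `run_conversation` is a hypothetical helper, and it assumes only what this tutorial shows: `agent.run(query, session_id=...)` and a response exposing `["data"]["session_id"]`.

```python
from typing import Any, List


def run_conversation(agent: Any, queries: List[str]) -> List[Any]:
    """Run `queries` in order, reusing the session started by the first one."""
    responses: List[Any] = []
    session_id = None
    for query in queries:
        if session_id is None:
            # First query: let the agent open a new session
            response = agent.run(query)
            session_id = response["data"]["session_id"]
        else:
            # Follow-ups: pass the session ID so earlier context is kept
            response = agent.run(query, session_id=session_id)
        responses.append(response)
    return responses
```

For example, `run_conversation(agent, ["Who won in 2023?", "What about in 1991?"])` returns one response per query, all in the same session.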

response = agent.run("Which Brazilian athlete has won the most Olympic medals for their country? Look for their personal info as well.")
print(response)

You've successfully built and interacted with a conversational agent using various tools and memory functionalities. Feel free to explore additional features and customize the agent to your needs!
