Build a multi-purpose text agent
In this tutorial, you will learn how to leverage a team of agents using aiXplain to analyze and write poems. We will create a multimedia agent to process audio, video, and images, and a text agent to generate poems based on provided specifications. In the end, we will combine these agents into a team to illustrate how they work together to perform complex tasks.
Overview of Agents
Before we start building, let's understand what agents are. Agents are entities that can understand user instructions and autonomously perform actions. With the help of Large Language Models (LLMs), agents can break down user queries into smaller tasks, select the appropriate tools, and provide a final response based on the tools' outputs.
In aiXplain, you can create agents by equipping them with various Models and Pipelines.
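To make the idea concrete, here is a minimal sketch of that loop in plain Python. It does not use the aiXplain SDK: the two tool functions, the keyword-based `plan` function (a stand-in for the LLM), and `run_agent` are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical tools: plain functions standing in for hosted models.
def to_upper(text: str) -> str:
    return text.upper()

def count_words(text: str) -> str:
    return str(len(text.split()))

TOOLS: Dict[str, Callable[[str], str]] = {
    "uppercase": to_upper,
    "word_count": count_words,
}

def plan(query: str) -> List[str]:
    """Stand-in for the LLM planner: choose tools by keyword match."""
    lowered = query.lower()
    steps = []
    if "uppercase" in lowered:
        steps.append("uppercase")
    if "count" in lowered:
        steps.append("word_count")
    return steps

def run_agent(query: str, text: str) -> Dict[str, str]:
    """Run every planned tool on the text and collect the outputs by tool name."""
    return {name: TOOLS[name](text) for name in plan(query)}

result = run_agent("Please uppercase this and count the words", "el mundo es más azul")
print(result)
```

A real agent replaces the keyword matcher with an LLM call and composes a natural-language answer from the tool outputs, but the control flow is broadly the same: plan, call tools, combine.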
Step 1: Create a Multimedia Agent
First, we will create a multimedia agent that can handle audio, video, and image processing. We'll start by building model tools for various tasks like speech recognition, speech synthesis, and Optical Character Recognition (OCR).
Create Model Tools for Multimedia Processing
from aixplain.factories import AgentFactory
from aixplain.modules.agent import ModelTool
from aixplain.enums import Function, Supplier
# Each ModelTool wraps a hosted model, selected by function and (optionally) supplier.
speech_recognition_tool = ModelTool(
    function=Function.SPEECH_RECOGNITION,
    supplier=Supplier.MICROSOFT
)
speech_synthesis_tool = ModelTool(
    function=Function.SPEECH_SYNTHESIS,
    supplier=Supplier.GOOGLE
)
ocr_tool = ModelTool(function=Function.OCR)
Create the Multimedia Agent
Now, let’s use these tools to create the multimedia agent.
multimedia_agent = AgentFactory.create(
    name="Multimedia Agent AVI",
    tools=[
        speech_recognition_tool,
        speech_synthesis_tool,
        ocr_tool
    ],
    description="Agent for Audio and Image Processing",
)
# In a notebook, this displays the ID of the newly created agent.
multimedia_agent.id
Step 2: Create a Text Agent
Next, we will create a text agent that can handle tasks like generating poems. We'll set up a text generation pipeline using a model specified by its asset_id.
Set Up the Text Generation Pipeline
from aixplain.factories.pipeline_factory import PipelineFactory
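# Delete any earlier pipelines with the same name so the tutorial can be re-run.
for pipeline in PipelineFactory.list(query="Poem Generator")["results"]:
    pipeline.delete()
# Build a three-node pipeline: input -> text generation -> output.
pipeline = PipelineFactory.init('Poem Generator')
input_node = pipeline.input()
input_node.label = "PoemDescriptionInput"
text_generation_node = pipeline.text_generation(asset_id="669a63646eb56306647e1091")
# Fixed instruction for the model; the user's specifications arrive via the input node.
text_generation_node.inputs.prompt.value = "Generate a poem according to the following specifications: "
input_node.link(text_generation_node, 'input', text_generation_node.inputs.text)
output_node = text_generation_node.use_output('data')
# Save the pipeline as an asset so it can be attached to an agent as a tool.
pipeline.save(save_as_asset=True)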
Create the Text Agent
Now, we will create the text agent and assign it multiple tools, including translation, sentiment analysis, and the poem generator pipeline.
text_agent = AgentFactory.create(
    name="Textual Agent",
    tools=[
        AgentFactory.create_model_tool(
            function=Function.TRANSLATION,
            supplier=Supplier.MICROSOFT
        ),
        AgentFactory.create_model_tool(
            function=Function.SENTIMENT_ANALYSIS,
            supplier=Supplier.MICROSOFT
        ),
        AgentFactory.create_pipeline_tool(
            pipeline=pipeline.id,
            description="Poem Generator Tool"
        )
    ],
    description="Agent for Text Processing",
    llm_id="6646261c6eb563165658bbb1"
)
text_agent.id
Step 3: Create a Team Agent
Now that we have a multimedia agent and a text agent, let’s combine them into a team agent. This collaborative multi-agent system will allow each agent to specialize in specific tasks and work together to solve complex problems.
To learn more about team agents, please refer to this guide.
Create the Team Agent
from aixplain.factories import TeamAgentFactory
team = TeamAgentFactory.create(
    name="Team of Agents for Text Audio and Image Processing",
    agents=[
        multimedia_agent,
        text_agent
    ],
    llm_id="6646261c6eb563165658bbb1"
)
team.id
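As a rough mental model of how a team divides work, the sketch below chains two stub "agents" so that the output of one becomes the input of the next. Nothing here comes from the aiXplain SDK: `text_agent_stub`, `multimedia_agent_stub`, `SPECIALISTS`, and `run_team` are hypothetical, and the plan is hard-coded, whereas a real team agent lets its LLM decide which member handles each step.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for the two specialist agents.
def text_agent_stub(payload: str) -> str:
    return f"[translated] {payload}"

def multimedia_agent_stub(payload: str) -> str:
    return f"[audio of] {payload}"

SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "text": text_agent_stub,
    "multimedia": multimedia_agent_stub,
}

def run_team(plan: List[str], payload: str) -> Tuple[str, List[str]]:
    """Pass the payload through each specialist in order and record who ran."""
    trace = []
    for name in plan:
        payload = SPECIALISTS[name](payload)
        trace.append(name)
    return payload, trace

team_output, trace = run_team(["text", "multimedia"], "Fue tan bello vivir")
print(team_output)
print(trace)
```

This mirrors the request made in the next step, where a Spanish poem is first translated and then turned into speech.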
Step 4: Invoke the Team Agent
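Let's try out the team agent with a poem by Pablo Neruda. The poem is in Spanish, so we ask the team to convert it into English audio, a task that draws on both the text agent (translation) and the multimedia agent (speech synthesis).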
finale_pablo_neruda = """Final by Pablo Neruda
Matilde, años o días
dormidos, afiebrados,
aquí o allá,
clavando
rompiendo el espinazo,
sangrando sangre verdadera,
despertando tal vez
o perdido, dormido:
camas clínicas, ventanas extranjeras,
vestidos blancos de las sigilosas,
la torpeza en los pies.
Luego estos viajes
y el mío mar de nuevo:
tu cabeza en la cabecera,
tus manos voladoras
en la luz, en mi luz,
sobre mi tierra.
Fue tan bello vivir
cuando vivías!
El mundo es más azul y más terrestre
de noche, cuando duermo
enorme, adentro de tus breves manos."""
response = team.run(
    query="Please convert the following poem into English audio:\n{{poem}}",
    content={
        "poem": finale_pablo_neruda
    }
)
print(response)
response["data"]
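The `{{poem}}` placeholder in the query is paired with the `"poem"` key in `content`. The sketch below shows one way such a substitution could work. It is an illustration under that assumption, not the SDK's implementation, and `fill_template` is a hypothetical helper.

```python
import re
from typing import Dict

def fill_template(query: str, content: Dict[str, str]) -> str:
    """Replace each {{key}} in the query with content[key]; fail on unknown keys."""
    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in content:
            raise KeyError(f"no content provided for placeholder '{key}'")
        return content[key]

    return re.sub(r"\{\{(\w+)\}\}", substitute, query)

filled = fill_template(
    "Please translate the following poem:\n{{poem}}",
    {"poem": "Fue tan bello vivir"},
)
print(filled)
```

Keeping the long text in `content` rather than pasting it into the query string makes the same query reusable across different poems.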
Parse and Play Audio Response
To play the audio response, use the code snippet below:
import os
import re
import requests
from IPython.display import Audio, display
def display_audio(agent_response):
    # Find the first URL in the agent's text output and strip trailing ")" or ")."
    pattern = r"https://[^\s/$.?#].[^\s]*"
    sound_file = re.findall(pattern, agent_response["data"]["output"])[0].replace(").", "").replace(")","")
    print(sound_file)
    response = requests.get(sound_file)
    # Check if the request was successful
    if response.status_code == 200:
        # Open a file in binary write mode
        with open('downloaded_file.mp3', 'wb') as file:
            # Write the content of the response to the file
            file.write(response.content)
        # Play the audio, then clean up the temporary file
        display(Audio('downloaded_file.mp3', autoplay=True))
        os.remove('downloaded_file.mp3')
    else:
        print(f"Download failed with status code {response.status_code}")
display_audio(response)
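The URL extraction step can fail if the agent's answer contains no link, or if the link is wrapped in Markdown or followed by punctuation. The helper below is a slightly more defensive sketch of that one step. `first_url` and `URL_PATTERN` are hypothetical names, and the example string is made up for illustration.

```python
import re
from typing import Optional

# Match http(s) URLs up to the next whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

def first_url(text: str) -> Optional[str]:
    """Return the first URL in text without trailing punctuation, or None."""
    match = URL_PATTERN.search(text)
    if match is None:
        return None
    # Strip characters that usually belong to the surrounding sentence.
    return match.group(0).rstrip(").,;]")

example = "Here is the audio: [listen](https://example.com/audio.mp3)."
print(first_url(example))
print(first_url("No link was produced."))
```

One caveat: a URL that legitimately ends with `)` would be truncated by this approach, so treat it as a heuristic rather than a full URL parser.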
Invoking Agent with Short-term Memory
Agents in aiXplain support short-term memory, allowing them to retain context across interactions. Here’s how you can use short-term memory for follow-up questions:
session_id = response["data"]["session_id"]
print(f"Session id: {session_id}")
response = team.run("Classify the sentiment of this poem please", session_id=session_id)
print(response)
response["data"]["output"]
response = team.run("What is the history of this poem?", session_id=session_id)
print(response)
response["data"]["output"]
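Conceptually, passing the same `session_id` lets the service attach each new query to the earlier turns of the conversation. The toy class below sketches that bookkeeping in memory. It is not how aiXplain stores sessions, and `SessionStore` and its fields are hypothetical.

```python
import uuid
from collections import defaultdict
from typing import Dict, List, Optional

class SessionStore:
    """Keep a list of past queries per session ID."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = defaultdict(list)

    def run(self, query: str, session_id: Optional[str] = None) -> dict:
        # Start a new session when no ID is supplied.
        session_id = session_id or uuid.uuid4().hex
        self._history[session_id].append(query)
        return {
            "session_id": session_id,
            "context": list(self._history[session_id]),
        }

store = SessionStore()
first = store.run("Translate this poem")
second = store.run("Classify the sentiment of this poem", session_id=first["session_id"])
unrelated = store.run("What is OCR?")
print(second["context"])
```

The follow-up call sees both queries, while the call without a `session_id` starts from an empty context, which is why "this poem" only resolves when the session ID is passed along.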
Step 5: Generate a Poem
Now, we will use the team agent to generate a new poem in the style of Pablo Neruda.
response = team.run("Please generate a poem in the same style", session_id=session_id)
print(response)
response["data"]["output"]
Additional Example: Using an Image
The team agent can also process images to extract and analyze text. Here’s an example:
response = team.run(
    query="What is the history of the text in the figure:\nhttps://cdn.pensador.com/img/imagens/1d/as/1_das_pedras.jpg"
)
print(response)
response["data"]["output"]
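Because the image is passed as a URL inside the query, a malformed link only surfaces once the agent tries to fetch it. A quick local check, sketched below with the standard library, can catch obvious mistakes first. `looks_like_image_url` and the extension list are hypothetical, and the check does not confirm that the URL is reachable.

```python
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")

def looks_like_image_url(url: str) -> bool:
    """Check scheme, host, and file extension without making a network request."""
    parsed = urlparse(url)
    return (
        parsed.scheme in ("http", "https")
        and bool(parsed.netloc)
        and parsed.path.lower().endswith(IMAGE_EXTENSIONS)
    )

image_url = "https://cdn.pensador.com/img/imagens/1d/as/1_das_pedras.jpg"
print(looks_like_image_url(image_url))
print(looks_like_image_url("cdn.pensador.com/1_das_pedras.jpg"))
```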
You've now built a multimedia agent and a multi-purpose text agent, combined them into a collaborative team, and interacted with that team. Experiment further by customizing the agents or adding new functionalities!