Build a multi-purpose text agent

In this tutorial, you will learn how to leverage a team of agents using aiXplain to analyze and write poems. We will create a multimedia agent to process audio, video, and images, and a text agent to generate poems based on provided specifications. In the end, we will combine these agents into a team to illustrate how they work together to perform complex tasks.

Overview of Agents

Before we start building, let's understand what agents are. Agents are entities that can understand user instructions and autonomously perform actions. With the help of Large Language Models (LLMs), agents can break down user queries into smaller tasks, select the appropriate tools, and provide a final response based on the tools' outputs.

In aiXplain, you can create agents by equipping them with various Models and Pipelines.
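
The plan, select, respond loop described above can be illustrated with a toy sketch. This is not the aiXplain API: the `TOOLS` table, `plan`, and `run_agent` are hypothetical names, and a keyword match stands in for the LLM that would normally decide which tools to call.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for real tools; each one maps text to text.
TOOLS: Dict[str, Callable[[str], str]] = {
    "translate": lambda text: f"[translated] {text}",
    "sentiment": lambda text: "positive" if "love" in text.lower() else "neutral",
}


def plan(query: str) -> List[str]:
    """Stand-in for the LLM planner: pick tools by keyword, in a fixed order."""
    lowered = query.lower()
    steps = []
    if "translate" in lowered:
        steps.append("translate")
    if "sentiment" in lowered:
        steps.append("sentiment")
    return steps


def run_agent(query: str, text: str) -> Dict[str, str]:
    """Run every planned tool on the text and collect the outputs by tool name."""
    return {name: TOOLS[name](text) for name in plan(query)}


result = run_agent("Translate this and classify its sentiment", "I love the sea")
print(result)
```

A real agent would also feed the tool outputs back to the LLM to compose the final answer; the sketch stops at collecting them.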

Step 1: Create a Multimedia Agent

First, we will create a multimedia agent that can handle audio, video, and image processing. We'll start by building model tools for various tasks like speech recognition, speech synthesis, and Optical Character Recognition (OCR).

Create Model Tools for Multimedia Processing

from aixplain.factories import AgentFactory
from aixplain.modules.agent import ModelTool
from aixplain.enums import Function, Supplier

speech_recognition_tool = ModelTool(
    function=Function.SPEECH_RECOGNITION,
    supplier=Supplier.MICROSOFT
)

speech_synthesis_tool = ModelTool(
    function=Function.SPEECH_SYNTHESIS,
    supplier=Supplier.GOOGLE
)

ocr_tool = ModelTool(function=Function.OCR)

Create the Multimedia Agent

Now, let’s use these tools to create the multimedia agent.

from aixplain.factories import AgentFactory

multimedia_agent = AgentFactory.create(
    name="Multimedia Agent AVI",
    tools=[
        speech_recognition_tool,
        speech_synthesis_tool,
        ocr_tool
    ],
    description="Agent for Audio and Image Processing",
)
multimedia_agent.id

Step 2: Create a Text Agent

Next, we will create a text agent that can handle tasks like generating poems. We'll set up a text generation pipeline using a model specified by its asset_id.

Set Up the Text Generation Pipeline

from aixplain.factories.pipeline_factory import PipelineFactory

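# Remove previously saved pipelines that match "Poem Generator" before creating a new one.
# Caution: this deletes every pipeline the search returns.
for pipeline in PipelineFactory.list(query="Poem Generator")["results"]:
    pipeline.delete()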

pipeline = PipelineFactory.init('Poem Generator')
input_node = pipeline.input()
input_node.label = "PoemDescriptionInput"

text_generation_node = pipeline.text_generation(asset_id="669a63646eb56306647e1091")
text_generation_node.inputs.prompt.value = "Generate a poem according to the following specifications: "

input_node.link(text_generation_node, 'input', text_generation_node.inputs.text)

output_node = text_generation_node.use_output('data')

pipeline.save(save_as_asset=True)
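
To make the data flow of this pipeline easier to picture, here is a plain-Python sketch of an input feeding a text generation node with a fixed prompt. It is only an illustration: `make_text_generation_node`, `run_pipeline`, and `echo_model` are hypothetical, and the assumption that the prompt is simply prepended to the input text is not confirmed by the source.

```python
from typing import Callable, List


def make_text_generation_node(prompt: str, model: Callable[[str], str]) -> Callable[[str], str]:
    """Return a node that prepends the fixed prompt to its input before calling the model."""
    def node(text: str) -> str:
        return model(prompt + text)
    return node


def run_pipeline(nodes: List[Callable[[str], str]], value: str) -> str:
    """Pass the value through each node in order, like following the links."""
    for node in nodes:
        value = node(value)
    return value


def echo_model(full_prompt: str) -> str:
    """Hypothetical model stub that only reports what it received."""
    return f"MODEL RECEIVED: {full_prompt}"


poem_node = make_text_generation_node(
    "Generate a poem according to the following specifications: ",
    echo_model,
)
output = run_pipeline([poem_node], "a short poem about the sea")
print(output)
```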

Create the Text Agent

Now, we will create the text agent and assign it multiple tools, including translation, sentiment analysis, and the poem generator pipeline.

text_agent = AgentFactory.create(
    name="Textual Agent",
    tools=[
        AgentFactory.create_model_tool(
            function=Function.TRANSLATION,
            supplier=Supplier.MICROSOFT
        ),
        AgentFactory.create_model_tool(
            function=Function.SENTIMENT_ANALYSIS,
            supplier=Supplier.MICROSOFT
        ),
        AgentFactory.create_pipeline_tool(
            pipeline=pipeline.id,
            description="Poem Generator Tool"
        )
    ],
    description="Agent for Text Processing",
    llm_id="6646261c6eb563165658bbb1"
)
text_agent.id

Step 3: Create a Team Agent

Now that we have a multimedia agent and a text agent, let’s combine them into a team agent. This collaborative multi-agent system will allow each agent to specialize in specific tasks and work together to solve complex problems.

To learn more about team agents, please refer to this guide.
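
As a rough mental model of the hand-off between specialized agents, consider the sketch below. It does not use the aiXplain SDK: the two stub functions, the `OWNERS` table, and `run_team` are hypothetical, and the step order is fixed by hand rather than planned by an LLM.

```python
from typing import Callable, Dict, List, Tuple


def text_agent_stub(task: str, payload: str) -> str:
    """Hypothetical text agent: only knows how to 'translate'."""
    return f"english({payload})" if task == "translate" else payload


def multimedia_agent_stub(task: str, payload: str) -> str:
    """Hypothetical multimedia agent: only knows 'speech_synthesis'."""
    return f"audio_url_for[{payload}]" if task == "speech_synthesis" else payload


# Which agent owns which task (an assumption made for this sketch).
OWNERS: Dict[str, Callable[[str, str], str]] = {
    "translate": text_agent_stub,
    "speech_synthesis": multimedia_agent_stub,
}


def run_team(steps: List[str], payload: str) -> Tuple[str, List[str]]:
    """Run the steps in order, feeding each agent's output to the next one."""
    log = []
    for step in steps:
        agent = OWNERS[step]
        payload = agent(step, payload)
        log.append(f"{step} -> {agent.__name__}")
    return payload, log


final, trace = run_team(["translate", "speech_synthesis"], "poema")
print(final)
print(trace)
```

The point of the sketch is the chaining: the text agent's translation becomes the input of the multimedia agent's speech synthesis.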

Create the Team Agent

from aixplain.factories import TeamAgentFactory

team = TeamAgentFactory.create(
    name="Team of Agents for Text Audio and Image Processing",
    agents=[
        multimedia_agent,
        text_agent
    ],
    llm_id="6646261c6eb563165658bbb1"
)
team.id

Step 4: Invoke the Team Agent

Let's try out the team agent with a poem by Pablo Neruda. Here, we will convert the poem into English audio.

finale_pablo_neruda = """Final by Pablo Neruda

Matilde, años o días
dormidos, afiebrados,
aquí o allá,
clavando
rompiendo el espinazo,
sangrando sangre verdadera,
despertando tal vez
o perdido, dormido:
camas clínicas, ventanas extranjeras,
vestidos blancos de las sigilosas,
la torpeza en los pies.

Luego estos viajes
y el mío mar de nuevo:
tu cabeza en la cabecera,

tus manos voladoras
en la luz, en mi luz,
sobre mi tierra.

Fue tan bello vivir
cuando vivías!

El mundo es más azul y más terrestre
de noche, cuando duermo
enorme, adentro de tus breves manos."""

response = team.run(
    query="Please convert the following poem into English audio:\n{{poem}}",
    content={
        "poem": finale_pablo_neruda
    }
)
print(response)
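
The `{{poem}}` placeholder in the query is paired with the `poem` key in `content`. The source does not describe how the platform performs the substitution, so the following is only a sketch of the apparent intent; `fill_template` and `PLACEHOLDER` are hypothetical helpers, not part of the SDK.

```python
import re
from typing import Dict

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(query: str, content: Dict[str, str]) -> str:
    """Replace each {{name}} with content[name]; leave unknown placeholders untouched."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return content.get(key, match.group(0))
    return PLACEHOLDER.sub(substitute, query)


filled = fill_template(
    "Please convert the following poem into English audio:\n{{poem}}",
    {"poem": "Fue tan bello vivir"},
)
print(filled)
```

Keeping the long poem in `content` rather than pasting it into the query keeps the instruction itself short and reusable.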

response["data"]

Parse and Play Audio Response

To play the audio response, use the code snippet below:

import os
import re

import requests
from IPython.display import Audio, display

def display_audio(agent_response):
    # Find the first HTTPS URL in the agent's text output
    pattern = r"https://[^\s/$.?#].[^\s]*"
    urls = re.findall(pattern, agent_response["data"]["output"])
    # Strip trailing Markdown punctuation such as ")." or ")"
    sound_file = urls[0].replace(").", "").replace(")", "")
    print(sound_file)

    # Download the audio and stop with an error if the request failed
    audio_response = requests.get(sound_file)
    audio_response.raise_for_status()
    with open("downloaded_file.mp3", "wb") as file:
        file.write(audio_response.content)

    # Play the audio, then remove the temporary file
    display(Audio("downloaded_file.mp3", autoplay=True))
    os.remove("downloaded_file.mp3")

display_audio(response)
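
The URL extraction step can be tried offline before calling the agent. The sketch below reuses the same regular expression on a made-up output string; `extract_first_url` is a hypothetical helper, the sample text and the example.com address are invented, and the real agent output may be formatted differently.

```python
import re
from typing import Optional

URL_PATTERN = re.compile(r"https://[^\s/$.?#].[^\s]*")


def extract_first_url(text: str) -> Optional[str]:
    """Return the first HTTPS URL in text, without trailing ')', '.' or ','."""
    match = URL_PATTERN.search(text)
    if match is None:
        return None
    # Markdown links and sentence endings leave punctuation glued to the URL.
    return match.group(0).rstrip(").,")


sample_output = "The audio is ready: [listen](https://example.com/audio/poem.mp3)."
print(extract_first_url(sample_output))
print(extract_first_url("No link was produced."))
```

Returning `None` when no URL is found lets the caller report a clear message instead of failing with an `IndexError` on an empty match list.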

Invoking Agent with Short-term Memory

Agents in aiXplain support short-term memory, allowing them to retain context across interactions. Here’s how you can use short-term memory for follow-up questions:

session_id = response["data"]["session_id"]
print(f"Session id: {session_id}")

response = team.run("Classify the sentiment of this poem please", session_id=session_id)
print(response)

response["data"]["output"]

response = team.run("What is the history of this poem?", session_id=session_id)
print(response)

response["data"]["output"]
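
Conceptually, short-term memory amounts to keeping the conversation history under a session id and handing it back on each follow-up. The toy class below illustrates that idea only; `SessionMemory` and its methods are hypothetical and say nothing about how aiXplain stores sessions internally.

```python
import uuid
from typing import Dict, List


class SessionMemory:
    """Toy in-memory store that keeps the message history per session id."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}

    def new_session(self) -> str:
        """Create a fresh, random session id."""
        return uuid.uuid4().hex

    def remember(self, session_id: str, message: str) -> None:
        """Append a message to the history of the given session."""
        self._history.setdefault(session_id, []).append(message)

    def context(self, session_id: str) -> List[str]:
        """Return a copy of the history so callers cannot mutate the store."""
        return list(self._history.get(session_id, []))


memory = SessionMemory()
sid = memory.new_session()
memory.remember(sid, "user: convert this poem into English audio")
memory.remember(sid, "user: classify the sentiment of this poem")
print(memory.context(sid))
```

Because the second message is stored under the same id as the first, "this poem" can be resolved from the earlier turn; a different session id would start with an empty context.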

Step 5: Generate a Poem

Now, we will use the team agent to generate a new poem in the style of Pablo Neruda.

response = team.run("Please generate a poem in the same style", session_id=session_id)
print(response)

response["data"]["output"]

Additional Example: Using an Image

The team agent can also process images to extract and analyze text. Here’s an example:

response = team.run(
  query="What is the history of the text in the figure:\nhttps://cdn.pensador.com/img/imagens/1d/as/1_das_pedras.jpg"
)
print(response)

response["data"]["output"]
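
The examples in this tutorial read `response["data"]["output"]` and `response["data"]["session_id"]` directly. If a run fails, those keys may be absent, so a small accessor can make notebooks more forgiving. This sketch assumes the response behaves like a plain dictionary with the shape used above; `get_field` is a hypothetical helper and the sample values are invented.

```python
from typing import Any, Dict, Optional


def get_field(response: Dict[str, Any], field: str) -> Optional[Any]:
    """Return response["data"][field], or None when either level is missing."""
    data = response.get("data")
    if not isinstance(data, dict):
        return None
    return data.get(field)


sample = {"data": {"output": "A poem about stones.", "session_id": "abc123"}}
print(get_field(sample, "output"))
print(get_field(sample, "session_id"))
print(get_field({"status": "FAILED"}, "output"))
```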

You've now built and interacted with a multi-purpose text agent and multimedia agent in a collaborative team structure. Experiment further by customizing the agents or adding new functionalities!
