Build a literature review agent
In this tutorial, we will build a Literature Review Agent by combining two specialized agents: a literature review agent that analyzes research papers, extracts key insights, and synthesizes information for academic papers, and a discussion and future work agent that complements it by focusing on insights for the Discussion and Future Work sections. We'll then integrate both into a Team Agent for enhanced performance.
Agents are intelligent entities designed to understand user instructions and autonomously perform actions. In aiXplain, agents are equipped with various Models and Pipelines, making them versatile and adaptable to different tasks. Team Agents in aiXplain can handle complex, multi-step tasks by collaborating with other agents. Learn more about Agents using this guide.
Step 1: Create the Literature Review Agent
Create the Search Tool
We will create a search tool using the search utility from the marketplace, which the agent will use to find relevant literature.
from aixplain.modules.agent import ModelTool
from aixplain.factories import AgentFactory

# Search utility model from the marketplace
search_tool = ModelTool(model="655e20f46eb563062a1aa301")
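Optionally, you can sanity-check the underlying search model before attaching it to an agent. This is a minimal sketch: the query string is illustrative, and the exact shape of the returned object may vary by SDK version.
from aixplain.factories import ModelFactory

# Run the search model directly with a sample query to confirm it returns results
search_model = ModelFactory.get("655e20f46eb563062a1aa301")
result = search_model.run("autonomous refinement of agentic AI systems")
print(result)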
Create the Literature Review Agent
Now, we'll create the Literature Review Agent and equip it with the search tool and the Groq Gemma 7B model.
literature_review_agent = AgentFactory.create(
    name="Literature Researcher",
    description="A research expert that identifies key papers and compiles information for the Background and Related Work sections. Goal: research and compile relevant literature from top AI conferences on Arxiv and other sources, focusing on methodologies and publications that influenced the current research.",
    tools=[search_tool],
    llm_id="660ea9ba4edcc355738532c8"
)
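You can try the agent on its own before adding it to a team. This is a quick sketch; the query is illustrative, and it assumes the single-agent response follows the same data/output shape used for the team response later in this tutorial.
response = literature_review_agent.run("Find recent Arxiv papers on autonomous optimization of multi-agent LLM systems")
print(response['data']['output'])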
Step 2: Create the Discussion and Future Work Agent
Similarly, we will create an agent to focus on the Discussion and Future Work sections.
discussion_future_work_agent = AgentFactory.create(
    name="Discussion and Future Work Researcher",
    description="An expert in extracting and synthesizing insights from related papers to generate strong Discussion and Future Work sections, and in discussing the paper's contributions over the literature. Goal: research discussion points and future directions from relevant high-impact papers to generate detailed Discussion and Future Work sections for a given paper.",
    tools=[search_tool],
    llm_id="660ea9ba4edcc355738532c8"
)
Step 3: Combine Agents into a Team Agent
Now that we have both the Literature Review Agent and the Discussion and Future Work Agent, we'll combine them into a team agent.
from aixplain.factories import TeamAgentFactory
team = TeamAgentFactory.create(
    name="Team for Research Review",
    agents=[
        literature_review_agent,
        discussion_future_work_agent,
    ],
    llm_id="6646261c6eb563165658bbb1"
)
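As a quick check, you can list the member agents the team was created with. This sketch assumes the returned team object exposes its members via an agents attribute, mirroring the constructor argument above.
# Confirm the team's composition before invoking it
for member in team.agents:
    print(member.name)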
Step 4: Invoke the Team Agent
Let's test our team agent by providing it with a research paper description and asking it to generate the Background, Related Work, Discussion, and Future Work sections.
agent_response = team.run(
"""
Research the most relevant papers from NeurIPS, ICML, ICLR, ACL, EMNLP, COLING, AAAI, and similar A* conferences on Arxiv. Summarize the findings to create a detailed Background and Related Work section that references recent and highly cited papers on the same topic, compares those papers with the current paper, and explains its superiority.
The expected output is comprehensive Background and Related Work sections with references to relevant high-impact papers, including a comparison of them with the current research and an explanation of why it is superior.
Study the Discussion and Future Work sections of high-impact AI papers (NeurIPS, ICML, ICLR, ACL, EMNLP, COLING, AAAI). Adapt them to the current project, ensuring that the Discussion section highlights key findings and the Future Work section proposes relevant extensions or open questions. The expected output is detailed Discussion and Future Work sections highlighting key insights and potential research extensions.
Here is the description:
Agentic AI systems comprise multiple specialized agents, each performing predefined tasks to achieve specific goals within complex workflows. Optimizing these systems for peak performance often requires continuous refinement of agent roles, tasks, and interactions. This paper presents a method for the autonomous refinement and optimization of Agentic AI solutions through iterative feedback loops. The proposed method employs a series of agents—Refinement, Execution, Evaluation, Modification, Comparison, and Documentation—to continuously improve individual agents' roles, tasks, and workflows by evaluating outputs against predefined qualitative and quantitative criteria. Through hypothesis generation, execution of modified configurations, and comparison of outputs, the system autonomously converges toward optimal performance without human intervention. This work provides a scalable solution for automating the improvement of AI-driven processes across multiple industries, enhancing efficiency and effectiveness.
\end{abstract}
\section{Introduction}
%Agentic AI systems, composed of multiple specialized agents working collaboratively to achieve complex objectives, have gained widespread adoption across industries such as market research, business process optimization, and product recommendation. These systems have unlocked new levels of efficiency by automating decision-making processes and streamlining intricate workflows. However, optimizing such systems for peak performance remains a significant challenge. Traditional approaches to agent configuration and workflow tuning often require manual intervention, which can be time-consuming, error-prone, and limited in scalability, especially in dynamic environments where objectives and conditions evolve rapidly.
%Recent advancements in large language models (LLMs) and machine learning have opened up new opportunities for overcoming these optimization challenges. LLMs, with their ability to generate and evaluate complex hypotheses, offer a promising solution to automating the refinement processes of Agentic AI systems. This paper introduces a novel method for autonomously optimizing Agentic AI systems using LLM-driven feedback loops, allowing for continuous refinement of agent roles, tasks, and workflows without human intervention.
%The proposed approach employs LLMs within an iterative feedback loop to hypothesize, execute, and evaluate changes to the system based on qualitative and quantitative performance metrics. Over multiple iterations, the proposed method dynamically refines agent behaviors and dependencies, resulting in improved efficiency, effectiveness, and scalability. This approach addresses the persistent challenge of optimizing Agentic AI systems in complex, real-world scenarios by leveraging LLMs to autonomously evaluate agent configurations and generate optimization hypotheses.
Agentic AI systems, consisting of multiple specialized agents working collaboratively to achieve complex objectives, have been increasingly adopted across various industries, including market research, business process optimization, and product recommendation. The automation of decision-making processes and the streamlining of intricate workflows have been made possible through these systems, offering improvements in efficiency and scalability. However, optimizing such systems for maximum performance continues to pose significant challenges. The manual tuning of agent roles, tasks, and interactions is often a time-consuming and error-prone process, with limitations in scalability, particularly in adapting to dynamic environments and evolving objectives. With the recent advancements in large language models (LLMs) and machine learning techniques, new opportunities for addressing these optimization challenges have become available. LLMs' capabilities to generate and evaluate complex hypotheses are being explored to automate the refinement processes of Agentic AI systems, enabling iterative improvements without human intervention. This advancement presents the potential for scalable and adaptive solutions that optimize agent behaviors and workflows.
This paper introduces a novel method for the autonomous refinement and optimization of Agentic AI systems. The proposed approach employs LLMs within an iterative feedback-improvement loop, where changes to agent configurations are hypothesized, executed, and evaluated based on qualitative and quantitative performance metrics. Over multiple iterations, agent roles, tasks, and dependencies are refined, improving efficiency, effectiveness, and scalability. The persistent challenge of achieving peak performance in dynamic and complex environments is addressed by integrating LLM-driven optimization processes. The proposed method's architecture and capabilities are examined in detail, illustrating how autonomous optimization can unlock new performance levels across various industrial applications. Extensive evaluations have been conducted to demonstrate the improvement of Agentic AI systems' performance. The results highlight significant gains in efficiency and overall system performance through the iterative refinement process. This empirical validation underscores the method's capability to optimize Agentic AI workflows autonomously in real-world applications.
This work establishes a scalable, autonomous framework for optimizing Agentic AI systems with minimal human oversight by addressing their challenges, and makes several key contributions:
\begin{itemize}
\item \textbf{Autonomous Refinement Process}: A fully automated method for continuously optimizing Agentic AI systems using iterative feedback loops.
\item \textbf{Multi-Agent Optimization Framework}: A framework of specialized agents for refining system configurations and workflows without manual input.
\item \textbf{LLM Integration for Adaptive Improvement}: Using LLMs to autonomously evaluate and generate hypotheses for improving agent roles and tasks.
\item \textbf{Empirical Validation}: Demonstrated Agentic AI optimization performance improvements through iterative refinement in real-world applications.
\end{itemize}
%This paper contributes a scalable, LLM-driven autonomous refinement method and sets a new standard for how Agentic AI systems can autonomously optimize their workflows across various domains, reducing human oversight and enabling continuous system improvements.
\section{Background}
Agentic AI systems have been widely recognized for transforming workflow optimization by enabling autonomous decision-making processes across various industries. Significant efficiency gains have been achieved through their ability to automate tasks. However, optimizing these systems for peak performance remains challenging due to the complexity of agent interactions and the need for continuous refinement.
The MLAgentBench framework was introduced by \citet{liu2024mlagentbench} to evaluate language agents across various tasks. While valuable insights have been provided into agents' autonomous improvement capabilities, the focus has remained on performance evaluation rather than workflow refinement, which is the primary focus of this study.
In Large Model Agents: State-of-the-Art, Cooperation, the potential of large model agents (LMAs) to enhance collaboration between agents using cutting-edge LLMs was explored by \citet{smith2023largemodelagents}. This work closely aligns with the current study, as the role of LLMs in facilitating cooperation and iterative feedback loops was highlighted, which is central to the autonomous refinement process described here. Similarly, \citet{johnson2023professionalagents} demonstrated how LLMs enable agents to autonomously refine their roles and workflows, underscoring the importance of LLM integration into Agentic AI systems.
The importance of modular components and foundation models in planning and execution was emphasized in Automated Design of Agentic Systems \citep{hu2024automated}. This research primarily addressed system design, while the current study focuses on continuously refining agents' roles and workflows. In addition, automated evaluators were proposed to refine agent performance in predefined tasks, such as web navigation \citep{pan2024autonomousevaluation}. Yet, their work has been limited in scope to specific domains, lacking the scalability and domain independence emphasized in this research.
In Agentic Skill Discovery \citep{skilldiscovery2024agentic}, a novel LLM-based skill discovery framework was introduced, aligning with the method of iterative task proposals described here. The use of LLMs in generating and refining agent configurations closely mirrors the approach in which LLMs autonomously generate hypotheses to improve agent performance. Additionally, A Survey on LLM-Based Agentic Workflows and Components has facilitated the understanding of how LLM-profiled components enhance workflow refinement.
In comparison, this research presents a comprehensive framework for autonomously optimizing Agentic AI systems. A continuous feedback loop involving hypothesis generation, evaluation, and modification of agent roles and workflows is proposed, with no human intervention required. Unlike previous approaches, which have relied on predefined tasks or manual input, the system described here dynamically adapts to evolving objectives, improving scalability and adaptability across industries. Integrating LLMs for evaluation and hypothesis generation further enhances the system’s capacity for autonomous workflow optimization, ensuring adaptability to changing environments and objectives.
In conclusion, while existing research has significantly contributed to the development of autonomous agent systems, a novel, fully autonomous refinement framework is introduced in this study. By addressing scalability, flexibility, and domain independence, a superior solution for optimizing complex AI-driven workflows is offered, setting this method apart from previous models, which have focused on predefined tasks or lacked comprehensive refinement mechanisms.
\section{System Overview}
The proposed method for autonomous refinement and optimization of Agentic AI systems leverages several specialized agents, each responsible for a specific phase in the refinement process. This method operates in iterative cycles, continuously refining agent roles, goals, tasks, workflows, and dependencies based on qualitative and quantitative output evaluation. The optimization process is guided by two core frameworks: the Synthesis Framework and the Evaluation Framework. The Synthesis Framework generates hypotheses based on the system's output: the Hypothesis Agent and Modification Agent collaborate to synthesize new configurations for the Agentic AI system, proposing modifications to agent roles, goals, and tasks to be tested by the Evaluation Framework.
The refinement and optimization process is structured into these frameworks, contributing to the continuous improvement of the Agentic AI solution. The proposed method operates autonomously, iterating through cycles of hypothesis generation, execution, evaluation, and modification until optimal performance is achieved. This method begins by deploying a baseline version of the Agentic AI system. Agents are assigned predefined roles, tasks, and workflows, and the system generates initial qualitative and quantitative criteria based on the system’s objectives. An LLM is used to analyze the system’s code and extract evaluation metrics, which serve as benchmarks for assessing future outputs. Human input can be introduced to revise or fine-tune the evaluation criteria to better align with project goals. However, this step is optional, as the method is designed to operate autonomously.
The proposed method begins with a baseline version of the Agentic AI system, assigning initial agent roles, goals, and workflows. The first execution is run to generate the initial output and establish the baseline for comparison. After evaluating the initial output, the Hypothesis Agent generates hypotheses for modifying agent roles, tasks, or workflows based on the evaluation feedback. These hypotheses are then passed to the Modification Agent, which synthesizes changes to agent logic, interactions, or dependencies, producing new system variants. The newly modified versions of the system are executed by the Execution Agent, and performance metrics are gathered. The outputs generated are evaluated using qualitative and quantitative criteria (e.g., clarity, relevance, execution time). The Comparison Agent compares the newly generated outputs against the best-known variant, ranks the variants, and determines whether the new output is superior. The Memory Module stores the best-performing variants for future iterations. The cycle repeats as the proposed method continues refining the agentic workflows, improving overall performance until the predefined criteria are satisfied.
\subsection{Synthesis Framework}
The Refinement Agent manages the iterative optimization process by delegating tasks to other agents and synthesizing hypotheses for improving the system. It evaluates agent outputs against qualitative and quantitative criteria, identifying areas where agent roles, tasks, or workflows can be improved. The Refinement Agent leverages evaluation metrics such as clarity, relevance, depth of analysis, and actionability to propose modifications that enhance system output. The Hypothesis Generation Agent proposes specific changes to the agent system based on the output analysis. Based on evaluation feedback, this module generates hypotheses for improving agent roles, tasks, and interactions. For example, if agents are underperforming due to inefficiencies in their task delegation, the hypothesis module might suggest altering task hierarchies or reassigning specific roles.
The Modification Agent implements changes based on the hypotheses generated by the Refinement Agent. These changes may involve adjusting agent logic, modifying workflows, or altering agent dependencies. By synthesizing these changes, our method creates multiple variants of the Agentic AI solution. Each variant is stored and documented, with details regarding the expected improvements. The Execution Agent runs modified versions of the system, executing the newly generated variants and collecting performance data for subsequent evaluation. It ensures that agents perform their tasks as specified in the new configuration and debug issues as they arise. The Execution Agent tracks qualitative and quantitative outputs, feeding this information into the evaluation process.
\subsection{Evaluation Framework}
The Evaluation Framework is responsible for assessing the outputs of each system variant. The Evaluation Agent employs an LLM to evaluate both qualitative and quantitative aspects of the system's performance, ensuring that each iteration aligns with the system's overarching objectives and focuses on continuous improvement. The LLM evaluates outputs against predefined qualitative criteria, including clarity, relevance to the task, depth of analysis, and actionability, as well as quantitative metrics such as execution time and success rate. The Evaluation Agent provides a comprehensive system performance analysis, identifying areas for further improvement. After each iteration, the Comparison Agent compares the outputs of the modified system against the best-known configuration. It ranks the new variants based on the evaluation scores provided by the Evaluation Agent, determining which configuration yields the highest performance. The top-ranked variant is stored for future iterations, ensuring continuous improvement.
\section{Methodology}
The Agentic AI refinement process begins with the initialization of the best-known code variant, denoted as \( C_0 \), and the generation of its corresponding output, \( O_{C_0} \). The performance of the output is evaluated using a set of qualitative criteria (e.g., clarity, relevance, depth of analysis), where the evaluation function \( f(O_C, \text{criteria}) \) produces a score \( S(C_0) = f(O_{C_0}, \text{criteria}) \) based on these criteria. This initial score, \( S(C_0) \), is the baseline for comparison in subsequent iterations. At each iteration \( i \), the current best-known output, \( O_{C_i} \), is evaluated, and a set of hypotheses, \( \mathcal{H}_i = \text{generate\_hypotheses}(E_{C_i}) \), is generated from the qualitative evaluation \( E_{C_i} \) to suggest improvements. The hypotheses \( \mathcal{H}_i \) are then applied to the code \( C_i \), resulting in a new variant \( C_{i+1} = M(\mathcal{H}_i, C_i) \). The new code variant \( C_{i+1} \) is executed, producing a new output \( O_{C_{i+1}} \). The new output is evaluated using the same evaluation function \( f(O_C, \text{criteria}) \), yielding a new score \( S_{i+1} = f(O_{C_{i+1}}, \text{criteria}) \). If the new score \( S_{i+1} \) exceeds the best-known score, the new variant is considered superior, and the best-known score is updated as \( S_{\text{best}} = \max(S_{i+1}, S_{\text{best}}) \). The process continues iteratively until a stopping condition is met, either when the improvement between iterations becomes smaller than a predefined threshold, \( |S_{i+1} - S_{\text{best}}| < \epsilon \), or when a maximum number of iterations is reached. Upon termination, the proposed method returns the best-known variant \( C_{\text{best}} \) and its output \( O_{\text{best}} \).
Once initialized, the proposed method enters the execution phase, where agents perform their assigned tasks according to the baseline configuration. The Execution Agent runs the system, producing initial outputs that serve as a baseline for comparison in subsequent iterations. The results of this execution phase are stored for future analysis and comparison. The Evaluation Agent evaluates the outputs produced in the execution phase. The proposed method employs qualitative and quantitative criteria to assess the quality of the outputs. Qualitative metrics include relevance, clarity, depth of analysis, and actionability, while quantitative metrics include execution time, task completion rate, and overall system efficiency. The Evaluation Agent uses an LLM to generate detailed feedback, identifying areas where the system can be improved. The Hypothesis Generation Agent analyzes the evaluation data, generating hypotheses for improving agent roles, tasks, and workflows. These hypotheses may include changes such as altering task delegation, modifying agent goals, or restructuring the interdependencies between agents. Once the hypotheses are generated, the Modification Agent implements the proposed changes, creating new system variants based on these modifications. The modified versions of the system are re-executed by the Execution Agent, and the Evaluation Agent again evaluates their outputs. This iterative process continues, with each new variant compared against the previous best-known configuration. The Comparison Agent ranks the system variants based on performance, ensuring that the Memory Module only stores the top-performing versions.
\section{Experiments}
In "AI Agents That Matter," benchmarks for agent development are posited, guiding the framework’s reliance on qualitative and quantitative evaluation metrics to ensure robust agent performance. The alignment of this work with seminal studies is demonstrated, highlighting both the advantages and challenges of autonomous systems. Significant improvements in efficiency and scalability are achieved by minimizing human interventions, while concerns related to LLM latency and real-time operation constraints are addressed. Flexibility and adaptability are ensured in our method, allowing it to remain responsive to these challenges and fostering robust and resilient agent configurations.
Experiments were executed on a GPU cluster with NVIDIA Tesla V100 GPUs, 32 GB memory per GPU, and 512 GB RAM. Each iterative optimization cycle took approximately 2.5 hours per system configuration. The total compute required to refine an Agentic AI system was approximately 150 hours of GPU time, accounting for initial experimentation and final performance evaluations.
\section{Discussion}
The exploration of Agentic AI systems has been significantly aligned with ongoing advancements in the field, particularly the adoption of decentralized multi-agent frameworks. The proposed method embraces this decentralized approach, incorporating autonomous iterative feedback loops for continuous refinement and optimization. This mirrors the conceptual framework in systems managing complex environments like particle accelerators, where intelligent agents are tasked with high-level control tasks. Inspiration for the proposed method's Refinement and Evaluation frameworks was drawn from a notable study titled "Towards Agentic AI on Particle Accelerators" [2409.06336], which outlines a self-improving system of intelligent agents for control tasks, emphasizing autonomous modification without human input. A critical insight emerging from that work is the importance of self-improving agents capable of autonomously adjusting their roles and interactions based on feedback loops. This underscores the potential for the proposed method to dynamically adapt and continuously enhance performance without human oversight, setting a precedent for future applications where evolving objectives and environmental changes are prevalent. Furthermore, LLMs are leveraged as high-level coordinators within multi-agent systems, providing a viable pathway for improved decision-making through sophisticated language and reasoning capabilities, which have also been integrated into the core operations of the proposed method.
\section{Considerations}
The proposed method demonstrates significant advancements in autonomously optimizing Agentic AI workflows. However, several limitations exist. First, scalability remains challenging, particularly in large-scale systems with many agents, as iterative optimization may introduce significant computational overhead. Furthermore, the method's reliance on LLM-driven feedback loops increases the processing time required for evaluation and refinement, making real-time operation challenging. Additionally, the current framework assumes the availability of large-scale computational resources, which may not be practical for all applications. Finally, while LLMs enable sophisticated hypothesis generation, they also introduce latency when processing large datasets or complex tasks.
The proposed method also has several societal impacts. On the positive side, the ability to autonomously optimize workflows without human intervention offers significant improvements in efficiency and scalability, potentially enabling businesses to handle more complex tasks with fewer resources. However, there are also potential negative consequences. For example, automating specific decision-making processes could lead to job displacement in industries that rely heavily on human expertise for workflow management. Future work should investigate methods to ensure fairness, mitigate potential biases, and address the societal implications of deploying such systems at scale.
\section{Conclusion}
This paper presents a robust method for the autonomous refinement and optimization of Agentic AI solutions. The presented method continuously improves agent-based workflows by leveraging iterative feedback loops, hypothesis generation, and automated modifications, enhancing efficiency and effectiveness. The method's autonomous nature minimizes human intervention, making it ideal for large-scale applications that require ongoing refinement. The proposed method's scalability, flexibility, and ability to adapt to evolving objectives make it a powerful tool for optimizing complex AI-driven processes. While this method demonstrates promising advancements in Agentic AI, several avenues for future exploration could further enhance its capabilities. Investigating the role of human-in-the-loop strategies can bridge fully autonomous operations and scenarios where nuanced human judgment may be beneficial, especially during the initial deployment or in environments with high uncertainty. This could lead to hybrid systems where human expertise augments autonomous agent decision-making, ensuring safety and reliability without compromising autonomy. Finally, expanding the method to handle broader, more diverse tasks would validate adaptability across different sectors.
"""
)
agent_response['data']['output']
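Beyond the final output, the response typically carries metadata useful for debugging and multi-turn use. The field names below (intermediate_steps, session_id) and the session_id parameter to run are assumptions based on the SDK's response pattern and may differ by version.
# Inspect how the team delegated work between the two agents (field name is an assumption)
print(agent_response['data']['intermediate_steps'])

# Continue the same conversation with a follow-up request (session_id usage is an assumption)
follow_up = team.run(
    "Condense the Future Work section to three paragraphs.",
    session_id=agent_response['data']['session_id'],
)
print(follow_up['data']['output'])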
We successfully built a Literature Review Agent to analyze research papers and extract key insights efficiently. By combining agents for literature review and future research discussions into a team agent, we demonstrated how to streamline academic research tasks.