Understanding AI Agents in the Context of GenAI

Understanding the Current Landscape
In the realm of AI, it's crucial to grasp its current state. Over the past year, we've witnessed a transition from lofty expectations to tangible advancements. What once seemed like magic has now materialized into three distinct capabilities: generative AI, reasoning, and human-like interaction.
Generative AI, in particular, has emerged as a game-changer, enabling the creation of diverse content—from images to text and even video. This capability, coupled with the ability to reason, marks a significant shift in how software interacts with the world, resembling human-like cognition.
Navigating the Path Forward
When we look back on the journey of AI, it's clear that we're on the cusp of a major turning point, one defined by intelligence augmentation and collaborative problem-solving.
Just as previous waves of technology transformed society, AI promises to revolutionize productivity and redefine human-machine interactions. Looking ahead, we anticipate a convergence of AI with other transformative technologies, paving the way for groundbreaking applications across various sectors. From healthcare to education, AI holds the key to addressing complex challenges and unlocking new possibilities.
As we explore AI further, the concept of AI agents or autonomous agents in AI becomes increasingly significant. These agents represent the next evolution in AI, where machines not only generate content but also perform tasks autonomously. But what exactly are AI agents?
What is an Agent in the Context of GenAI?
An autonomous expert agent is a computer system designed to operate independently and perform specific tasks within a domain of expertise with a high level of efficiency. These agents consist of a set of components that enable learning, reasoning, planning, decision-making, and action, while having access to both internal and external data and knowledge. An orchestrator facilitates the execution of tasks that can range from simple to complex and allows for collaboration among agents.
Planning is a particularly complex component to implement, especially when it requires co-construction with humans (known as mixed-initiative planning). Multi-agent systems have existed in AI since the 1990s, with a wide array of applications that will not be covered here. The concept of AI agents has been applied to generative AI (GenAI), integrating specific capabilities such as language models and reasoning chains.
These agents are designed to handle specific use cases that require generative AI by chaining various tasks to achieve a goal. This represents a specialization of the general notion of an expert AI agent, limited to the use of a GenAI model and the possible applications of generative AI.
An Expert Agent in Generative Artificial Intelligence (GenAI Agent) is a computer system built using generative AI technologies to function autonomously and accomplish tasks within a specific area of expertise with high performance. It possesses the following attributes and capabilities:
- Object Description: This allows other agents to identify and collaborate with it.
- Instructions: Contextual information, objectives, data sources to query, tasks to perform, examples to use, and controls to execute.
- Multimodal Generative Model: This model processes input instructions and data (text, images, tabular data, etc.), which can be provided by a human during a conversational interaction or generated by another system or AI agent.
- Output Generation: The generative model produces results that can be evaluated by a human in a co-pilot mode or fed into another computer system (like an expert AI agent) in an appropriate format, known as a protocol, without human supervision.
- Memory: A short-term memory (context window) and a long-term memory (persistent memory for personalizing future interactions), along with planning, decision-making, and reasoning capabilities (reflection, self-critique, reasoning chains, task decomposition).
- Access to Tools: Such as calendars, enterprise applications, or search engines, with the generative model deciding which tools to use and in what sequence to achieve its objectives.
- External Data Sources: In addition to the model's training data.
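To make the attributes above more concrete, here is a minimal sketch of how they might map onto a data structure. It is an illustration under stated assumptions, not a reference design: the class `GenAIAgent`, its field names, and the toy `echo_model` are all hypothetical, and the generative model is stubbed as a plain function from prompt to text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a generative model: takes a prompt, returns text.
Model = Callable[[str], str]


@dataclass
class GenAIAgent:
    description: str   # lets other agents identify and collaborate with it
    instructions: str  # context, objectives, tasks, controls
    model: Model       # the generative model (stubbed in this sketch)
    # Tools are only stored here; this sketch does not call them.
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    short_term_memory: List[str] = field(default_factory=list)      # context window
    long_term_memory: Dict[str, str] = field(default_factory=dict)  # persistent facts

    def run(self, user_input: str) -> str:
        # Build the prompt from instructions, persistent facts and recent turns.
        facts = "; ".join(f"{k}={v}" for k, v in self.long_term_memory.items())
        history = "\n".join(self.short_term_memory)
        prompt = (
            f"{self.instructions}\nKnown facts: {facts}\n{history}\n"
            f"User: {user_input}"
        )
        output = self.model(prompt)
        # Record the exchange so later turns can see it.
        self.short_term_memory.append(f"User: {user_input}")
        self.short_term_memory.append(f"Agent: {output}")
        return output


def echo_model(prompt: str) -> str:
    # Toy model: reports the last line of the prompt it received.
    return "received -> " + prompt.splitlines()[-1]


agent = GenAIAgent(
    description="Scheduling assistant",
    instructions="You help plan meetings.",
    model=echo_model,
    long_term_memory={"timezone": "UTC"},
)
reply = agent.run("Book a room for Monday")
print(reply)
```

In a real system the `model` field would wrap a call to an actual generative model, and external data sources would typically be reached through entries in `tools`.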
Note that an agent can operate on its own, without interacting with other agents, or as part of a group of agents. Within a team, agents are assembled into a workflow built by a human or by the LLM, which decides how to "chain tasks" according to business logic. Autonomous agents can also collaborate toward a broad objective that requires expertise in several domains, forming a GenAI multi-agent system, although such a system requires a complex control and planning layer. Future models will likely open up further possibilities, such as the proposed Large Concept Models [3], which would allow greater contextualization and conceptualization.
The figure below shows an example of agent architecture which proposes to combine reasoning and action capabilities.
The Power of AI Agents: Transformative Workflows
In the context of AI, our interactions often follow a non-agentic workflow, akin to typing a problem and receiving a generated answer. But what if we shifted to an agentic approach, mirroring human collaboration and iteration? Imagine an AI-powered essay writing process: instead of demanding a flawless draft in one go, we prompt an AI to outline, research, draft, revise, and iterate. This iterative workflow, involving multiple AI agents with diverse roles and expertise, forms the backbone of agentic reasoning.
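The difference between the two workflows can be sketched in a few lines. This is only an illustration: `one_shot`, `agentic_essay`, the prompt wording, and `stub_model` are hypothetical names, and the stub simply records each call instead of contacting a real model.

```python
from typing import Callable, List

# Hypothetical stand-in for a generative model: takes a prompt, returns text.
Model = Callable[[str], str]


def one_shot(model: Model, topic: str) -> str:
    # Non-agentic workflow: one prompt in, one finished answer out.
    return model(f"Write an essay about {topic}.")


def agentic_essay(model: Model, topic: str, rounds: int = 2) -> str:
    # Agentic workflow: outline, research, draft, then critique and revise.
    outline = model(f"Outline an essay about {topic}.")
    notes = model(f"List facts to research for this outline:\n{outline}")
    draft = model(f"Draft the essay from the outline and notes:\n{outline}\n{notes}")
    for _ in range(rounds):
        critique = model(f"Critique this draft:\n{draft}")
        draft = model(f"Revise the draft using the critique:\n{draft}\n{critique}")
    return draft


calls: List[str] = []


def stub_model(prompt: str) -> str:
    # Toy model: records the prompt and returns a numbered placeholder.
    calls.append(prompt)
    return f"<output {len(calls)}>"


final = agentic_essay(stub_model, "AI agents", rounds=2)
print(len(calls), final)
```

With two revision rounds, the agentic version makes seven model calls where the one-shot version makes one. The extra calls are the price paid for the intermediate outline, research, and critique steps.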
While some may argue that AI agents are merely large language models (LLMs), the crux lies in their collaborative potential. Today, the two areas where everyday users benefit most readily from AI agents are essay generation and code snippet generation.
Picture a team of agents (writers, reviewers, spell checkers, fact checkers), each contributing specialized skills to refine an essay through iterative cycles. Such a workflow can deliver higher productivity and noticeably better results than traditional non-agentic methods.
Take, for instance, coding tasks. While zero-shot prompting may yield decent results, the true power lies in wrapping an agentic workflow around LLMs. By combining reflection (prompting an AI to evaluate and improve its own output) with tool use and multi-agent collaboration, we can achieve substantial performance gains.
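As a rough sketch of reflection on a coding task, the loop below generates code, checks it, and feeds any failure back to the model for another attempt. Everything here is assumed for illustration: `run_checks`, `reflect_and_fix`, and `stub_model` are hypothetical, and the stub returns a buggy answer first and a corrected one once it sees feedback.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for a generative model: takes a prompt, returns text.
Model = Callable[[str], str]


def run_checks(source: str) -> Optional[str]:
    """Execute candidate code plus one check; return an error message or None."""
    namespace: dict = {}
    try:
        # A real system would run untrusted code in a sandbox, not with exec.
        exec(source, namespace)
        assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def reflect_and_fix(
    model: Model, task: str, max_rounds: int = 3
) -> Tuple[str, int, Optional[str]]:
    # First attempt, then revise while the checks still fail.
    code = model(task)
    error = run_checks(code)
    revisions = 0
    while error is not None and revisions < max_rounds:
        code = model(
            f"{task}\nPrevious attempt:\n{code}\nIt failed with: {error}\nFix it."
        )
        error = run_checks(code)
        revisions += 1
    return code, revisions, error


def stub_model(prompt: str) -> str:
    # Toy model: wrong on the first try, correct once it receives feedback.
    if "failed with" in prompt:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a - b\n"


code, revisions, error = reflect_and_fix(stub_model, "Write add(a, b).")
print(revisions, error)
```

The check function plays the role of a tool: it gives the model concrete feedback rather than asking it to judge its own output unaided.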
Andrew Ng, renowned AI expert, highlights that GPT-3.5, when integrated into an agentic workflow, can outperform GPT-4 in specific scenarios, demonstrating the powerful impact of collaborative reasoning.
“I truly believe the future of artificial intelligence is going to be agentic”
- Dr Andrew Ng
Key Design Patterns of AI Agents
Let's delve deeper into the key design patterns underpinning agentic workflows:
- Reflection: By prompting AI to assess and refine its output iteratively, we unlock superior performance and accuracy, akin to human self-correction and improvement.
- Tool Use: Equipping AI with pre-existing tools and libraries empowers it to execute tasks efficiently and reliably, leveraging existing resources to streamline workflows.
- Planning: Providing AI with the ability to strategize and plan steps enables it to tackle complex tasks methodically, mimicking human problem-solving approaches.
- Multi-Agent Collaboration: Harnessing the collective intelligence of multiple AI agents, each specialized in distinct roles, fosters synergy and innovation, driving performance to new heights.
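The planning and tool-use patterns can be illustrated together with a small sketch. It assumes a plan is a list of (tool name, argument) steps and that each step may reuse the previous result through a `{prev}` placeholder. The `TOOLS` registry, `stub_planner`, and `execute` are hypothetical names, and in a real agent the plan would come from the model rather than being hard-coded.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
Step = Tuple[str, str]  # (tool name, argument)

# Hypothetical tool registry; real agents might expose calendars or search here.
TOOLS: Dict[str, Tool] = {
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
    "upper": lambda text: text.upper(),
}


def stub_planner(goal: str) -> List[Step]:
    # Stand-in for the planning step: a real agent would ask the model
    # to decompose the goal into tool calls.
    return [("calculator", "2+3+4"), ("upper", "total is {prev}")]


def execute(plan: List[Step], tools: Dict[str, Tool]) -> str:
    # Run each step in order, passing the previous result forward.
    prev = ""
    for name, arg in plan:
        if name not in tools:
            raise KeyError(f"unknown tool: {name}")
        prev = tools[name](arg.replace("{prev}", prev))
    return prev


result = execute(stub_planner("Report the total in capitals"), TOOLS)
print(result)
```

Keeping the plan as plain data makes it easy to inspect or approve before execution, which matters when a human co-constructs the plan with the agent.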
These design patterns not only enhance productivity but also pave the way for groundbreaking advancements in AI capabilities. As we embrace agentic reasoning, we anticipate a paradigm shift in how we approach AI applications. By leveraging the collaborative potential of AI agents, we embark on a journey towards unlocking the full spectrum of AI's potential, one iterative step at a time.
Why Are AI Agents an Exciting Opportunity?
With AI agents handling most tasks, a single person could achieve the same level of output and progress as a large team. Sam Altman, CEO of OpenAI, has floated the concept of the “1-person billion-dollar company.” Agentic workflows also help with a current limitation of LLMs: even though some models accept very long input contexts (for instance, Gemini 1.5 Pro accepts 1 million tokens), their ability to truly understand long, complex inputs is mixed. An agentic workflow in which the LLM is prompted to focus on one thing at a time can give better performance. By telling it when it should play a given role, such as software engineer, we can also specify what is important in that subtask.
“[Intelligent] autonomous agents are the natural endpoint of automation in general. In principle, an agent could be used to automate any other process. Once these agents become highly sophisticated and reliable, it is easy to imagine an exponential growth in automation across fields and industries.”
- Bojan Tunguz, Machine Learning at NVIDIA
Companies need to start preparing today, with a robust transformation roadmap, for agents’ arrival in the mainstream in three to five years.
Conclusion: Embracing the AI Future
AI's impact goes far beyond the things it can already do. While we're seeing exciting applications now, the future holds even greater possibilities for how AI can transform our world, signaling a productivity revolution comparable to historical milestones. This revolution unfolds in stages, evolving from simple tools to sophisticated machine assistants, and ultimately to intricate machine networks. The economic impact will be profound, reducing costs and enhancing efficiency across various industries.
Our research leads us to believe that upcoming models will be significantly "smarter" than today's, particularly in terms of planning capabilities. This capability is key for executing complex tasks and will significantly increase the number of tasks that AI can perform.
Let's embrace this journey and explore the boundless opportunities AI has to offer.
Sources:
- https://youtube.com/watch?v=ZYf9V2fSFwU&si=nE02ea-LlxJ37ZHL
- https://youtube.com/watch?v=9ZhbA0FHZYc&si=4qkd6HRGTywIEF1F
- https://youtube.com/watch?v=TDPqt7ONUCY&si=kPHjQNdXPSLmi5aJ
- https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/
- https://lilianweng.github.io/posts/2023-06-23-agent/
- https://www.deeplearning.ai/short-courses/ai-agentic-design-patterns-with-autogen/
- https://www.bcg.com/publications/2023/gpt-was-only-the-beginning-autonomous-agents-are-coming
- https://www.mattprd.com/p/the-complete-beginners-guide-to-autonomous-agents