Agentic Design Patterns
A Visual Summary
A plain-language guide to the 21 design patterns that turn AI language models into intelligent, goal-driven systems — distilled from Antonio Gulli's 482-page handbook.
A note on freshness
This book was written in mid-2025 and published just as the agentic ecosystem began accelerating hard. I ran it through two independent reviews — one with Claude, one with ChatGPT — to identify what's held up and what hasn't. The short version:
The 21 patterns are durable. Prompt chaining, reflection, tool use, planning, memory, RAG (Retrieval-Augmented Generation), guardrails — these are the architectural grammar of agent development and they haven't been superseded. Treat this as a design handbook and architecture checklist, not a current cookbook. The book's "patterns first, frameworks second" mindset is exactly right.
MCP (Model Context Protocol) is now much bigger than one chapter suggests. It's been donated to the Linux Foundation, adopted by every major AI provider, and has 6,400+ registered servers. It's the default integration surface, not an emerging option. Multi-agent systems are real but overprescribed — current best practice from Anthropic and OpenAI says use them selectively, not reflexively. The framework code is the most dated layer: OpenAI's Assistants API (Application Programming Interface) is deprecated, ADK (Agent Development Kit) has moved to 2.0, and OpenClaw (250K GitHub stars in four months) introduced paradigms the book couldn't anticipate. The guardrails and evaluation chapters are more important now than when published — OpenClaw's security incidents proved that.
Conceptual value: 8.5/10. Current API freshness: 5/10. Strategic usefulness: 8/10.
What is an AI Agent?
An AI agent is a system that can perceive its environment, make decisions, and take actions to achieve a goal. Think of the difference between a calculator (you press buttons, it computes) and a personal assistant (you say "organize my schedule" and it figures out the steps on its own). An agent is the assistant.
At its core, every agent follows a five-step loop: get a mission, scan the scene, think it through, take action, then learn and improve. Large language models (LLMs) like ChatGPT or Gemini are the "brain" powering these agents, but on their own they just generate text. The 21 patterns in this book are the architecture that turns that raw brain into something that can reliably plan, use tools, remember things, collaborate with other agents, and stay safe.
2026 Update — Context Engineering
The book's emphasis on context engineering was ahead of the curve. Since mid-2025, Anthropic has explicitly described context engineering as the progression beyond prompt engineering: not just writing better prompts, but curating the full runtime context — system instructions, tools, MCP connections, message history, and external data. The "prompts are not enough" thesis has been fully vindicated.
Part 1 — Foundations
Seven core patterns that give agents the ability to break down work, make decisions, run tasks in parallel, self-correct, use external tools, plan ahead, and collaborate.
Prompt Chaining
Instead of asking an AI to do a massive, complicated thing all at once, you break it into a sequence of smaller steps. The output of step one becomes the input for step two, and so on — like an assembly line. This dramatically reduces errors because each step is simple and focused.
Real-world analogy: Imagine asking someone to "summarize this report, pull out the key numbers, and draft an email about it" all in one breath. They'd likely miss something. But if you ask them to do each piece separately, they nail every step.
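The assembly line above can be sketched as three small calls, where each step's output feeds the next. This is a minimal illustration, not any vendor's SDK: `call_model` is a stub standing in for a real LLM API call.

```python
def call_model(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    return f"[model output for: {prompt[:40]}]"

def chain(report: str) -> str:
    # Step 1: summarize. This output becomes the input for step 2.
    summary = call_model(f"Summarize this report:\n{report}")
    # Step 2: extract the key numbers from the summary alone.
    numbers = call_model(f"List the key numbers in:\n{summary}")
    # Step 3: draft the email from the two earlier outputs.
    return call_model(f"Draft an email covering:\n{summary}\n{numbers}")
```

Because each prompt is small and focused, a failure in any one step is easy to spot and retry in isolation.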
Routing
Routing adds decision-making to an agent's workflow. Instead of always following the same path, the agent analyzes the input and chooses the right path. Think of a receptionist who listens to your question and directs you to the right department.
The router can be the AI itself (it reads the question and classifies it), a set of rules ("if the message mentions billing, go to the billing agent"), or even an embedding model that matches meaning rather than keywords.
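A rule-based router is the simplest of the three options. The sketch below uses hypothetical department names and keyword lists; a production router would typically add an LLM or embedding classifier as the fallback instead of a static default.

```python
def route(message: str) -> str:
    """Pick a destination by keyword rules; fall back to 'general'."""
    rules = {
        "billing": ["invoice", "charge", "refund", "billing"],
        "tech_support": ["error", "crash", "bug", "broken"],
    }
    lowered = message.lower()
    for destination, keywords in rules.items():
        if any(k in lowered for k in keywords):
            return destination
    # A real system might ask an LLM or embedding model to classify here.
    return "general"
```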
Parallelization
When multiple tasks don't depend on each other, run them at the same time instead of one after another. A travel agent that checks flights, hotels, and restaurants simultaneously gives you an answer much faster than one that checks each sequentially.
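The travel-agent example maps directly onto concurrent execution. A rough sketch with `asyncio`, where the three lookups are stubs standing in for real API calls:

```python
import asyncio

async def check_flights(city: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return f"flights to {city}"

async def check_hotels(city: str) -> str:
    await asyncio.sleep(0.01)
    return f"hotels in {city}"

async def check_restaurants(city: str) -> str:
    await asyncio.sleep(0.01)
    return f"restaurants in {city}"

async def plan_trip(city: str) -> list[str]:
    # The three lookups are independent, so they run concurrently
    # instead of one after another.
    return await asyncio.gather(
        check_flights(city), check_hotels(city), check_restaurants(city)
    )

results = asyncio.run(plan_trip("Lisbon"))
```

Total wall time is roughly the slowest single lookup, not the sum of all three.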
Reflection
Reflection is an AI checking its own homework. After generating an initial answer, the agent evaluates it — looking for errors, gaps, or ways to improve — and then revises. This can be the same agent reviewing its own work, or a separate "critic" agent whose only job is finding flaws.
The Producer-Critic model is especially powerful: one agent writes the draft, another tears it apart with structured feedback, and the first agent uses that feedback to improve. This cycle can repeat until the output meets a quality threshold.
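The Producer-Critic cycle reduces to a loop: draft, critique, revise, repeat until the critic has nothing left to say or the iteration budget runs out. Both roles are stubbed below (a real system would back each with an LLM call), so the termination behavior is illustrative.

```python
def produce(task: str, feedback: str = "") -> str:
    # Placeholder producer; a real one would call an LLM with the feedback.
    return f"draft for {task}" + (" (revised)" if feedback else "")

def critique(draft: str) -> str:
    # Placeholder critic: returns structured feedback, or "" to approve.
    return "" if "(revised)" in draft else "tighten the intro"

def reflect_loop(task: str, max_rounds: int = 3) -> str:
    draft = produce(task)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if not feedback:          # quality threshold met; stop revising
            return draft
        draft = produce(task, feedback)
    return draft                  # out of budget; return best effort
```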
Tool Use (Function Calling)
An AI model's knowledge is frozen at its training date and it can't take real-world actions on its own. Tool Use gives it hands: the ability to call external services like search engines, calculators, databases, or APIs. The model decides which tool to use and what arguments to pass, then an orchestration layer executes the call and feeds the result back.
This is what lets an agent check today's weather, look up your order status, send an email, or run code — bridging the gap between "thinking" and "doing."
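The division of labor is worth making concrete: the model only *chooses* a tool and its arguments; the orchestration layer *executes* it. The sketch below stubs both sides (the `model_decides` function and the weather tool are hypothetical stand-ins, not a real provider's function-calling API, though real APIs return a structured tool call of much the same shape).

```python
def get_weather(city: str) -> str:
    # Stand-in for a real weather API call.
    return f"18°C and clear in {city}"

TOOLS = {"get_weather": get_weather}

def model_decides(user_message: str) -> dict:
    # Placeholder for the model's tool-selection step; real APIs
    # return a structured call like this as JSON.
    return {"tool": "get_weather", "arguments": {"city": "Lisbon"}}

def run_agent(user_message: str) -> str:
    call = model_decides(user_message)
    # The orchestration layer executes the call the model requested...
    result = TOOLS[call["tool"]](**call["arguments"])
    # ...and would normally feed `result` back to the model so it can
    # compose a final natural-language answer.
    return result
```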
Planning
Planning is the ability to receive a high-level goal and autonomously decompose it into a sequence of steps before executing. A planning agent doesn't just react — it strategizes. Give it "organize a team offsite for 30 people" and it'll create a step-by-step plan (find venues, check availability, book flights, etc.) and then execute each step.
The key insight: use planning when the "how" needs to be discovered, not when it's already known. If the workflow is fixed and repeatable, a simple chain is better. Planning shines when the path to the goal depends on context.
Multi-Agent Collaboration
Instead of one agent doing everything, you build a team of specialists. A "Researcher" agent finds information, a "Writer" agent drafts content, an "Editor" agent reviews it. They communicate through defined channels, pass work between each other, and together solve problems none could handle alone.
Collaboration can be sequential (assembly line), parallel (divide and conquer), hierarchical (a manager delegates to workers), or debate-based (agents argue to reach consensus). The architecture mirrors how human teams work.
Six Collaboration Models
Single Agent — works alone. Network — peer-to-peer communication. Supervisor — one coordinator. Supervisor as Tool — coordinator provides resources. Hierarchical — multi-level management. Custom — hybrid, tailored to the problem.
2026 Update — Multi-Agent Selectivity
Current best practice from Anthropic and OpenAI is more conservative than the book suggests. Anthropic explicitly recommends finding the simplest solution possible and only increasing complexity when needed — multi-agent systems are expensive in tokens and poorly suited for tasks requiring tightly shared context. Most production systems in 2026 use single-agent architectures with good tool orchestration. Read this chapter with a strong "only when justified" filter.
Part 2 — State & Context
How agents remember things, learn from experience, connect to external systems, and track their own progress toward goals.
Memory Management
Without memory, an agent forgets everything between conversations — like a goldfish. Memory Management gives agents two types of recall:
Short-term memory is the conversation context — recent messages and tool results within the current chat. It's limited by the model's context window (how many words it can "see" at once).
Long-term memory stores information across sessions using external databases. When the agent needs to recall something from weeks ago, it queries this store and pulls the relevant information back into its current context.
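The two tiers can be sketched as one small class: a bounded message window for short-term recall, and a keyed store standing in for the external long-term database. Real systems replace the dictionary with a vector store and use semantic search rather than exact-key lookup.

```python
class AgentMemory:
    """Short-term: a bounded conversation window (the context limit).
    Long-term: a keyed store standing in for an external database."""

    def __init__(self, window: int = 4):
        self.window = window
        self.short_term: list[str] = []   # recent conversation turns
        self.long_term: dict[str, str] = {}  # persists across "sessions"

    def add_turn(self, message: str) -> None:
        self.short_term.append(message)
        # Trim to the window, mimicking a finite context window.
        self.short_term = self.short_term[-self.window:]

    def remember(self, key: str, fact: str) -> None:
        self.long_term[key] = fact

    def recall(self, key: str):
        # A real store would use semantic search; this is a direct lookup.
        return self.long_term.get(key)
```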
Learning & Adaptation
This pattern is about agents getting better over time. Through techniques like reinforcement learning (trial and error with rewards), few-shot learning (learning from a handful of examples), or even self-modification (an agent editing its own code), agents can evolve beyond their initial programming.
A striking example is SICA (Self-Improving Coding Agent), which reviews its own past performance, identifies what to improve, then modifies its own source code to get better at coding tasks — progressively building new tools for itself.
Model Context Protocol (MCP)
MCP is like a universal adapter for AI. Instead of building a custom integration every time you want an AI to talk to a new tool or database, MCP provides a standardized interface. Any AI that speaks MCP can connect to any tool that speaks MCP — like how USB-C lets any device plug into any port.
An MCP server exposes tools, data (resources), and interaction templates (prompts). An MCP client (the AI app) discovers what's available and uses it. This is different from basic function calling, which is vendor-specific and one-to-one. MCP creates an interoperable ecosystem.
2026 Update — MCP Is Now Infrastructure
MCP has become the foundational integration layer of the agentic ecosystem — far bigger than this chapter conveys. Anthropic donated it to the Linux Foundation's AAIF in December 2025. It's been adopted by ChatGPT, Gemini, Copilot, VS Code, and Cursor. 6,400+ registered servers, 97M monthly SDK downloads, and a move to fully stateless architecture targeting June 2026. The spec's current version is 2025-11-25; SSE transport is now legacy in favor of Streamable HTTP or stdio for new integrations. Read this chapter for concepts; check modelcontextprotocol.io for current protocol details.
Goal Setting & Monitoring
This pattern gives agents a sense of purpose. You define specific, measurable objectives (think SMART — Specific, Measurable, Achievable, Relevant, Time-bound — goals), and the agent continuously monitors its own progress. If it drifts off course or encounters obstacles, the monitoring system detects the deviation and triggers adjustments.
This transforms an agent from a task executor into a goal-driven system — one that can assess "am I getting closer to the goal?" and self-correct when the answer is no.
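The monitor-and-correct loop can be sketched generically: measure the current metric, compare it to the goal, and trigger an adjustment when there's drift. The half-step adjustment below is an arbitrary illustrative policy, not a recommendation.

```python
def monitor(goal_value, read_metric, adjust, max_cycles=5, tolerance=0.1):
    """Measure progress toward the goal each cycle; detect deviation
    and trigger adjustments until within tolerance or out of budget."""
    for cycle in range(max_cycles):
        current = read_metric()
        if abs(goal_value - current) <= tolerance:
            return cycle, current           # goal reached
        adjust(goal_value - current)        # self-correct toward the goal
    return max_cycles, read_metric()

# Toy environment: each adjustment closes half the remaining gap.
state = {"value": 0.0}
cycles, final = monitor(
    goal_value=1.0,
    read_metric=lambda: state["value"],
    adjust=lambda gap: state.__setitem__("value", state["value"] + 0.5 * gap),
)
```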
Part 3 — Reliability
Patterns that make agents robust, safe, and grounded in reality — handling errors gracefully, keeping humans in the loop, and connecting to real-world knowledge.
Exception Handling & Recovery
Real-world agents will encounter failures — APIs go down, data gets corrupted, unexpected inputs arrive. This pattern builds resilience through three layers: detection (catching errors as they happen), handling (logging, retrying, falling back to alternatives), and recovery (rolling back to a stable state, replanning, or escalating to a human).
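The three layers compose naturally into a retry-then-fallback wrapper. A minimal sketch (the flaky API and cached fallback are hypothetical):

```python
import time

def call_with_recovery(primary, fallback, retries=2, delay=0.0):
    """Detection: exceptions surface in the try block.
    Handling: log and retry. Recovery: fall back to an alternative."""
    for attempt in range(retries):
        try:
            return primary()
        except Exception as exc:
            print(f"attempt {attempt + 1} failed: {exc}")  # handling: log it
            time.sleep(delay)                              # optional backoff
    return fallback()                                      # recovery path

calls = {"n": 0}

def flaky_api():
    calls["n"] += 1
    raise ConnectionError("API down")

result = call_with_recovery(flaky_api, lambda: "cached answer")
```

In a fuller system the recovery branch might replan or escalate to a human instead of returning a cached value.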
Human-in-the-Loop
Full autonomy isn't always safe or desirable. HITL (Human-in-the-Loop) deliberately integrates human judgment at critical points — approving plans before execution, reviewing high-stakes decisions, providing feedback that improves the agent over time, or stepping in when the AI encounters something beyond its abilities.
Think of it as a collaboration: the AI handles the heavy lifting of data processing and initial analysis, while the human provides judgment, ethics, and domain expertise. The goal isn't to slow things down but to catch what the AI might miss.
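A common implementation shape is an approval gate: low-risk actions run autonomously, while high-stakes ones block until a human signs off. The sketch below passes the approver in as a callable for testability; in production it would typically enqueue the action for asynchronous review instead.

```python
def execute(action: str) -> str:
    # Stand-in for actually performing the action.
    return f"executed: {action}"

def hitl_gate(action: str, risky: bool, approve) -> str:
    """Run low-risk actions directly; ask a human for the rest.
    `approve` is any callable returning True/False."""
    if risky and not approve(action):
        return f"blocked pending review: {action}"
    return execute(action)
```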
Knowledge Retrieval (RAG)
RAG (Retrieval-Augmented Generation) solves a fundamental limitation: AI models only know what they were trained on, and that knowledge is frozen. RAG connects the model to a live knowledge base. When you ask a question, the system first searches for relevant information, then feeds those results into the AI's prompt as context, so it can generate an answer grounded in real, up-to-date facts.
Agentic RAG goes further: instead of passively accepting whatever the search returns, an intelligent agent evaluates the results — checking if sources are current, resolving contradictions between documents, identifying knowledge gaps and running follow-up searches.
Key Concepts
Embeddings turn text into numbers that capture meaning. Vector databases store these numbers for fast semantic search. Chunking breaks large documents into searchable pieces. Together they let an agent find information by meaning, not just keywords.
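The retrieval half of RAG can be sketched end to end with a toy "embedding" (word counts in place of a real dense-vector model) and cosine similarity for ranking. The chunks and query are invented examples; only the ranking logic carries over to a real system.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words counts. Real systems use dense
    # vectors from an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "The refund policy allows returns within 30 days.",
    "Our office is closed on public holidays.",
]
context = retrieve("what is the refund policy", chunks)
# `context` would be prepended to the LLM prompt as grounding.
```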
Part 4 — Advanced Patterns
Inter-agent communication protocols, resource optimization, advanced reasoning, safety guardrails, evaluation, prioritization, and exploration.
Inter-Agent Communication (A2A)
Google's A2A (Agent-to-Agent) protocol is an open standard that lets agents built with completely different frameworks talk to each other. Each agent publishes an "Agent Card" — a digital identity file describing what it can do. Other agents discover these cards and delegate tasks accordingly.
If MCP is a universal adapter for tools, A2A is a universal translator for agents. Together they enable an ecosystem where specialized agents from different vendors can be composed into complex workflows.
2026 Update — A2A Shipped v1.0
A2A shipped a production-ready v1.0 and is now under the AAIF alongside MCP. The book's complementary framing is confirmed by both projects' documentation. However, Google's own ADK still labels A2A support as experimental. MCP is settled infrastructure; A2A is promising but not yet ubiquitous.
Resource-Aware Optimization
Not every question needs the most powerful (and expensive) AI model. This pattern dynamically selects the right model based on task complexity, budget, and latency requirements. A simple factual question goes to a fast, cheap model; a complex analysis goes to a powerful one.
A "Router Agent" classifies the difficulty, and a "Critique Agent" evaluates the quality of responses, creating a feedback loop that improves routing over time. Fallback mechanisms ensure the system keeps working even when a preferred model is unavailable.
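Stripped to its core, the router is a classifier in front of a model table. The difficulty heuristic and per-call costs below are invented for illustration; a real Router Agent would itself be a cheap model call.

```python
MODELS = {
    "small": {"cost_per_call": 0.001},   # illustrative prices
    "large": {"cost_per_call": 0.03},
}

def classify_difficulty(question: str) -> str:
    # Crude heuristic stand-in for a Router Agent: long or analytical
    # questions go to the powerful model, everything else to the cheap one.
    hard_markers = ("analyze", "compare", "explain why", "design")
    if len(question) > 200 or any(m in question.lower() for m in hard_markers):
        return "large"
    return "small"

def answer(question: str) -> tuple[str, str]:
    model = classify_difficulty(question)
    # A real system would call the chosen model's API here.
    return model, f"[{model} model answers: {question[:30]}]"
```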
Reasoning Techniques
These techniques make the AI's "thinking" visible and structured, dramatically improving accuracy on complex problems.
Chain of Thought (CoT) prompts the model to think step-by-step instead of jumping to an answer. Tree of Thoughts (ToT) explores multiple reasoning paths simultaneously, like branching possibilities. ReAct interleaves reasoning with action — the agent thinks, acts (uses a tool), observes the result, then thinks again.
The Scaling Inference Law reveals that giving a smaller model more "thinking time" can outperform a larger model that answers quickly. Quality comes from how much the model deliberates, not just how big it is.
2026 Update — Reasoning Faithfulness
Keep CoT, ToT, and self-correction as design ideas, but stop assuming that exposed reasoning is either faithful or visible. Current evidence from Anthropic shows chain-of-thought is often not a reliable window into a model's actual reasoning process, and major providers increasingly keep raw reasoning chains hidden. In practice, rely on reasoning summaries, tool-call verification, tests, and evals — not the raw chain itself.
Guardrails & Safety
As agents become more autonomous, guardrails become essential. They're implemented at every layer: input validation screens out malicious content and jailbreak attempts. Output filtering catches toxic or off-topic responses. Behavioral constraints in the system prompt define boundaries. Tool restrictions limit what actions the agent can take.
Engineering reliable agents also means applying proven software principles: modularity (small, specialized agents), observability (structured logging of every decision), least privilege (agents only get the permissions they need), and checkpoint/rollback (the ability to undo if things go wrong).
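Two of those layers, input validation and least-privilege tool restrictions, can be sketched in a few lines. The blocklist patterns and tool names are illustrative placeholders; real input screening uses classifiers rather than substring matching.

```python
BLOCKED_PATTERNS = ("ignore previous instructions", "reveal your system prompt")
ALLOWED_TOOLS = {"search", "calculator"}  # least privilege: no file or email tools

def validate_input(message: str) -> bool:
    # Input validation layer: screen for obvious injection attempts.
    lowered = message.lower()
    return not any(p in lowered for p in BLOCKED_PATTERNS)

def guarded_tool_call(tool: str, message: str) -> str:
    if not validate_input(message):
        return "rejected: possible prompt injection"
    if tool not in ALLOWED_TOOLS:
        # Tool restriction layer: the agent never gets this capability.
        return f"rejected: tool '{tool}' not permitted"
    return f"ok: ran {tool}"
```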
2026 Update — OpenClaw: The Real-World Stress Test
OpenClaw — the open-source autonomous agent that hit 250K+ GitHub stars in four months — became a live case study in everything this chapter warns about. A CVE severity-8.8 vulnerability enabled full gateway compromise, Cisco labeled it "insecure by default," and China restricted state agencies from using it. Nvidia responded with NemoClaw, wrapping it in security guardrails. If you read one chapter with fresh urgency, make it this one.
Evaluation & Monitoring
Traditional software tests don't work well for AI agents because their outputs are non-deterministic. This pattern establishes continuous measurement: tracking accuracy, latency, and token costs; analyzing the full sequence of steps (the "trajectory") the agent takes; and using techniques like "LLM-as-a-Judge" where one AI evaluates another's quality.
The chapter also introduces the concept of AI "Contracts" — formal agreements that precisely define what an agent is expected to deliver, enabling objective verification of success.
2026 Update — Evaluation Is Now Essential
This chapter was forward-looking at publication — it's now the part of agent engineering that teams most underinvest in. Anthropic's current guidance emphasizes that evaluating an agent means evaluating the harness plus the model, not just text output. LangSmith and OpenAI both expose formal evaluation tooling. Deloitte reports only 11% of organizations have agentic solutions in production — the gap is almost always about governance and evaluation, not model capability.
Prioritization
When an agent faces competing tasks and limited resources, it needs to decide what to do first. Prioritization evaluates each task against criteria like urgency, importance, dependencies, and cost, then ranks them. The agent tackles the highest-priority items and dynamically re-prioritizes as conditions change.
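A simple scoring function captures the idea: weight each criterion, score every task, sort, and re-run whenever conditions change. The weights and task fields below are arbitrary illustrations.

```python
def priority(task: dict) -> float:
    # Weighted score over the criteria; weights are illustrative only.
    # Tasks that block others get a bonus per blocked dependency.
    return (3 * task["urgency"] + 2 * task["importance"]
            - 1 * task["cost"] + 2 * len(task.get("blocks", [])))

def prioritize(tasks: list[dict]) -> list[str]:
    # Re-running this as conditions change yields dynamic re-prioritization.
    return [t["name"] for t in sorted(tasks, key=priority, reverse=True)]

tasks = [
    {"name": "fix outage", "urgency": 5, "importance": 5, "cost": 2,
     "blocks": ["deploy"]},
    {"name": "write docs", "urgency": 1, "importance": 3, "cost": 1},
]
order = prioritize(tasks)
```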
Exploration & Discovery
Most patterns optimize known workflows. Exploration is about finding things you didn't know to look for. Agents designed for exploration proactively venture into unfamiliar territory — generating hypotheses, designing experiments, running searches on their own initiative, and uncovering "unknown unknowns."
Google's AI Co-Scientist exemplifies this: a multi-agent system where one agent generates hypotheses, another critiques them, a third ranks them through simulated debates, and a fourth evolves the best ideas. In lab validation, it independently re-discovered findings that took human researchers over a decade.
Reference — Agent Complexity Levels
Not all agents are created equal. The book defines a spectrum from simple to sophisticated.
Level 0 — Core Reasoning Engine
A raw language model with no tools, memory, or environment interaction. It can only answer from what it was trained on.
Level 1 — Connected Problem-Solver
The model gains the ability to use tools — search engines, APIs, databases. It can gather real-time information across multiple steps.
Level 2 — Strategic Problem-Solver
The agent can plan multi-step strategies, engineer its own context (selecting what information to focus on), operate proactively and continuously, and improve through self-feedback.
Level 3 — Collaborative Multi-Agent System
Multiple specialized agents work as a team — a project manager coordinating a researcher, a designer, and a marketer. The whole is greater than the sum of its parts.
Reference — Five Future Hypotheses
Where the book predicts agent development is heading.
1. The Generalist Agent
Agents evolve from narrow specialists into reliable generalists that manage complex, weeks-long projects with minimal oversight.
2. Deep Personalization
Agents become proactive partners that anticipate needs and discover goals you haven't fully articulated yet.
3. Physical Embodiment
AI agents break out of screens and operate in the physical world through robotics — fixing things, manufacturing, providing care.
4. The Agent Economy
Autonomous agents become economic participants — running businesses, negotiating deals, and managing supply chains at machine speed.
5. Metamorphic Multi-Agent Systems
Systems that reorganize themselves — spawning, duplicating, or removing agents as needed, optimizing their own structure to achieve a declared goal.
March 2026 — What's Changed Since Publication
The most significant deltas between the book's mid-2025 snapshot and where the ecosystem stands today.
MCP Is Now Infrastructure, Not a Pattern
The book introduces MCP as one pattern among 21. It's now the foundational layer of the entire agentic ecosystem. Anthropic donated MCP to the Linux Foundation in December 2025, where it's governed by the Agentic AI Foundation (AAIF) — co-founded by Anthropic, OpenAI, and Block, backed by AWS, Google, Microsoft, and Cloudflare. As of February 2026 there are 6,400+ registered MCP servers, 97 million monthly SDK (Software Development Kit) downloads, and adoption across ChatGPT, Gemini, Copilot, VS Code, and Cursor. The protocol's November 2025 spec added async operations, statelessness, and server identity. A move to fully stateless architecture is targeted for the June 2026 spec. Read the chapter for concepts; check modelcontextprotocol.io for current protocol details.
A2A Shipped v1.0, but Remains Early
Google's A2A protocol shipped a production-ready v1.0 and is now also under the AAIF alongside MCP. The book's framing — A2A handles agent-to-agent coordination while MCP handles tool/context access — is confirmed by both projects' official documentation. However, Google's own ADK still labels A2A support as experimental. MCP is settled infrastructure; A2A is promising but not yet ubiquitous. The two are complementary, not competing.
Multi-Agent: Use Selectively, Not Reflexively
The book presents multi-agent collaboration as a broadly attractive architecture. Current best practice is more conservative. Anthropic's 2025 guidance explicitly recommends finding the simplest solution possible and only increasing complexity when needed, noting that multi-agent systems are much more expensive in tokens and poorly suited for tasks requiring tightly shared context. OpenAI's Agents SDK frames multi-agent as a deliberate architectural choice between handoffs and "agents as tools," not the default. The pattern is real and important — but the chapter should be read with a strong "only when justified" filter. Most production systems today use single-agent architectures with good tool orchestration.
The Framework Landscape Has Shifted
The book's three framework canvases — LangChain/LangGraph, CrewAI, and Google ADK — are all still active, but the ecosystem has expanded and shifted. LangGraph now positions itself as a low-level orchestration runtime. CrewAI recommends a "Flow-first" production approach. ADK has moved to 2.0. OpenAI now has an official Agents SDK with first-class support for tools, sessions, guardrails, MCP, and human-in-the-loop — and its older Assistants API is deprecated, shutting down August 2026. Expect the book's code samples to need dependency pinning or refactoring even if the architecture still makes sense.
OpenClaw: The Real-World Stress Test
The book couldn't have predicted it, but OpenClaw — an open-source autonomous AI agent launched in late 2025 — became the most-starred software project on GitHub (250K+ stars) in under four months. Jensen Huang called it "probably the single most important release of software ever" at GTC (GPU Technology Conference) 2026. It bundles tool use, memory, planning, and human-in-the-loop into a locally-run agent accessible through messaging apps. It also became a live case study in everything Chapter 18 (Guardrails) warns about: a CVE (Common Vulnerabilities and Exposures) severity-8.8 vulnerability enabling full gateway compromise, prompt injection attacks, Cisco labeling it "insecure by default," and Chinese government restrictions. Nvidia responded with NemoClaw, wrapping OpenClaw in security and privacy guardrails. If you read one chapter with fresh urgency, make it Guardrails.
Evaluation Went from Optional to Essential
The book's evaluation chapter was forward-looking at publication. It's now the part of agent engineering that teams most underinvest in relative to its importance. Anthropic's current guidance emphasizes that when you evaluate an agent, you're evaluating the harness plus the model, not just the text output. LangSmith and OpenAI both now expose formal evaluation tooling. Deloitte reports that only 11% of organizations have agentic solutions in production — and the gap between demo and production is almost always about governance, testing, and evaluation, not model capability.
Context Engineering > Prompt Engineering
The book's emphasis on context engineering in Chapter 1 was ahead of the curve. Since mid-2025, Anthropic has explicitly described context engineering as the progression beyond prompt engineering: not just writing better prompts, but curating the full runtime context — system instructions, tools, MCP connections, message history, and external data. OpenAI's Agents SDK treats all of these as first-class primitives. The book's "prompts are not enough; the surrounding context matters" thesis has been fully vindicated.
March 2026 — Application: How People Are Building Agents Today
The patterns in this book are not theoretical — they are the architecture behind a wave of production agent systems. Here's how teams are actually putting them together.
The Dominant Architecture: Single Agent + Tools
Despite the excitement around multi-agent systems, the vast majority of production agents today are a single LLM instance with well-orchestrated tool access. The recipe: one model, a system prompt that defines its role and constraints, a set of tool definitions it can call (search, code execution, API calls, file operations), and a loop that lets it reason → act → observe → repeat. This is prompt chaining (Chapter 1), tool use (Chapter 5), and MCP (Chapter 10) working in concert. Multi-agent architectures cost 5–10× more in tokens and are harder to debug — so the practical advice from Anthropic, OpenAI, and the broader community is the same: start with one agent, document its limitations, and only add complexity when single-agent clearly isn't enough.
The SDK Landscape
Four major frameworks have emerged, each with a different philosophy. Anthropic's Claude Agent SDK (Python + TypeScript) exposes the same agent loop that powers Claude Code — its primitives are query() for the main loop, custom tools via a @tool decorator implemented as in-process MCP servers, hooks for intercepting agent events, and subagents for delegation. OpenAI's Agents SDK provides Agents (model + instructions + tools), Handoffs (agent-to-agent delegation as tool calls), Guardrails (input/output validation), and a Runner that orchestrates everything — with tracing enabled by default. Google ADK 2.0 uses an event-driven runtime with both LLM-powered agents and deterministic workflow agents (sequential, parallel, loop). LangGraph (v1.0 GA since October 2025) models agents as state graphs — nodes are logic, edges are routing, and checkpointing at every step enables human-in-the-loop and cross-session memory. CrewAI takes a team metaphor: agents have roles, goals, and backstories; tasks get assigned; and Flows coordinate event-driven orchestration across crews.
Coding Agents: The Highest-Profile Use Case
Coding agents are the most visible application of nearly every pattern in this book. Claude Code, Cursor, Windsurf, and Devin all use planning (Chapter 6) to break tasks into steps, tool use (Chapter 5) to read and write files, parallelization (Chapter 3) to work on multiple files simultaneously, and human-in-the-loop (Chapter 13) to checkpoint before destructive operations. In February 2026, every major coding tool shipped multi-agent capabilities within a two-week span — Grok Build with 8 specialized agents, Windsurf with 5 parallel agents, Claude Code with Agent Teams. The shared architecture: a Planner agent decomposes the task, specialized Implementer agents execute in parallel, and a Reviewer agent validates the output. Crucially, benchmarks show that the agent scaffolding matters as much as the underlying model — three frameworks running identical models scored 17 issues apart on 731 benchmark problems.
The Building Blocks Everyone Uses
Tool calling is the universal primitive — every framework represents tools as function definitions (name, description, parameters) that the model can invoke. MCP has become the standard connector layer, with 6,400+ registered servers and 97 million monthly SDK downloads. Guardrails operate at three levels: input validation (before the agent processes), output validation (before the response returns), and action-level (least-privilege permissions, idempotency keys). Human-in-the-loop follows a consistent pattern across frameworks: agent drafts or proposes, human approves, then execution proceeds — with autonomy expanding gradually for low-risk actions. Memory is layered: working memory (current conversation), compressed summaries, produced artifacts, and persistent long-term preferences. Observability — tracing every LLM call, tool invocation, and handoff — has 89% adoption and is considered table stakes for production.
Enterprise Reality Check
72% of Global 2000 companies now operate agent systems beyond experimental testing. The leading use cases by deployment share: customer service (27%), research and data analysis (24%), internal productivity (27% among organizations with 10,000+ employees), and code generation (86% of organizations report deploying agents for production code). But the gap between demo and production remains real. 32% of enterprises cite quality — accuracy, consistency, tone, policy adherence — as the primary blocker. 62% lack a clear starting point. Specialized agents with deep functional expertise (Salesforce shipped 6 domain-specific agents for its Agentforce Health product) consistently outperform generalist chatbots. The emerging consensus: governance, evaluation, and guardrails must be embedded from the start, not bolted on after a prototype impresses the leadership team.
The Practical Takeaway
If you're reading this book in 2026, the patterns are all real and in production — but the order matters. Start with a single agent that chains prompts and calls tools. Add memory when sessions need to persist. Add guardrails and human-in-the-loop before going to production. Add evaluation before scaling. Consider multi-agent only when you've hit a documented wall with single-agent. The book gives you all 21 patterns; the ecosystem has learned the hard way that applying them incrementally, not all at once, is what separates working systems from expensive demos.