Multi-Agent Systems¶
While a single agent can handle many tasks, complex real-world problems benefit from multiple specialized agents working together. Multi-agent systems assign different roles, capabilities, and communication protocols to different agents, enabling collaboration no single agent could achieve alone.
Why Multi-Agent?¶
| Scenario | Single Agent | Multi-Agent |
|---|---|---|
| Research + write report | Overloaded, quality drops | Researcher + Writer |
| Code review | Limited perspective | Coder + Reviewer + Security analyst |
| Customer service | One-size-fits-all | Triage + Product + Escalation |
| Simulate a society | Impossible | Each person = one agent |
Key benefits: specialization, parallelism, division of labor, and realism.
1. OpenAI Swarm¶
Type: Lightweight multi-agent orchestration framework
Repo: openai/swarm
Released: October 2024
Focus: Handoffs between agents with minimal overhead
OpenAI Swarm is an experimental framework for multi-agent orchestration — managing handoffs and context transfer between agents, rather than relying on fixed pipelines.
Core Concepts¶
Agent — A unit with instructions (system prompt) and functions (its tools):
from swarm import Swarm, Agent
client = Swarm()
sales_agent = Agent(
    name="Sales Agent",
    instructions="You are a friendly sales assistant. Be concise and helpful.",
    functions=[lookup_product, check_inventory],
)
support_agent = Agent(
    name="Support Agent",
    instructions="You are a technical support specialist. Be thorough and accurate.",
    functions=[diagnose_issue, escalate_ticket],
)
Handoff — Transfer conversation to another agent with updated context:
def transfer_to_sales():
    """Handoff to sales agent"""
    return sales_agent
def transfer_to_support():
    """Handoff to support agent"""
    return support_agent
sales_agent.functions.append(transfer_to_support)
support_agent.functions.append(transfer_to_sales)
Key insight: Swarm is built on two primitive operations — function calling and handoff. There is no central coordinator: the LLM decides when to call a transfer function, and the framework swaps the active agent whenever a called function returns an Agent.
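A framework-free sketch of this mechanism can make it concrete. The `Agent` class, `run` loop, and `decide_call` router below are illustrative stand-ins, not Swarm's actual internals; the point is the single rule — if a tool call returns an Agent, swap the active agent:

```python
# Minimal sketch of Swarm-style handoff (illustrative, not the real library).

class Agent:
    def __init__(self, name, instructions, functions=None):
        self.name = name
        self.instructions = instructions
        self.functions = functions or []

def run(agent, user_message, decide_call):
    """decide_call(agent, msg) returns a function to call, or None.

    It stands in for the LLM's function-calling decision.
    """
    while True:
        fn = decide_call(agent, user_message)
        if fn is None:
            return agent  # no more tool calls; this agent answers
        result = fn()
        if isinstance(result, Agent):
            agent = result  # handoff: swap the active agent
        # (a real loop would also feed non-Agent results back to the model)

support = Agent("Support", "Handle technical issues")

def transfer_to_support():
    return support

triage = Agent("Triage", "Route users", functions=[transfer_to_support])

# Toy router standing in for the LLM:
def decide_call(agent, msg):
    if agent.name == "Triage" and "error" in msg:
        return transfer_to_support
    return None

final = run(triage, "I get an error on boot", decide_call)
print(final.name)  # Support
```

The entire orchestration state is "which agent is active" — which is why Swarm stays so small.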
Full Example: Customer Service Pipeline¶
from swarm import Swarm, Agent
client = Swarm()
sales_agent = Agent(
    name="Sales Agent",
    instructions="You help with purchases, billing, and product information.",
    functions=[lookup_product, process_order],
)
support_agent = Agent(
    name="Support Agent",
    instructions="You help with technical issues. Be thorough.",
    functions=[diagnose_issue, create_ticket],
)
# Handoff functions: returning an Agent transfers the conversation
def transfer_to_sales():
    """Route to the sales agent."""
    return sales_agent
def transfer_to_support():
    """Route to the support agent."""
    return support_agent
# Triage agent routes to the right specialist
triage_agent = Agent(
    name="Triage Agent",
    instructions="""You are a customer service triage agent.
Route users to the appropriate agent:
- Purchasing, billing, product info → transfer_to_sales
- Technical issues, bugs, errors → transfer_to_support
""",
    functions=[transfer_to_sales, transfer_to_support],
)
# Run
response = client.run(
    agent=triage_agent,
    messages=[{"role": "user", "content": "My robot arm is making weird noises"}],
)
print(response.messages[-1]["content"])
Design Philosophy¶
OpenAI deliberately avoids "intelligent orchestration" — letting the LLM decide the flow would make debugging a "black box." Instead, Swarm exposes the low-level primitives (handoff + function calling) so developers can see and intervene at every step.
"The difference between Swarm and writing if-else yourself is that in three years, you'll want to know why you didn't write it like Swarm." — GitHub comment
Strengths and Limitations¶
| ✅ Strengths | ❌ Limitations |
|---|---|
| Extremely lightweight (~800 lines) | No built-in persistence |
| Agent-as-tool flexibility | No native multi-agent memory sharing |
| Easy to understand and prototype | Experimental (not production-ready) |
| Native OpenAI API integration | Limited error handling and recovery |
2. Microsoft AutoGen → Microsoft Agent Framework (MAF)¶
Type: Multi-agent conversational framework
Papers: AutoGen (2023) | MAF (2025)
Evolution: AutoGen v0.4 (2025) → merged with Semantic Kernel → Microsoft Agent Framework (MAF, October 2025)
AutoGen pioneered the idea that agents are conversation participants. By 2025, it had evolved into Microsoft Agent Framework (MAF), combining AutoGen's multi-agent patterns with Semantic Kernel's enterprise-grade reliability.
Core Concept: ConversableAgent¶
from autogen import ConversableAgent
assistant = ConversableAgent(
    name="assistant",
    system_message="You are a helpful Python coding assistant.",
    llm_config={"model": "gpt-4o"},
)
user_proxy = ConversableAgent(
    name="user",
    human_input_mode="NEVER",  # NEVER / ALWAYS / TERMINATE
    max_consecutive_auto_reply=10,
    code_execution_config={"work_dir": "coding", "use_docker": False},
)
user_proxy.initiate_chat(
    assistant,
    message="Write a Python function to compute matrix inverse.",
)
Group Chat: Multiple Agents Discussing¶
from autogen import GroupChat, GroupChatManager
group_chat = GroupChat(
    agents=[user_proxy, researcher, critic, writer],
    messages=[],
    max_round=12,
    speaker_selection_method="auto",  # or "round_robin"
)
manager = GroupChatManager(groupchat=group_chat)
user_proxy.initiate_chat(
    manager,
    message="Write an 800-word article about SMR nuclear progress in 2026.",
)
MAF: Production-Grade Multi-Agent¶
# Microsoft Agent Framework (2025+)
from agent_framework import GroupChat, GroupChatManager, AssistantAgent
# Multi-agent group chat (AutoGen style, with persistence)
group = GroupChat(
    agents=[user_proxy, researcher, critic, writer],
    max_rounds=15,
    # New in MAF: persistent session id, checkpointing
)
manager = GroupChatManager(group=group)
await user_proxy.initiate_chat(
    manager,
    message="Research & write 600-word post on AI agent developments in 2026",
)
MAF adds deterministic DAG-based workflows alongside group chat:
# DAG workflow for order processing (deterministic, not emergent)
# Nodes: Agent | Function | Condition | Loop
# Edges: define deterministic execution paths
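MAF's workflow API is still evolving, so here is a framework-free sketch of the underlying idea: nodes are plain functions, edges are fixed, and a condition only chooses among pre-declared paths, so every possible execution path is known in advance (all names here are hypothetical):

```python
# Deterministic DAG sketch: nodes are functions, edges are fixed.

def validate(order):
    order["valid"] = order["qty"] > 0
    return order

def charge(order):
    order["charged"] = True
    return order

def reject(order):
    order["charged"] = False
    return order

NODES = {"validate": validate, "charge": charge, "reject": reject}

# Condition edges: map each node to a rule picking the next node (None = done).
EDGES = {
    "validate": lambda o: "charge" if o["valid"] else "reject",
    "charge": lambda o: None,
    "reject": lambda o: None,
}

def run_workflow(start, order):
    node = start
    while node is not None:
        order = NODES[node](order)
        node = EDGES[node](order)
    return order

result = run_workflow("validate", {"qty": 2})
print(result)  # {'qty': 2, 'valid': True, 'charged': True}
```

Contrast with group chat: here the control flow is data (the `EDGES` table), not an emergent property of agent conversation, which is what makes this style debuggable and auditable.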
AutoGen vs. MAF vs. Swarm¶
| Feature | AutoGen | MAF | Swarm |
|---|---|---|---|
| Communication | Conversational (message passing) | Conversational + DAG | Handoff-based |
| Code execution | Native (code_execution_config) | Native | Via tools |
| Group chat | Built-in GroupChat | Built-in + persistence | Manual |
| Production readiness | v0.4 mature | RC stage (2025) | Experimental |
| Enterprise features | Limited | Built-in (checkpointing, OpenTelemetry) | None |
| Language | Python | Python + .NET | Python |
3. Stanford Generative Agents (Smallville)¶
Type: Simulation / research framework
Paper: Generative Agents (2023)
Demo: Stanford Smallville
Startup: Simile — raised $100M (Index Ventures, Andreessen, Lee, Karpathy)
This is a research prototype demonstrating how believable human behavior emerges from LLM-powered agents without explicit programming.
The Insight¶
Give each "person" in a virtual world:
1. A name, occupation, and personality
2. Memory streams (accumulated experiences)
3. The ability to reflect and plan
→ Agents spontaneously form relationships, coordinate activities, and exhibit emergent social behavior.
Architecture: Memory-Stream, Reflection, Planning¶
┌─────────────────────────────────────────────────────────────┐
│ Memory-Stream → Reflection → Planning │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Memory Stream (chronological) │ │
│ │ [observation] → [observation] → [reflection] → ... │ │
│ │ "Isabella is at the coffee shop" │ │
│ │ "Tom invited Isabella to a party" │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Reflection: synthesize observations into insights │ │
│ │ "Isabella seems like someone who enjoys organizing │ │
│ │ social events and bringing people together" │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Planning: daily plan based on current state │ │
│ │ 08:00 - Wake up and have breakfast │ │
│ │ 09:00 - Open the coffee shop │ │
│ │ 13:00 - Have lunch at Hobbs Cafe │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Key Mechanisms¶
1. Perception — Agents observe the world and other agents.
2. Memory Retrieval — Retrieve relevant memories given the current situation, scored on relevance, recency, and importance:
def retrieve_relevant(memory_stream, current_situation, k=5):
    scored = []
    for memory in memory_stream:
        relevance = score_relevance(memory.content, current_situation)
        recency = score_recency(memory.timestamp)
        score = alpha * relevance + beta * recency + gamma * memory.importance
        scored.append((score, memory))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [memory for _, memory in scored[:k]]
3. Reflection — Periodically synthesize observations into high-level insights:
# Observations: "person is tired", "person is coughing", ...
# Reflection: "person might be getting sick"
4. Planning — Create a daily plan from current state and goals.
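Reflection and planning both boil down to prompt construction over the memory stream. A minimal sketch of that step (the prompt wording is illustrative, not the paper's exact templates):

```python
def reflection_prompt(observations):
    """Ask the model to compress raw observations into a high-level insight."""
    bullet_list = "\n".join(f"- {o}" for o in observations)
    return (
        "Given these observations:\n"
        f"{bullet_list}\n"
        "What high-level insight can you infer about this person?"
    )

def planning_prompt(name, occupation, insights, date):
    """Ask for an hour-by-hour daily plan grounded in identity and insights."""
    return (
        f"{name} is a {occupation}. Known traits: {'; '.join(insights)}.\n"
        f"Write {name}'s hour-by-hour plan for {date}."
    )

p = reflection_prompt(["person is tired", "person is coughing"])
# The returned string is then sent to the LLM (e.g. via llm.invoke(p));
# the response is stored back into the memory stream as a reflection.
```

The loop — observe, periodically reflect, plan from the reflections — is what lets a handful of prompts produce long-horizon, believable behavior.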
The Valentine's Day Party (Emergent Behavior)¶
One of the most famous demonstrations:
- Seed: Isabella gets the instruction "You want to throw a Valentine's Day party."
- Propagation: She spreads the word organically — other agents decide whether to attend based on their personalities.
- Coordination: Some agents offer to help decorate; others discuss what to wear.
- Result: 5 agents attend the party at the exact planned time.
No hard-coded rules. Everything emerged from the agents' memory, reflection, and planning.
Simile: From 25 to 1,000+ Agents¶
The original authors (Joon Sung Park et al.) founded Simile in 2026, scaling from 25 to 1,000+ agents simulating real human populations. Used for predicting customer behavior, brand messaging impact, and policy decisions. Wealthfront reported 15x expansion in user research scope using Simile.
Relevance to Robotics¶
- Robots in human environments need to model other agents (humans, other robots)
- Emergent behavior means the robot doesn't need explicit rules for every situation
- Memory + reflection enables long-horizon task planning
4. CrewAI¶
Type: Role-based multi-agent framework
Repo: crewaiinc/crewAI
Installation: pip install crewai crewai-tools
CrewAI takes a role-oriented approach. Each agent has a defined role, goal, and backstory — mimicking a real team.
Core Concepts¶
Agent — A role with tools and a specific goal:
from crewai import Agent
from crewai_tools import SerperDevTool, FileReadTool
researcher = Agent(
    role="Senior Research Analyst",
    goal="Discover cutting-edge developments in robot manipulation",
    backstory=(
        "You are a PhD-level robotics researcher with 10 years of experience "
        "monitoring the latest papers, patents, and industry developments."
    ),
    tools=[SerperDevTool(), FileReadTool()],
    verbose=True,
)
writer = Agent(
    role="Tech Writer",
    goal="Write compelling technical content about robotics",
    backstory=(
        "You are a skilled technical writer who translates complex research "
        "into accessible, well-structured articles for engineers."
    ),
    tools=[FileReadTool()],
    verbose=True,
)
Task — A unit of work assigned to an agent:
from crewai import Task
research_task = Task(
    description="Research the latest advances in dexterous manipulation",
    agent=researcher,
    expected_output="A summary of 5 key papers with their contributions",
)
write_task = Task(
    description="Write a blog post about the research findings",
    agent=writer,
    expected_output="A 1000-word blog post with technical accuracy",
    context=[research_task],  # Writer sees researcher's output
)
Crew — A team executing tasks:
from crewai import Crew, Process
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.sequential,  # or Process.hierarchical
)
result = crew.kickoff()
print(result)
Hierarchical Process (Manager Agent)¶
from langchain_openai import ChatOpenAI
crew = Crew(
    agents=[researcher, writer, reviewer],
    tasks=[task1, task2, task3],
    process=Process.hierarchical,
    manager_llm=ChatOpenAI(model="gpt-4o"),
)
Custom Tools¶
from crewai.tools import BaseTool
from pydantic import Field
class MyCustomTool(BaseTool):
    name: str = Field(default="my_custom_tool")
    description: str = Field(default="Tool description")
    def _run(self, tool_input: str) -> str:
        # Implementation
        return f"Result: {tool_input}"
agent = Agent(
    role="Analyst",
    goal="Answer questions using the custom tool",
    backstory="A diligent analyst.",
    tools=[MyCustomTool()],
)
CrewAI vs. AutoGen vs. Swarm¶
| Feature | CrewAI | AutoGen | Swarm |
|---|---|---|---|
| Approach | Role-based team | Conversational | Handoff-based |
| Best for | Structured pipelines | Complex negotiation | Lightweight routing |
| Code execution | Via tools | Native | Via tools |
| Memory | Built-in | Via custom | None |
| Complexity | Medium | High | Low |
| Production | ✅ Growing | ✅ Mature | ❌ Experimental |
5. LangGraph¶
Type: Graph-based agent workflow framework
Repo: langchain-ai/langgraph
Installation: pip install langgraph langchain
LangGraph models agent workflows as state graphs: each node is a step (agent, tool, or condition), and edges define the flow. Unlike a strict DAG, cycles such as retry loops are allowed.
Core Concepts¶
State — A shared TypedDict flowing through the graph:
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages
class AgentState(TypedDict):
    messages: Annotated[list, add_messages]  # Append-only for messages
    search_results: list[str]
    retry_count: int
    is_done: bool
    final_answer: str
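The key semantics here: each node returns a partial update, and the reducer attached to a field decides how that update merges into the state — `add_messages` appends, while unannotated fields are simply overwritten. A framework-free sketch of that merge rule (the `apply_update` helper is hypothetical, not LangGraph's internals):

```python
def apply_update(state, update, reducers):
    """Merge a node's partial update into state, field by field."""
    new_state = dict(state)
    for key, value in update.items():
        if key in reducers:
            # Annotated field: reducer decides how old and new combine
            new_state[key] = reducers[key](state.get(key, []), value)
        else:
            # Plain field: last write wins
            new_state[key] = value
    return new_state

# "messages" appends (like add_messages); everything else overwrites.
reducers = {"messages": lambda old, new: old + new}

s0 = {"messages": [], "retry_count": 0}
s1 = apply_update(s0, {"messages": ["hi"], "retry_count": 1}, reducers)
print(s1)  # {'messages': ['hi'], 'retry_count': 1}
```

This is why nodes below can return small dicts like `{"search_results": results}` without clobbering the rest of the state.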
Nodes — Functions that transform state:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")
def planner(state: AgentState) -> dict:
    """Planning node"""
    response = llm.invoke([
        {"role": "system", "content": "Create a research plan."},
        *state["messages"],
    ])
    return {"messages": [response]}
def searcher(state: AgentState) -> dict:
    """Search node"""
    last_msg = state["messages"][-1].content
    results = web_search(last_msg)
    return {"search_results": results}
def writer(state: AgentState) -> dict:
    """Report generation node"""
    context = "\n".join(state["search_results"])
    response = llm.invoke([
        {"role": "system", "content": f"Write a report based on:\n{context}"},
        *state["messages"],
    ])
    return {"messages": [response], "final_answer": response.content}
Conditional Edges — Route based on state:
def should_continue(state: AgentState) -> str:
    if state.get("is_done"):
        return "end"
    elif state["retry_count"] < 2:
        return "searcher"
    else:
        return "writer"
Build and Compile the Graph¶
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
graph = StateGraph(AgentState)
graph.add_node("planner", planner)
graph.add_node("searcher", searcher)
graph.add_node("writer", writer)
graph.add_edge(START, "planner")
graph.add_edge("planner", "searcher")
graph.add_conditional_edges(
    "searcher",
    should_continue,
    {"writer": "writer", "searcher": "searcher", "end": END},
)
graph.add_edge("writer", END)
# Add checkpointing for persistence
memory = MemorySaver()
app = graph.compile(checkpointer=memory)
# Run
config = {"configurable": {"thread_id": "research-001"}}
result = app.invoke(
    {"messages": [{"role": "user", "content": "Analyze AI agent trends in 2026"}]},
    config=config,
)
Human-in-the-Loop: Interrupt and Approve¶
# Interrupt before the "writer" node for human review
app = graph.compile(
    checkpointer=memory,
    interrupt_before=["writer"],  # Pause before writing
)
# First run: executes up to "writer" and pauses
result = app.invoke(input, config=config)
# Human reviews search results
print("Search results:", result["search_results"])
# Human approves or modifies
app.update_state(config, {"search_results": result["search_results"] + ["extra"]})
# Resume execution
final = app.invoke(None, config=config)  # None = resume from interrupt
Multi-Agent: Supervisor Architecture¶
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor
research_agent = create_react_agent(
    llm, tools=[web_search, arxiv_search], name="researcher",
    prompt="You are a research expert.",
)
coding_agent = create_react_agent(
    llm, tools=[python_repl, code_sandbox], name="coder",
    prompt="You are a coding expert.",
)
writing_agent = create_react_agent(
    llm, tools=[], name="writer",
    prompt="You are a writing expert.",
)
supervisor = create_supervisor(
    agents=[research_agent, coding_agent, writing_agent],
    model=llm,
    prompt="You are a project manager. Delegate tasks appropriately.",
)
multi_agent = supervisor.compile(checkpointer=MemorySaver())
LangGraph for Robotics: Task Planning Graph¶
class RobotState(TypedDict):
    command: str
    plan: list[str]
    current_step: int
    observation: str | None
    approved: bool
def parse_command(state: RobotState) -> dict:
    response = llm.invoke(f"Decompose into steps, one per line: {state['command']}")
    return {"plan": response.content.splitlines(), "current_step": 0}
def execute_step(state: RobotState) -> dict:
    step = state["plan"][state["current_step"]]
    obs = robot.execute(step)  # robot = hardware interface (placeholder)
    return {"observation": obs, "current_step": state["current_step"] + 1}
def should_continue(state: RobotState) -> str:
    if not state.get("approved"):
        return "interrupt"
    return "end" if state["current_step"] >= len(state["plan"]) else "execute_step"
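This routing can be exercised without LangGraph by stepping the nodes in a plain loop, with a stub standing in for the LLM and the robot interface (all helpers below are illustrative):

```python
def run_robot_flow(state, execute_step_fn):
    """Drive the plan with should_continue-style routing (framework-free sketch)."""
    while True:
        if not state.get("approved"):
            return "interrupted", state  # pause for human approval
        if state["current_step"] >= len(state["plan"]):
            return "done", state
        # Merge the node's partial update, as the graph runtime would
        state = {**state, **execute_step_fn(state)}

def fake_execute(state):
    """Stub for execute_step: pretend the robot performed the current step."""
    step = state["plan"][state["current_step"]]
    return {"observation": f"did {step}", "current_step": state["current_step"] + 1}

state = {"plan": ["grasp cup", "move to sink"], "current_step": 0, "approved": True}
status, final = run_robot_flow(state, fake_execute)
print(status, final["observation"])  # done did move to sink
```

Swapping `approved` to `False` returns immediately with `"interrupted"`, which is exactly the state a LangGraph `interrupt_before` checkpoint would hold for human review.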
LangGraph vs. CrewAI vs. AutoGen¶
| Feature | LangGraph | CrewAI | AutoGen |
|---|---|---|---|
| Model | DAG / state graph | Role-based | Conversational |
| Flexibility | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Learning curve | Medium | Low | High |
| Human-in-the-loop | Native (interrupt) | Limited | Supported |
| Persistence | PostgreSQL / Redis | Limited | Via custom |
| Visualization | Built-in Mermaid | LangSmith | Limited |
| Production maturity | High | Growing | Mature |
6. Architecture Patterns Summary¶
┌─────────────────────────────────────────────────────────────┐
│ Multi-Agent Architecture Patterns │
├─────────────────────────────────────────────────────────────┤
│ │
│ Pattern 1: Sequential Pipeline │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │Researcher│───→│Writer │───→│Editor │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ (CrewAI sequential, LangGraph linear) │
│ │
│ Pattern 2: Handoff / Router │
│ ┌──────────┐ ┌─────────┐ │
│ │ Triage │────→│ Sales │ │
│ │ Agent │────→│ Support │ │
│ │ (root) │────→│ Billing │ │
│ └──────────┘ └─────────┘ │
│ (OpenAI Swarm, Hermes delegation) │
│ │
│ Pattern 3: Group Chat / Round Table │
│ ┌──────────────────────────────────────┐ │
│ │ ┌──────┐ ┌──────┐ ┌──────┐ │ │
│ │ │Agent1│ │Agent2│ │Agent3│ │ │
│ │ └──┬───┘ └──┬───┘ └──┬───┘ │ │
│ │ └─────────┼─────────┘ │ │
│ │ ▼ │ │
│ │ Group Chat │ │
│ └──────────────────────────────────────┘ │
│ (AutoGen GroupChat, MAF) │
│ │
│ Pattern 4: Hierarchical / Manager │
│ ┌────────────┐ │
│ │ Manager │ │
│ └──┬──────┬──┘ │
│ ┌───┴───┐ ┌───┴───┐ │
│ │Worker1 │ │Worker2 │ │
│ └───────┘ └───────┘ │
│ (CrewAI hierarchical, MAF supervisor) │
│ │
│ Pattern 5: Simulation / Emergent │
│ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐ │
│ │ A │ │ B │ │ C │ │ D │ │ E │ │
│ └─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘ │
│ └─────┼─────┼─────┼─────┘ │
│ ▼ ▼ ▼ │
│ Shared World / Memory Streams │
│ (Stanford Generative Agents → Simile) │
│ │
└─────────────────────────────────────────────────────────────┘
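Stripped of frameworks, Pattern 1 is just function composition over a shared artifact. A toy sketch with string-transforming stand-ins for the agents:

```python
# Pattern 1 (Sequential Pipeline) as plain function composition.

def researcher(topic):
    return f"notes on {topic}"

def writer(notes):
    return f"draft based on {notes}"

def editor(draft):
    return draft + " (edited)"

def pipeline(artifact, stages):
    for stage in stages:
        artifact = stage(artifact)  # each agent consumes the previous output
    return artifact

out = pipeline("SMR reactors", [researcher, writer, editor])
print(out)  # draft based on notes on SMR reactors (edited)
```

The other patterns differ mainly in who chooses the next stage: a router agent (Pattern 2), the group conversation (Pattern 3), a manager (Pattern 4), or no one at all (Pattern 5).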
7. Benchmark Performance¶
On published benchmarks, multi-agent systems often outperform single agents:
| Benchmark | Single Agent | Multi-Agent | Improvement |
|---|---|---|---|
| GAIA (general AI assistant) | 40–60% | 70–85% | +25–40% |
| SWE-bench Verified (software engineering) | baseline | +25–40% | significant |
| Real-world projects (Novo Nordisk) | — | ~25% iteration cycle reduction | — |
References¶
Framework Papers & Repos¶
- OpenAI Swarm — GitHub repository
- AutoGen Paper — Microsoft Research, 2023
- Microsoft Agent Framework — Official documentation
- Generative Agents Paper — Stanford HAI, 2023
- Simile (Generative Agents startup) — $100M funded, 2026
- CrewAI Documentation — Official docs
- LangGraph Documentation — Official docs
Research & Surveys¶
- Memory in the Age of AI Agents: A Survey — NUS, Renmin U, Fudan, Peking U, 2025
- Iconiq Capital: 2025 State of AI Report — Enterprise adoption data
- AutoGen Architecture Evolution (2024–2026) — Comprehensive MAF history
- Building Effective Agents (Anthropic) — Production patterns
Hermes Agent¶
- Hermes Agent — Nous Research