
Multi-Agent Systems

While a single agent can handle many tasks, complex real-world problems benefit from multiple specialized agents working together. Multi-agent systems give each agent its own role, capabilities, and communication protocols, enabling forms of collaboration that no single agent could achieve alone.

Why Multi-Agent?

| Scenario | Single Agent | Multi-Agent |
|---|---|---|
| Research + write report | Overloaded, quality drops | Researcher + Writer |
| Code review | Limited perspective | Coder + Reviewer + Security analyst |
| Customer service | One-size-fits-all | Triage + Product + Escalation |
| Simulate a society | Impossible | Each person = one agent |

Key benefits: specialization, parallelism, division of labor, and realism.


1. OpenAI Swarm

Type: Lightweight multi-agent orchestration framework
Repo: openai/swarm
Released: October 2024
Focus: Handoffs between agents with minimal overhead

OpenAI Swarm is an experimental framework for multi-agent orchestration — managing handoffs and context transfer between agents, rather than relying on fixed pipelines.

Core Concepts

Agent — A unit with instructions (system prompt) and tools:

from swarm import Swarm, Agent

client = Swarm()

sales_agent = Agent(
    name="Sales Agent",
    instructions="You are a friendly sales assistant. Be concise and helpful.",
    functions=[lookup_product, check_inventory],
)

support_agent = Agent(
    name="Support Agent",
    instructions="You are a technical support specialist. Be thorough and accurate.",
    functions=[diagnose_issue, escalate_ticket],
)

Handoff — Transfer conversation to another agent with updated context:

def transfer_to_sales():
    """Handoff to sales agent"""
    return sales_agent

def transfer_to_support():
    """Handoff to support agent"""
    return support_agent

sales_agent.functions.append(transfer_to_support)
support_agent.functions.append(transfer_to_sales)

Key insight: Swarm uses two primitive operations — handoff and function calling. No central coordinator. The LLM decides when to hand off based on function return values.
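
The mechanics behind this can be sketched in a few lines of plain Python: a tool that returns an Agent swaps the active agent while the message history is kept. Everything below (the Agent dataclass, fake_llm, the run loop) is a simplified stand-in for what Swarm does internally, not its actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    instructions: str
    functions: list = field(default_factory=list)

def fake_llm(agent, messages):
    """Stand-in for the model: always calls the agent's first function."""
    return agent.functions[0] if agent.functions else None

def run(agent, messages, max_turns=5):
    for _ in range(max_turns):
        chosen = fake_llm(agent, messages)
        if chosen is None:          # no function call: agent answers directly
            break
        result = chosen()
        if isinstance(result, Agent):
            agent = result          # handoff: swap the active agent, keep history
        else:                       # ordinary tool: record its output
            messages.append({"role": "tool", "content": str(result)})
    return agent, messages

support_agent = Agent("Support Agent", "Help with technical issues.")

def transfer_to_support():
    return support_agent

triage_agent = Agent("Triage Agent", "Route the user.",
                     functions=[transfer_to_support])

final_agent, history = run(triage_agent,
                           [{"role": "user", "content": "My arm is broken"}])
print(final_agent.name)  # Support Agent
```

Because a handoff is just a return value, routing stays fully inspectable: you can log or override every transfer.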

Full Example: Customer Service Pipeline

from swarm import Swarm, Agent

client = Swarm()

# Triage agent routes to the right specialist
triage_agent = Agent(
    name="Triage Agent",
    instructions="""You are a customer service triage agent.
    Route users to the appropriate agent:
    - Purchasing, billing, product info → transfer_to_sales
    - Technical issues, bugs, errors → transfer_to_support
    """,
)

sales_agent = Agent(
    name="Sales Agent",
    instructions="You help with purchases, billing, and product information.",
    functions=[lookup_product, process_order],
)

support_agent = Agent(
    name="Support Agent",
    instructions="You help with technical issues. Be thorough.",
    functions=[diagnose_issue, create_ticket],
)

# Named handoff functions — the LLM routes by calling them by name
def transfer_to_sales():
    """Handoff to sales agent"""
    return sales_agent

def transfer_to_support():
    """Handoff to support agent"""
    return support_agent

triage_agent.functions = [transfer_to_sales, transfer_to_support]

# Run
response = client.run(
    agent=triage_agent,
    messages=[{"role": "user", "content": "My robot arm is making weird noises"}]
)
print(response.messages[-1]["content"])

Design Philosophy

OpenAI deliberately avoids a hidden "intelligent orchestration" layer: burying flow decisions inside an opaque planner would make debugging a "black box." Instead, Swarm exposes the low-level primitives (handoff + function calling), so the LLM still chooses when to hand off, but it does so through explicit function calls that developers can see and intervene on at every step.

"The difference between Swarm and writing if-else yourself is that in three years, you'll want to know why you didn't write it like Swarm." — GitHub comment

Strengths and Limitations

| ✅ Strengths | ❌ Limitations |
|---|---|
| Extremely lightweight (~800 lines) | No built-in persistence |
| Agent-as-tool flexibility | No native multi-agent memory sharing |
| Easy to understand and prototype | Experimental (not production-ready) |
| Native OpenAI API integration | Limited error handling and recovery |

2. Microsoft AutoGen → Microsoft Agent Framework (MAF)

Type: Multi-agent conversational framework
Papers: AutoGen (2023) | MAF (2025)
Evolution: AutoGen v0.4 (2025) → merged with Semantic Kernel → Microsoft Agent Framework (MAF) (October 2025)

AutoGen pioneered the idea that agents are conversation participants. By 2025, it had evolved into Microsoft Agent Framework (MAF), combining AutoGen's multi-agent patterns with Semantic Kernel's enterprise-grade reliability.

Core Concept: ConversableAgent

from autogen import ConversableAgent

assistant = ConversableAgent(
    name="assistant",
    system_message="You are a helpful Python coding assistant.",
    llm_config={"model": "gpt-4o"},
)

user_proxy = ConversableAgent(
    name="user",
    human_input_mode="NEVER",  # NEVER / ALWAYS / TERMINATE
    max_consecutive_auto_reply=10,
    code_execution_config={"work_dir": "coding", "use_docker": False},
    llm_config=False,  # executor only: replies come from code results, not an LLM
)

user_proxy.initiate_chat(
    assistant,
    message="Write a Python function to compute matrix inverse.",
)

Group Chat: Multiple Agents Discussing

from autogen import GroupChat, GroupChatManager

group_chat = GroupChat(
    agents=[user_proxy, researcher, critic, writer],
    messages=[],
    max_round=12,
    speaker_selection_method="auto",  # or "round_robin"
)

manager = GroupChatManager(
    groupchat=group_chat,
    llm_config={"model": "gpt-4o"},
)

user_proxy.initiate_chat(
    manager,
    message="Write an 800-word article about SMR nuclear progress in 2026.",
)
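
The round_robin policy is simple enough to sketch: it cycles through the agent list in order, whereas "auto" asks the manager's LLM to pick the next speaker based on the conversation. The snippet below illustrates only the cycling policy; it is not AutoGen's code:

```python
from itertools import cycle

agents = ["user_proxy", "researcher", "critic", "writer"]

# round_robin: deterministic cycling through the agent list
order = cycle(agents)
speakers = [next(order) for _ in range(6)]
print(speakers)
# ['user_proxy', 'researcher', 'critic', 'writer', 'user_proxy', 'researcher']
```

"auto" trades this determinism for flexibility: the manager can skip agents that have nothing to contribute, at the cost of an extra LLM call per turn.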

MAF: Production-Grade Multi-Agent

# Microsoft Agent Framework (2025+)
from agent_framework import GroupChat, GroupChatManager, AssistantAgent

# Multi-agent group chat (AutoGen style, with persistence)
group = GroupChat(
    agents=[user_proxy, researcher, critic, writer],
    max_rounds=15,
    # New in MAF: persistent session id, checkpointing
)

manager = GroupChatManager(group=group)

await user_proxy.initiate_chat(
    manager,
    message="Research & write 600-word post on AI agent developments in 2026"
)

MAF adds deterministic DAG-based workflows alongside group chat:

# DAG workflow for order processing (deterministic, not emergent)
# Nodes: Agent | Function | Condition | Loop
# Edges: define deterministic execution paths

AutoGen vs. MAF vs. Swarm

| Feature | AutoGen | MAF | Swarm |
|---|---|---|---|
| Communication | Conversational (message passing) | Conversational + DAG | Handoff-based |
| Code execution | Native (code_execution_config) | Native | Via tools |
| Group chat | Built-in GroupChat | Built-in + persistence | Manual |
| Production readiness | v0.4 mature | RC stage (2025) | Experimental |
| Enterprise features | Limited | Built-in (checkpointing, OpenTelemetry) | None |
| Language | Python | Python + .NET | Python |

3. Stanford Generative Agents (Smallville)

Type: Simulation / research framework
Paper: Generative Agents (2023)
Demo: Stanford Smallville
Startup: Simile — raised $100M (Index Ventures, Andreessen, Lee, Karpathy)

This is a research prototype demonstrating how believable human behavior emerges from LLM-powered agents without explicit programming.

The Insight

Give each "person" in a virtual world:

  1. A name, occupation, and personality
  2. Memory streams (accumulated experiences)
  3. The ability to reflect and plan

→ Agents spontaneously form relationships, coordinate activities, and exhibit emergent social behavior.

Architecture: Memory-Stream, Reflection, Planning

┌─────────────────────────────────────────────────────────────┐
│              Memory-Stream → Reflection → Planning           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │            Memory Stream (chronological)             │   │
│  │  [observation] → [observation] → [reflection] → ...  │   │
│  │  "Isabella is at the coffee shop"                    │   │
│  │  "Tom invited Isabella to a party"                  │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                   │
│                          ▼                                   │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  Reflection: synthesize observations into insights   │   │
│  │  "Isabella seems like someone who enjoys organizing  │   │
│  │   social events and bringing people together"         │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                   │
│                          ▼                                   │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  Planning: daily plan based on current state         │   │
│  │  08:00 - Wake up and have breakfast                  │   │
│  │  09:00 - Open the coffee shop                       │   │
│  │  13:00 - Have lunch at Hobbs Cafe                   │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Key Mechanisms

1. Perception — Agents observe the world and other agents:

[Isabella Rodriguez] observed [Tom Moreno] is currently at [The Willow Market and Deli]

2. Memory Retrieval — Retrieve relevant memories given current situation:

def retrieve_relevant(memory_stream, current_situation, k=5):
    scored = []
    for memory in memory_stream:
        relevance = score_relevance(memory.content, current_situation)
        recency = score_recency(memory.timestamp)
        scored.append((alpha * relevance + beta * recency, memory))
    return top_k(scored, k=k)

3. Reflection — Periodically synthesize observations into high-level insights:

# Observations: "person is tired", "person is coughing", ...
# Reflection: "person might be getting sick"

4. Planning — Create a daily plan from current state and goals.
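
A runnable sketch ties these mechanisms together. The paper scores memories by recency (exponential decay), importance (an LLM-assigned 1–10 rating), and relevance (embedding similarity); here relevance is approximated by word overlap so the example is self-contained, and the decay rate and weighting are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    content: str
    importance: float   # 1-10, rated by the LLM when the memory is stored
    age_hours: float    # hours since the memory was recorded

def score_relevance(content: str, query: str) -> float:
    """Stand-in relevance: word overlap (the paper uses embedding similarity)."""
    a, b = set(content.lower().split()), set(query.lower().split())
    return len(a & b) / max(len(b), 1)

def retrieve(memories, query, k=2, decay=0.995):
    def score(m: Memory) -> float:
        recency = decay ** (m.age_hours * 60)   # exponential decay per minute
        return recency + m.importance / 10 + score_relevance(m.content, query)
    return sorted(memories, key=score, reverse=True)[:k]

stream = [
    Memory("Isabella is planning a Valentine's Day party", 8, 2.0),
    Memory("Tom bought groceries at the market", 3, 1.0),
    Memory("Isabella invited Tom to the party", 7, 0.5),
]
top = retrieve(stream, "party at Hobbs Cafe", k=2)
print([m.content for m in top])
```

The recent, important, party-related memories win out over the mundane grocery trip, which is exactly the behavior the planner relies on.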

The Valentine's Day Party (Emergent Behavior)

One of the most famous demonstrations:

  1. Seed: Isabella gets the instruction "You want to throw a Valentine's Day party."
  2. Propagation: She spreads the word organically — other agents decide whether to attend based on their personalities.
  3. Coordination: Some agents offer to help decorate; others discuss what to wear.
  4. Result: 5 agents attend the party at the exact planned time.

No hard-coded rules. Everything emerged from the agents' memory, reflection, and planning.

Simile: From 25 to 1,000+ Agents

The original authors (Joon Sung Park et al.) founded Simile in 2026, scaling from 25 to 1,000+ agents simulating real human populations. Used for predicting customer behavior, brand messaging impact, and policy decisions. Wealthfront reported 15x expansion in user research scope using Simile.

Relevance to Robotics

  • Robots in human environments need to model other agents (humans, other robots)
  • Emergent behavior means the robot doesn't need explicit rules for every situation
  • Memory + reflection enables long-horizon task planning

4. CrewAI

Type: Role-based multi-agent framework
Repo: crewaiinc/crewAI
Installation: pip install crewai crewai-tools

CrewAI takes a role-oriented approach. Each agent has a defined role, goal, and backstory — mimicking a real team.

Core Concepts

Agent — A role with tools and a specific goal:

from crewai import Agent
from crewai_tools import SerperDevTool, FileReadTool

researcher = Agent(
    role="Senior Research Analyst",
    goal="Discover cutting-edge developments in robot manipulation",
    backstory=(
        "You are a PhD-level robotics researcher with 10 years of experience "
        "monitoring the latest papers, patents, and industry developments."
    ),
    tools=[SerperDevTool(), FileReadTool()],
    verbose=True,
)

writer = Agent(
    role="Tech Writer",
    goal="Write compelling technical content about robotics",
    backstory=(
        "You are a skilled technical writer who translates complex research "
        "into accessible, well-structured articles for engineers."
    ),
    tools=[FileReadTool()],
    verbose=True,
)

Task — A unit of work assigned to an agent:

from crewai import Task

research_task = Task(
    description="Research the latest advances in dexterous manipulation",
    agent=researcher,
    expected_output="A summary of 5 key papers with their contributions",
)

write_task = Task(
    description="Write a blog post about the research findings",
    agent=writer,
    expected_output="A 1000-word blog post with technical accuracy",
    context=[research_task],  # Writer sees researcher's output
)

Crew — A team executing tasks:

from crewai import Crew, Process

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.sequential,  # or Process.hierarchical
)

result = crew.kickoff()
print(result)

Hierarchical Process (Manager Agent)

from langchain_openai import ChatOpenAI

crew = Crew(
    agents=[researcher, writer, reviewer],
    tasks=[task1, task2, task3],
    process=Process.hierarchical,
    manager_llm=ChatOpenAI(model="gpt-4o"),
)

Custom Tools

from crewai.tools import BaseTool

class MyCustomTool(BaseTool):
    name: str = "my_custom_tool"
    description: str = "Describe what the tool does; the LLM reads this."

    def _run(self, tool_input: str) -> str:
        # Implementation
        return f"Result: {tool_input}"

agent = Agent(
    role="Tool Demonstrator",
    goal="Show how a custom tool is attached",
    backstory="An agent used to demonstrate custom tools.",  # required fields
    tools=[MyCustomTool()],
)

CrewAI vs. AutoGen vs. Swarm

| Feature | CrewAI | AutoGen | Swarm |
|---|---|---|---|
| Approach | Role-based team | Conversational | Handoff-based |
| Best for | Structured pipelines | Complex negotiation | Lightweight routing |
| Code execution | Via tools | Native | Via tools |
| Memory | Built-in | Via custom | None |
| Complexity | Medium | High | Low |
| Production | ✅ Growing | ✅ Mature | ❌ Experimental |

5. LangGraph

Type: Graph-based agent workflow framework
Repo: langchain-ai/langgraph
Installation: pip install langgraph langchain

LangGraph models agent workflows as state graphs. Each node is a step (an agent, tool, or plain function), and edges, including conditional branches and cycles, define the flow.

Core Concepts

State — A shared TypedDict flowing through the graph:

from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]  # Append-only for messages
    search_results: list[str]
    retry_count: int
    final_answer: str
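
How updates are applied deserves a note: a node returns a partial dict, and LangGraph merges it into the state, combining keys annotated with a reducer (like add_messages, which appends) and overwriting the rest. A simplified illustration of that merge rule, not LangGraph's code:

```python
def merge_state(state, update, reducers):
    """Apply a node's partial update: reduced keys are combined, others replaced."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](merged.get(key, []), value)
        else:
            merged[key] = value
    return merged

# Append-style reducer, analogous to add_messages
reducers = {"messages": lambda old, new: old + new}

state = {"messages": [{"role": "user", "content": "hi"}], "retry_count": 0}
update = {"messages": [{"role": "assistant", "content": "hello"}], "retry_count": 1}
state = merge_state(state, update, reducers)

print(len(state["messages"]), state["retry_count"])  # 2 1
```

This is why nodes below return only the keys they change: the runtime handles the merge.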

Nodes — Functions that transform state:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

def planner(state: AgentState) -> dict:
    """Planning node"""
    response = llm.invoke([
        {"role": "system", "content": "Create a research plan."},
        *state["messages"]
    ])
    return {"messages": [response]}

def searcher(state: AgentState) -> dict:
    """Search node; also counts attempts so routing can stop retrying"""
    last_msg = state["messages"][-1].content
    results = web_search(last_msg)
    return {"search_results": results,
            "retry_count": state.get("retry_count", 0) + 1}

def writer(state: AgentState) -> dict:
    """Report generation node"""
    context = "\n".join(state["search_results"])
    response = llm.invoke([
        {"role": "system", "content": f"Write a report based on:\n{context}"},
        *state["messages"]
    ])
    return {"messages": [response], "final_answer": response.content}

Conditional Edges — Route based on state:

def should_continue(state: AgentState) -> str:
    if state.get("final_answer"):
        return "end"
    elif state.get("retry_count", 0) < 2:
        return "searcher"  # run another search round
    else:
        return "writer"

Build and Compile the Graph

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

graph = StateGraph(AgentState)

graph.add_node("planner", planner)
graph.add_node("searcher", searcher)
graph.add_node("writer", writer)

graph.add_edge(START, "planner")
graph.add_edge("planner", "searcher")

graph.add_conditional_edges(
    "searcher",
    should_continue,
    {"writer": "writer", "searcher": "searcher", "end": END}
)

graph.add_edge("writer", END)

# Add checkpointing for persistence
memory = MemorySaver()
app = graph.compile(checkpointer=memory)

# Run
config = {"configurable": {"thread_id": "research-001"}}
result = app.invoke(
    {"messages": [{"role": "user", "content": "Analyze AI agent trends in 2026"}]},
    config=config,
)

Human-in-the-Loop: Interrupt and Approve

# Interrupt before the "writer" node for human review
app = graph.compile(
    checkpointer=memory,
    interrupt_before=["writer"],  # Pause before writing
)

# First run: executes up to "writer" and pauses
result = app.invoke(input, config=config)

# Human reviews search results
print("Search results:", result["search_results"])

# Human approves or modifies
app.update_state(config, {"search_results": result["search_results"] + ["extra"]})

# Resume execution
final = app.invoke(None, config=config)  # None = resume from interrupt

Multi-Agent: Supervisor Architecture

from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor

research_agent = create_react_agent(
    llm, tools=[web_search, arxiv_search], name="researcher",
    prompt="You are a research expert."
)
coding_agent = create_react_agent(
    llm, tools=[python_repl, code_sandbox], name="coder",
    prompt="You are a coding expert."
)
writing_agent = create_react_agent(
    llm, tools=[], name="writer",
    prompt="You are a writing expert."
)

supervisor = create_supervisor(
    agents=[research_agent, coding_agent, writing_agent],
    model=llm,
    prompt="You are a project manager. Delegate tasks appropriately."
)

multi_agent = supervisor.compile(checkpointer=MemorySaver())

LangGraph for Robotics: Task Planning Graph

class RobotState(TypedDict):
    command: str
    plan: list[str]
    current_step: int
    observation: str | None
    approved: bool

def parse_command(state: RobotState) -> dict:
    response = llm.invoke(f"Decompose into steps, one per line: {state['command']}")
    return {"plan": response.content.splitlines(), "current_step": 0}

def execute_step(state: RobotState) -> dict:
    step = state["plan"][state["current_step"]]
    obs = robot.execute(step)
    return {"observation": obs, "current_step": state["current_step"] + 1}

def should_continue(state: RobotState) -> str:
    if not state.get("approved"):
        return "interrupt"
    return "end" if state["current_step"] >= len(state["plan"]) else "execute_step"
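
This control flow can be exercised without a graph runtime: a plain loop that applies the same partial-state updates in order. fake_llm_decompose and fake_robot_execute are stand-ins (assumptions) for the real llm.invoke and robot.execute calls:

```python
def fake_llm_decompose(command):
    """Stand-in for the LLM: decompose a command into three steps."""
    return [f"locate {command}", f"grasp {command}", f"place {command}"]

def fake_robot_execute(step):
    """Stand-in for the robot controller."""
    return f"done: {step}"

def run_plan(command, approved=True):
    state = {"command": command, "plan": fake_llm_decompose(command),
             "current_step": 0, "observation": None, "approved": approved}
    while True:
        if not state["approved"]:
            return "interrupt", state        # pause for human approval
        if state["current_step"] >= len(state["plan"]):
            return "end", state
        step = state["plan"][state["current_step"]]
        state["observation"] = fake_robot_execute(step)   # execute_step
        state["current_step"] += 1

status, state = run_plan("pick up the cup")
print(status, state["current_step"])        # end 3
```

The graph version adds what the loop lacks: checkpointing of every intermediate state and the ability to resume after the approval interrupt.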

LangGraph vs. CrewAI vs. AutoGen

| Feature | LangGraph | CrewAI | AutoGen |
|---|---|---|---|
| Model | DAG / state graph | Role-based | Conversational |
| Flexibility | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Learning curve | Medium | Low | High |
| Human-in-the-loop | Native (interrupt) | Limited | Supported |
| Persistence | PostgreSQL / Redis | Limited | Via custom |
| Visualization | Built-in (Mermaid, LangSmith) | Limited | — |
| Production maturity | High | Growing | Mature |

6. Architecture Patterns Summary

┌─────────────────────────────────────────────────────────────┐
│            Multi-Agent Architecture Patterns                 │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Pattern 1: Sequential Pipeline                             │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐               │
│  │Researcher│───→│Writer  │───→│Editor   │               │
│  └─────────┘    └─────────┘    └─────────┘               │
│  (CrewAI sequential, LangGraph linear)                       │
│                                                             │
│  Pattern 2: Handoff / Router                               │
│  ┌──────────┐     ┌─────────┐                             │
│  │  Triage  │────→│ Sales   │                             │
│  │  Agent   │────→│ Support │                             │
│  │  (root)  │────→│ Billing │                             │
│  └──────────┘     └─────────┘                             │
│  (OpenAI Swarm, Hermes delegation)                        │
│                                                             │
│  Pattern 3: Group Chat / Round Table                     │
│       ┌──────────────────────────────────────┐              │
│       │  ┌──────┐  ┌──────┐  ┌──────┐       │              │
│       │  │Agent1│  │Agent2│  │Agent3│       │              │
│       │  └──┬───┘  └──┬───┘  └──┬───┘       │              │
│       │     └─────────┼─────────┘            │              │
│       │               ▼                      │              │
│       │         Group Chat                    │              │
│       └──────────────────────────────────────┘              │
│  (AutoGen GroupChat, MAF)                                    │
│                                                             │
│  Pattern 4: Hierarchical / Manager                         │
│       ┌────────────┐                                        │
│       │  Manager   │                                        │
│       └──┬──────┬──┘                                        │
│       ┌───┴───┐  ┌───┴───┐                                  │
│       │Worker1 │  │Worker2 │                                 │
│       └───────┘  └───────┘                                  │
│  (CrewAI hierarchical, MAF supervisor)                       │
│                                                             │
│  Pattern 5: Simulation / Emergent                         │
│       ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐                      │
│       │ A │ │ B │ │ C │ │ D │ │ E │                      │
│       └─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘                        │
│         └─────┼─────┼─────┼─────┘                         │
│               ▼     ▼     ▼                                 │
│         Shared World / Memory Streams                      │
│  (Stanford Generative Agents → Simile)                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
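
Pattern 1 is worth demystifying: stripped of framework machinery, a sequential pipeline is just function composition, with each agent's output feeding the next. The three stub agents below are purely illustrative:

```python
def researcher(topic: str) -> str:
    return f"notes on {topic}"

def writer(notes: str) -> str:
    return f"draft based on {notes}"

def editor(draft: str) -> str:
    return f"polished {draft}"

def pipeline(topic: str) -> str:
    # Sequential pipeline: Researcher -> Writer -> Editor
    return editor(writer(researcher(topic)))

print(pipeline("robot manipulation"))
# polished draft based on notes on robot manipulation
```

Frameworks earn their keep in the other patterns, where routing is dynamic, agents converse in loops, or behavior emerges from shared state.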

7. Benchmark Performance

Multi-agent systems consistently outperform single agents:

| Benchmark | Single Agent | Multi-Agent | Improvement |
|---|---|---|---|
| GAIA (general AI assistant) | 40–60% | 70–85% | +25–40% |
| SWE-bench Verified (software engineering) | baseline | +25–40% over baseline | significant |
| Real-world projects (Novo Nordisk) | — | — | ~25% reduction in iteration cycles |
