Agent Architectures

This page covers the core architectural patterns that underpin most LLM-based agents. We use Hermes Agent as a concrete case study, but these patterns apply broadly to AutoGen, CrewAI, LangGraph-based agents, and beyond.

System Overview

┌─────────────────────────────────────────────────────────────┐
│                       User Interfaces                       │
│          (Terminal, Telegram, Discord, Slack, Web)          │
└──────────────────────────────┬──────────────────────────────┘
┌──────────────────────────────┴──────────────────────────────┐
│                         Agent Core                          │
│   ┌─────────────────────────────────────────────────────┐   │
│   │          Agent Loop (Perceive → Plan → Act)         │   │
│   │      Build Prompt → Call LLM → Process Response     │   │
│   └─────────────────────────────────────────────────────┘   │
│                             │                               │
│            ┌────────────────┼────────────────┐              │
│            ▼                ▼                ▼              │
│     ┌────────────┐   ┌────────────┐   ┌────────────┐        │
│     │   Tools    │   │   Memory   │   │ Delegation │        │
│     │ (68+ built │   │(persistent │   │(subagents) │        │
│     │    -in)    │   │ + session) │   │            │        │
│     └────────────┘   └────────────┘   └────────────┘        │
└─────────────────────────────────────────────────────────────┘

1. The Agent Loop: Five Design Patterns

The agent loop is the core execution engine. Five design patterns dominate current practice:

Pattern 1: ReAct (Reasoning + Acting)

Interleaves reasoning traces with tool execution. The most widely used paradigm.

# ReAct loop: Thought → Action → Observation → repeat
messages = [{"role": "user", "content": user_input}]

for _ in range(max_turns):
    # 1. Call LLM with the conversation so far
    response = llm.invoke(messages)
    messages.append(response)  # keep the assistant turn (incl. its tool calls)

    # 2. Parse response
    if response.tool_calls:
        for tc in response.tool_calls:
            # 3. Execute tool
            result = tools[tc.name](**tc.arguments)
            # 4. Append the result as an observation
            messages.append({
                "role": "tool",
                "content": f"Observation: {result}",
                "tool_call_id": tc.id,
            })
    else:
        return response.content  # No tool calls: final answer

Pattern 2: Plan-and-Execute

Generates a full plan first, then executes step by step. Higher long-horizon stability than ReAct.

def plan_and_execute(task: str, llm, tools):
    # Phase 1: Plan
    plan_response = llm.invoke(f"""
        Decompose this task into ordered steps:
        Task: {task}
        Return a numbered list of steps.
    """)
    steps = parse_steps(plan_response)

    # Phase 2: Execute each step
    results = []
    for step in steps:
        result = llm.invoke(f"Execute step: {step}\nContext: {results}")
        results.append(result)

    return summarize(results)

Pattern 3: Reflection / Self-Refine

Generates output, then an internal critic evaluates it and triggers revision until a quality threshold is met.

from pydantic import BaseModel, Field

class Review(BaseModel):
    score: int = Field(..., description="1-10 quality score")
    feedback: str = Field(..., description="Improvement suggestions")

structured_llm = llm.with_structured_output(Review)

def self_refine(task: str, threshold: int = 8, max_iters: int = 5):
    draft = llm.invoke(f"Complete: {task}").content

    for _ in range(max_iters):
        review: Review = structured_llm.invoke(f"Score this draft:\n{draft}")
        if review.score >= threshold:
            return draft
        draft = llm.invoke(
            f"Improve based on: {review.feedback}\nOriginal: {draft}"
        ).content

    return draft

Pattern 4: Supervisor + Workers

A manager agent dispatches tasks to specialized workers in parallel or sequence.

def supervisor_router(task: str, workers: dict, llm):
    # Supervisor decides which worker handles the task
    decision = llm.invoke(f"""
        Task: {task}
        Available workers: {list(workers.keys())}
        Choose the best worker and explain why.
    """)

    worker_name = parse_worker_choice(decision)
    return workers[worker_name].execute(task)

Pattern 5: Router / Selector

A lightweight classifier routes input to different models, tools, or agent paths.

def route(task: str, llm):
    decision = llm.invoke(f"""
        Classify this task as: code | general | research | math
        Only output the category name.
        Task: {task}
    """).content.strip().lower()

    return {
        "code": code_agent,
        "general": general_agent,
        "research": research_agent,
        "math": math_agent,
    }.get(decision, general_agent).execute(task)

2. Tool System

Tool Definition

Each tool has three parts:

tool = {
    "name": "terminal",
    "description": "Execute shell commands in a terminal.",
    "parameters": {
        "type": "object",
        "properties": {
            "command": {
                "type": "string",
                "description": "The shell command to execute.",
            }
        },
        "required": ["command"],
    },
}

The LLM receives this schema in the system prompt and decides when, and with which arguments, to call the tool.
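How the schema reaches the model varies by provider: many APIs accept schemas natively through a `tools` parameter, while prompt-based setups render them as text. A sketch of the rendering approach, reusing the `terminal` schema above (the exact prompt format is illustrative):

```python
import json

terminal_tool = {
    "name": "terminal",
    "description": "Execute shell commands in a terminal.",
    "parameters": {
        "type": "object",
        "properties": {
            "command": {"type": "string",
                        "description": "The shell command to execute."}
        },
        "required": ["command"],
    },
}

def render_tools_prompt(schemas: list) -> str:
    # One name/description/parameters entry per tool, so the model
    # knows what each tool does and what arguments it expects.
    lines = ["You may call the following tools:"]
    for s in schemas:
        lines.append(f"- {s['name']}: {s['description']}")
        lines.append(f"  parameters: {json.dumps(s['parameters'])}")
    return "\n".join(lines)
```

The rendered block is appended to the system prompt; native tool-calling APIs do this serialization for you.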

Tool Dispatch

from typing import Callable, Optional

class ToolRegistry:
    def __init__(self):
        self.tools: dict[str, dict] = {}

    def register(self, name: str, schema: dict, handler: Callable,
                 check_fn: Optional[Callable] = None):
        self.tools[name] = {
            "schema": schema,
            "handler": handler,
            "check_fn": check_fn,
        }

    def dispatch(self, name: str, arguments: dict) -> str:
        # Key principle: tools never raise — errors come back as strings
        # the LLM can read and recover from.
        if name not in self.tools:
            return f"Error: Unknown tool '{name}'."
        tool = self.tools[name]
        if tool["check_fn"] and not tool["check_fn"]():
            return "Error: Tool requirements not met."
        try:
            return tool["handler"](arguments)
        except Exception as e:
            return f"Error: {e}"

Tool Categories

| Category      | Tools                        | Example                    |
| ------------- | ---------------------------- | -------------------------- |
| Execution     | shell, python, code          | Run commands, execute code |
| File I/O      | read, write, patch           | Read/write files           |
| Web           | search, extract, fetch       | Browse the web             |
| Browser       | navigate, click, screenshot  | Web browser automation     |
| Vision        | describe, analyze, ocr       | Image understanding        |
| Memory        | save, load, search           | Persistent storage         |
| Delegation    | spawn, delegate              | Create subagents           |
| Scheduling    | cron, schedule               | Timed tasks                |
| Communication | send_email, send_message     | Send messages              |

3. Memory System: From RAG to Agent Memory

Agent memory has evolved through three stages. Understanding this evolution is critical for building production agents.

Stage 1: Classic RAG (2020–2023)

Retrieval-Augmented Generation: pre-index documents in a vector database, retrieve relevant chunks at query time.

Offline: Document → Chunk → Embed → Store in VectorDB
Online:  Query → Embed → Retrieve top-k chunks → Concatenate → LLM → Response

Limitation: Read-only. Cannot learn from interactions. "A static encyclopedia."
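The offline/online split above can be made concrete. A minimal, self-contained sketch, using a toy bag-of-words `embed` as a stand-in for a real embedding model and a plain list as the vector store:

```python
import math

def embed(text: str) -> dict[str, float]:
    # Toy embedding: term frequencies (stand-in for a real embedding model)
    words = text.lower().split()
    return {w: words.count(w) / len(words) for w in set(words)}

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Offline: chunk, embed, and store documents
docs = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Python is a programming language.",
]
index = [(d, embed(d)) for d in docs]

# Online: embed the query, retrieve top-k, concatenate into the prompt
def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

context = "\n".join(retrieve("What is the capital of France?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
```

Note that retrieval runs unconditionally on every query; the model has no say in whether or what to retrieve.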

Stage 2: Agentic RAG (2023–2024)

RAG becomes a tool that the agent decides when and how to use.

Agent decides:
  - Should I retrieve? From which source?
  - Is this context relevant?
  - Should I try a different search strategy?

Limitation: Still read-only. No learning from interactions.
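In code, the shift is mostly a matter of registration: retrieval becomes an ordinary tool schema the model can choose to invoke. A sketch, where the handler's corpus and the `source` values are illustrative:

```python
def retrieve_handler(arguments: dict) -> str:
    # Stand-in for a real vector-store or web lookup
    corpus = {
        "docs": "Paris is the capital of France.",
        "web": "Live search results would appear here.",
    }
    source = arguments.get("source", "docs")
    return corpus.get(source, "No results.")

retrieval_tool = {
    "name": "retrieve",
    "description": "Search a knowledge source. Use only when you lack the "
                   "facts to answer; pick the source that fits the question.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
            "source": {"type": "string", "enum": ["docs", "web"],
                       "description": "Which index to search"},
        },
        "required": ["query"],
    },
}

# The agent loop treats this like any other tool: the LLM decides whether
# to call it, with which query, against which source, and can re-query
# with a different strategy if the results look irrelevant.
```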

Stage 3: True Agent Memory (2024+)

The agent can create, update, and delete memories — learning from every interaction.

┌─────────────────────────────────────────────────────────────┐
│              Agent Memory Architecture (2024+)              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │       Memory Formation → Evolution → Retrieval      │   │
│   └─────────────────────────────────────────────────────┘   │
│                             │                               │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  Forms (HOW memory is stored)                       │   │
│   │  ┌──────────────────┬──────────────────┐            │   │
│   │  │ Token-level      │ Parametric       │            │   │
│   │  │ (explicit,       │ (encoded in      │            │   │
│   │  │  searchable)     │  model weights)  │            │   │
│   │  └──────────────────┴──────────────────┘            │   │
│   │                                                     │   │
│   │  Functions (WHAT memory is used for)                │   │
│   │  ┌──────────────────┬──────────────────┐            │   │
│   │  │ Factual Memory   │ Experiential     │            │   │
│   │  │ (user facts,     │ Memory (successes│            │   │
│   │  │  environment)    │  skills, cases)  │            │   │
│   │  └──────────────────┴──────────────────┘            │   │
│   └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
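What separates Stage 3 from the stages before it is write access. Below is a minimal sketch of a token-level factual store that save/delete/search tools could wrap; the class, file layout, and key scheme are illustrative, not Hermes's actual code:

```python
import json
from pathlib import Path

class MemoryStore:
    """Token-level factual memory the agent can edit, not just read."""

    def __init__(self, path: str = "memory.json"):
        self.path = Path(path)
        self.facts: dict[str, str] = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    def _flush(self):
        # Persist after every write so facts survive across sessions
        self.path.write_text(json.dumps(self.facts, indent=2))

    def save(self, key: str, value: str) -> str:   # create / update
        self.facts[key] = value
        self._flush()
        return f"Saved: {key}"

    def delete(self, key: str) -> str:
        self.facts.pop(key, None)
        self._flush()
        return f"Deleted: {key}"

    def search(self, query: str) -> list[str]:     # retrieval
        q = query.lower()
        return [f"{k}: {v}" for k, v in self.facts.items() if q in k.lower()]
```

Each method maps naturally onto a tool schema, so the LLM itself decides when a fact is worth saving, updating, or forgetting.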

Memory Implementation in Hermes Agent

# ~/.hermes/memories/
# ├── user.json        # User profile (factual)
# ├── memory.json      # Environment facts (factual)
# └── sessions/        # Session transcripts (experiential)

def inject_memory(agent: AIAgent) -> str:
    sections = []

    # User profile
    profile = load_json("~/.hermes/user.json")
    if profile:
        sections.append(f"## User Profile\n{json.dumps(profile, indent=2)}")

    # Memory facts
    facts = load_json("~/.hermes/memory.json")
    if facts:
        sections.append(f"## Memory\n" + "\n".join(f"- {f}" for f in facts))

    return "\n\n".join(sections)

Context Compression

When conversations grow long, Hermes compresses history:

def compress_context(messages: list, protect_last_n: int = 10,
                     target_ratio: float = 0.2) -> list:
    """Compress long conversation history by ~80%."""
    protected = messages[-protect_last_n:]       # Keep recent context verbatim
    compressible = messages[:-protect_last_n]

    summary = llm.summarize(compressible)        # e.g. a summarization prompt

    return [
        {"role": "system", "content": f"Previous context summary: {summary}"},
        *protected,
    ]

4. Prompt Construction

The system prompt is assembled from multiple components:

def build_system_prompt(agent: AIAgent) -> str:
    sections = []

    # 1. Base persona (SOUL.md)
    sections.append(load_file("~/.hermes/SOUL.md"))

    # 2. Memory injection
    sections.append(inject_memory(agent))

    # 3. Relevant skills
    relevant_skills = load_relevant_skills(agent.user_input)
    for skill in relevant_skills:
        sections.append(f"## Skill: {skill.name}\n{skill.content}")

    # 4. Tool schemas
    tool_schemas = [t["schema"] for t in agent.tools.values()]
    sections.append(f"## Tools Available\n{json.dumps(tool_schemas)}")

    # 5. Context files (AGENTS.md, CLAUDE.md, etc.)
    sections.append(load_context_files())

    return "\n\n---\n\n".join(sections)

5. Security: Guardrails and Approval

Since agents can execute arbitrary commands, safety checks are critical:

Command Approval

import re

DANGEROUS_PATTERNS = [
    r'rm\s+-rf\s+/',           # Recursive root delete
    r'git\s+push\s+--force',   # Force push
    r'dd\s+if=',               # Direct disk write
    r'mkfs',                   # Format filesystem
    r'curl\s+.*\|\s*bash',     # Pipe to bash
]

def should_require_approval(command: str) -> bool:
    for pattern in DANGEROUS_PATTERNS:
        if re.search(pattern, command):
            return True
    return False

Output Guardrails

from pydantic import BaseModel, Field

class GuardrailedOutput(BaseModel):
    decision: str = Field(..., description="The decision or answer")
    reason: str = Field(..., description="Explanation")
    approved: bool = Field(..., description="Whether this should be shown to user")

structured_llm = llm.with_structured_output(GuardrailedOutput)
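One way to wire this in, sketched with a dataclass stand-in for the pydantic model and a stub LLM so the gating logic is concrete (the real version would call the `structured_llm` defined above):

```python
from dataclasses import dataclass

@dataclass
class GuardrailedOutput:
    decision: str
    reason: str
    approved: bool

def guarded_respond(task: str, structured_llm) -> str:
    # The structured call returns a validated GuardrailedOutput; the agent
    # only surfaces `decision` when the model itself marked it approved.
    result = structured_llm.invoke(f"Answer and self-assess:\n{task}")
    if not result.approved:
        return f"[Withheld] {result.reason}"
    return result.decision

class StubLLM:
    """Stand-in for llm.with_structured_output(GuardrailedOutput)."""
    def invoke(self, prompt: str) -> GuardrailedOutput:
        unsafe = "password" in prompt.lower()
        return GuardrailedOutput(
            decision="" if unsafe else "(the answer)",
            reason="Asks for credentials" if unsafe else "OK",
            approved=not unsafe,
        )
```

The key design choice is that approval is part of the schema, so it cannot be skipped: every response carries its own verdict.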

6. Hermes Agent: Concrete Implementation

Hermes Agent implements all patterns above in a production-ready system:

| Component       | Hermes Implementation                          |
| --------------- | ---------------------------------------------- |
| Agent Loop      | run_agent.py — AIAgent class                   |
| Tools           | 68+ tools in tools/ directory                  |
| Memory          | ~/.hermes/memories/ (JSON + SQLite)            |
| Skills          | ~/.hermes/skills/ (Markdown with frontmatter)  |
| Delegation      | spawn / delegate tools                         |
| Compression     | agent/compression.py                           |
| Gateway         | gateway/ (Telegram, Discord, Slack, etc.)      |
| Credential Pool | Multiple API keys with rotation                |

Hermes Tool Registration

# tools/terminal.py
registry.register(
    name="terminal",
    toolset="terminal",
    schema={
        "name": "terminal",
        "description": "Execute shell commands",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Command to execute"},
                "timeout": {"type": "integer", "description": "Timeout in seconds"},
            },
            "required": ["command"],
        },
    },
    handler=terminal_handler,
    check_fn=lambda: True,
)

7. Pattern Comparison: When to Use Which

| Pattern                  | Best For                            | Complexity  | Reliability |
| ------------------------ | ----------------------------------- | ----------- | ----------- |
| ReAct                    | QA, web search, open-domain tasks   | Low         | Medium      |
| Plan-and-Execute         | Long reports, complex decomposition | Medium      | High        |
| Reflection / Self-Refine | Code generation, creative writing   | Medium      | High        |
| Supervisor + Workers     | Multi-tool pipelines                | Medium-High | High        |
| Router                   | Multi-model cost optimization       | Low         | Medium      |

8. Applying These Patterns Across Frameworks

These architectural patterns transfer across frameworks:

| Pattern    | Hermes              | AutoGen/MAF           | CrewAI         | LangGraph         |
| ---------- | ------------------- | --------------------- | -------------- | ----------------- |
| Agent Loop | AIAgent             | ConversableAgent      | Agent          | Graph nodes       |
| Tools      | registry.register() | code_execution_config | tools=[]       | Tool nodes        |
| Memory     | JSON/SQLite         | Custom                | Built-in       | State dict        |
| Guardrails | Manual              | human_input_mode      | Tools          | Interrupt nodes   |
| Delegation | spawn tool          | Nested agents         | Crew hierarchy | Conditional edges |
| Routing    | Gateway             | GroupChat             | Process type   | should_continue   |

References