Agent Architectures

This page covers the core architectural patterns that underpin most LLM-based agents. We use Hermes Agent as a concrete case study, but these patterns apply broadly to AutoGen, CrewAI, LangGraph-based agents, and beyond.

System Overview

┌─────────────────────────────────────────────────────────────┐
│                       User Interfaces                       │
│          (Terminal, Telegram, Discord, Slack, Web)          │
└──────────────────────────────┬──────────────────────────────┘
┌──────────────────────────────┴──────────────────────────────┐
│                         Agent Core                          │
│   ┌─────────────────────────────────────────────────────┐   │
│   │          Agent Loop (Perceive → Plan → Act)         │   │
│   │      Build Prompt → Call LLM → Process Response     │   │
│   └─────────────────────────────────────────────────────┘   │
│                             │                               │
│            ┌────────────────┼────────────────┐              │
│            ▼                ▼                ▼              │
│     ┌────────────┐   ┌────────────┐   ┌────────────┐        │
│     │   Tools    │   │   Memory   │   │ Delegation │        │
│     │ (68+ built │   │(persistent │   │(subagents) │        │
│     │    -in)    │   │ + session) │   │            │        │
│     └────────────┘   └────────────┘   └────────────┘        │
└─────────────────────────────────────────────────────────────┘

1. The Agent Loop: Five Design Patterns

The agent loop is the core execution engine. Five design patterns dominate current practice:

Pattern 1: ReAct (Reasoning + Acting)

Interleaves reasoning traces with tool execution. The most widely used paradigm.

# ReAct loop: Thought → Action → Observation → repeat
messages = [{"role": "user", "content": user_input}]

for _ in range(max_turns):
    # 1. Call LLM with the conversation so far
    response = llm.invoke(messages)
    messages.append(response)  # keep the assistant turn (incl. its tool calls)

    # 2. Parse response
    if response.tool_calls:
        for tc in response.tool_calls:
            # 3. Execute tool
            result = tools[tc.name](**tc.arguments)
            # 4. Append the result as an observation
            messages.append({
                "role": "tool",
                "content": f"Observation: {result}",
                "tool_call_id": tc.id,
            })
    else:
        return response.content  # No tool calls: final answer

Pattern 2: Plan-and-Execute

Generates a full plan first, then executes step by step. Higher long-horizon stability than ReAct.

def plan_and_execute(task: str, llm, tools):
    # Phase 1: Plan
    plan_response = llm.invoke(f"""
        Decompose this task into ordered steps:
        Task: {task}
        Return a numbered list of steps.
    """)
    steps = parse_steps(plan_response)

    # Phase 2: Execute each step
    results = []
    for step in steps:
        result = llm.invoke(f"Execute step: {step}\nContext: {results}")
        results.append(result)

    return summarize(results)

Pattern 3: Reflection / Self-Refine

Generates output, then an internal critic evaluates it and triggers revision until a quality threshold is met.

from pydantic import BaseModel, Field

class Review(BaseModel):
    score: int = Field(..., description="1-10 quality score")
    feedback: str = Field(..., description="Improvement suggestions")

structured_llm = llm.with_structured_output(Review)

def self_refine(task: str, threshold: int = 8, max_iters: int = 5):
    draft = llm.invoke(f"Complete: {task}").content

    for _ in range(max_iters):
        review: Review = structured_llm.invoke(f"Score this draft:\n{draft}")
        if review.score >= threshold:
            return draft
        draft = llm.invoke(
            f"Improve based on: {review.feedback}\nOriginal: {draft}"
        ).content

    return draft

Pattern 4: Supervisor + Workers

A manager agent dispatches tasks to specialized workers in parallel or sequence.

def supervisor_router(task: str, workers: dict, llm):
    # Supervisor decides which worker handles the task
    decision = llm.invoke(f"""
        Task: {task}
        Available workers: {list(workers.keys())}
        Choose the best worker and explain why.
    """)

    worker_name = parse_worker_choice(decision)
    return workers[worker_name].execute(task)

Pattern 5: Router / Selector

A lightweight classifier routes input to different models, tools, or agent paths.

def route(task: str, llm):
    decision = llm.invoke(f"""
        Classify this task as: code | general | research | math
        Only output the category name.
        Task: {task}
    """).content.strip().lower()

    return {
        "code": code_agent,
        "general": general_agent,
        "research": research_agent,
        "math": math_agent,
    }.get(decision, general_agent).execute(task)

2. Tool System

Tool Definition

Each tool has three parts:

tool = {
    "name": "terminal",
    "description": "Execute shell commands in a terminal.",
    "parameters": {
        "type": "object",
        "properties": {
            "command": {
                "type": "string",
                "description": "The shell command to execute.",
            }
        },
        "required": ["command"],
    },
}

The LLM receives this schema in the system prompt and decides when, and with which arguments, to call the tool.
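How the schema reaches the model varies by provider: many APIs accept schemas natively through a `tools` parameter, while prompt-based setups render them as text. A sketch of the rendering approach, reusing the `terminal` schema above (the exact prompt format is illustrative):

```python
import json

terminal_tool = {
    "name": "terminal",
    "description": "Execute shell commands in a terminal.",
    "parameters": {
        "type": "object",
        "properties": {
            "command": {"type": "string",
                        "description": "The shell command to execute."}
        },
        "required": ["command"],
    },
}

def render_tools_prompt(schemas: list) -> str:
    # One name/description/parameters entry per tool, so the model
    # knows what each tool does and what arguments it expects.
    lines = ["You may call the following tools:"]
    for s in schemas:
        lines.append(f"- {s['name']}: {s['description']}")
        lines.append(f"  parameters: {json.dumps(s['parameters'])}")
    return "\n".join(lines)
```

The rendered block is appended to the system prompt; native tool-calling APIs do this serialization for you.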

Tool Dispatch

from typing import Callable, Optional

class ToolRegistry:
    def __init__(self):
        self.tools: dict[str, dict] = {}

    def register(self, name: str, schema: dict, handler: Callable,
                 check_fn: Optional[Callable] = None):
        self.tools[name] = {
            "schema": schema,
            "handler": handler,
            "check_fn": check_fn,
        }

    def dispatch(self, name: str, arguments: dict) -> str:
        # Key principle: tools never raise — errors come back as strings
        # the LLM can read and recover from.
        if name not in self.tools:
            return f"Error: Unknown tool '{name}'."
        tool = self.tools[name]
        if tool["check_fn"] and not tool["check_fn"]():
            return "Error: Tool requirements not met."
        try:
            return tool["handler"](arguments)
        except Exception as e:
            return f"Error: {e}"

Tool Categories

| Category      | Tools                        | Example                    |
| ------------- | ---------------------------- | -------------------------- |
| Execution     | shell, python, code          | Run commands, execute code |
| File I/O      | read, write, patch           | Read/write files           |
| Web           | search, extract, fetch       | Browse the web             |
| Browser       | navigate, click, screenshot  | Web browser automation     |
| Vision        | describe, analyze, ocr       | Image understanding        |
| Memory        | save, load, search           | Persistent storage         |
| Delegation    | spawn, delegate              | Create subagents           |
| Scheduling    | cron, schedule               | Timed tasks                |
| Communication | send_email, send_message     | Send messages              |

3. Memory System: From RAG to Agent Memory

Agent memory has evolved through three stages. Understanding this evolution is critical for building production agents.

Stage 1: Classic RAG (2020–2023)

Retrieval-Augmented Generation: pre-index documents in a vector database, retrieve relevant chunks at query time.

Offline: Document → Chunk → Embed → Store in VectorDB
Online:  Query → Embed → Retrieve top-k chunks → Concatenate → LLM → Response

Limitation: Read-only. Cannot learn from interactions. "A static encyclopedia."
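The offline/online split above can be made concrete. A minimal, self-contained sketch, using a toy bag-of-words `embed` as a stand-in for a real embedding model and a plain list as the vector store:

```python
import math

def embed(text: str) -> dict[str, float]:
    # Toy embedding: term frequencies (stand-in for a real embedding model)
    words = text.lower().split()
    return {w: words.count(w) / len(words) for w in set(words)}

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Offline: chunk, embed, and store documents
docs = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Python is a programming language.",
]
index = [(d, embed(d)) for d in docs]

# Online: embed the query, retrieve top-k, concatenate into the prompt
def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

context = "\n".join(retrieve("What is the capital of France?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
```

Note that retrieval runs unconditionally on every query; the model has no say in whether or what to retrieve.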

Stage 2: Agentic RAG (2023–2024)

RAG becomes a tool that the agent decides when and how to use.

Agent decides:
  - Should I retrieve? From which source?
  - Is this context relevant?
  - Should I try a different search strategy?

Limitation: Still read-only. No learning from interactions.
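In code, the shift is mostly a matter of registration: retrieval becomes an ordinary tool schema the model can choose to invoke. A sketch, where the handler's corpus and the `source` values are illustrative:

```python
def retrieve_handler(arguments: dict) -> str:
    # Stand-in for a real vector-store or web lookup
    corpus = {
        "docs": "Paris is the capital of France.",
        "web": "Live search results would appear here.",
    }
    source = arguments.get("source", "docs")
    return corpus.get(source, "No results.")

retrieval_tool = {
    "name": "retrieve",
    "description": "Search a knowledge source. Use only when you lack the "
                   "facts to answer; pick the source that fits the question.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
            "source": {"type": "string", "enum": ["docs", "web"],
                       "description": "Which index to search"},
        },
        "required": ["query"],
    },
}

# The agent loop treats this like any other tool: the LLM decides whether
# to call it, with which query, against which source, and can re-query
# with a different strategy if the results look irrelevant.
```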

Stage 3: True Agent Memory (2024+)

The agent can create, update, and delete memories — learning from every interaction.

┌─────────────────────────────────────────────────────────────┐
│              Agent Memory Architecture (2024+)              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │       Memory Formation → Evolution → Retrieval      │   │
│   └─────────────────────────────────────────────────────┘   │
│                             │                               │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  Forms (HOW memory is stored)                       │   │
│   │  ┌──────────────────┬──────────────────┐            │   │
│   │  │ Token-level      │ Parametric       │            │   │
│   │  │ (explicit,       │ (encoded in      │            │   │
│   │  │  searchable)     │  model weights)  │            │   │
│   │  └──────────────────┴──────────────────┘            │   │
│   │                                                     │   │
│   │  Functions (WHAT memory is used for)                │   │
│   │  ┌──────────────────┬──────────────────┐            │   │
│   │  │ Factual Memory   │ Experiential     │            │   │
│   │  │ (user facts,     │ Memory (successes│            │   │
│   │  │  environment)    │  skills, cases)  │            │   │
│   │  └──────────────────┴──────────────────┘            │   │
│   └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
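What separates Stage 3 from the stages before it is write access. Below is a minimal sketch of a token-level factual store that save/delete/search tools could wrap; the class, file layout, and key scheme are illustrative, not Hermes's actual code:

```python
import json
from pathlib import Path

class MemoryStore:
    """Token-level factual memory the agent can edit, not just read."""

    def __init__(self, path: str = "memory.json"):
        self.path = Path(path)
        self.facts: dict[str, str] = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    def _flush(self):
        # Persist after every write so facts survive across sessions
        self.path.write_text(json.dumps(self.facts, indent=2))

    def save(self, key: str, value: str) -> str:   # create / update
        self.facts[key] = value
        self._flush()
        return f"Saved: {key}"

    def delete(self, key: str) -> str:
        self.facts.pop(key, None)
        self._flush()
        return f"Deleted: {key}"

    def search(self, query: str) -> list[str]:     # retrieval
        q = query.lower()
        return [f"{k}: {v}" for k, v in self.facts.items() if q in k.lower()]
```

Each method maps naturally onto a tool schema, so the LLM itself decides when a fact is worth saving, updating, or forgetting.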

Memory Implementation in Hermes Agent

# ~/.hermes/memories/
# ├── user.json        # User profile (factual)
# ├── memory.json      # Environment facts (factual)
# └── sessions/        # Session transcripts (experiential)

def inject_memory(agent: AIAgent) -> str:
    sections = []

    # User profile
    profile = load_json("~/.hermes/user.json")
    if profile:
        sections.append(f"## User Profile\n{json.dumps(profile, indent=2)}")

    # Memory facts
    facts = load_json("~/.hermes/memory.json")
    if facts:
        sections.append(f"## Memory\n" + "\n".join(f"- {f}" for f in facts))

    return "\n\n".join(sections)

Context Compression

When conversations grow long, Hermes compresses history:

def compress_context(messages: list, protect_last_n: int = 10,
                     target_ratio: float = 0.2) -> list:
    """Compress long conversation history by ~80%."""
    protected = messages[-protect_last_n:]       # Keep recent context verbatim
    compressible = messages[:-protect_last_n]

    summary = llm.summarize(compressible)        # e.g. a summarization prompt

    return [
        {"role": "system", "content": f"Previous context summary: {summary}"},
        *protected,
    ]

4. Prompt Construction

The system prompt is assembled from multiple components:

def build_system_prompt(agent: AIAgent) -> str:
    sections = []

    # 1. Base persona (SOUL.md)
    sections.append(load_file("~/.hermes/SOUL.md"))

    # 2. Memory injection
    sections.append(inject_memory(agent))

    # 3. Relevant skills
    relevant_skills = load_relevant_skills(agent.user_input)
    for skill in relevant_skills:
        sections.append(f"## Skill: {skill.name}\n{skill.content}")

    # 4. Tool schemas
    tool_schemas = [t["schema"] for t in agent.tools.values()]
    sections.append(f"## Tools Available\n{json.dumps(tool_schemas)}")

    # 5. Context files (AGENTS.md, CLAUDE.md, etc.)
    sections.append(load_context_files())

    return "\n\n---\n\n".join(sections)

5. Security: Guardrails and Approval

Since agents can execute arbitrary commands, safety checks are critical:

Command Approval

import re

DANGEROUS_PATTERNS = [
    r'rm\s+-rf\s+/',           # Recursive root delete
    r'git\s+push\s+--force',   # Force push
    r'dd\s+if=',               # Direct disk write
    r'mkfs',                   # Format filesystem
    r'curl\s+.*\|\s*bash',     # Pipe to bash
]

def should_require_approval(command: str) -> bool:
    for pattern in DANGEROUS_PATTERNS:
        if re.search(pattern, command):
            return True
    return False

Output Guardrails

from pydantic import BaseModel, Field

class GuardrailedOutput(BaseModel):
    decision: str = Field(..., description="The decision or answer")
    reason: str = Field(..., description="Explanation")
    approved: bool = Field(..., description="Whether this should be shown to user")

structured_llm = llm.with_structured_output(GuardrailedOutput)
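One way to wire this in, sketched with a dataclass stand-in for the pydantic model and a stub LLM so the gating logic is concrete (the real version would call the `structured_llm` defined above):

```python
from dataclasses import dataclass

@dataclass
class GuardrailedOutput:
    decision: str
    reason: str
    approved: bool

def guarded_respond(task: str, structured_llm) -> str:
    # The structured call returns a validated GuardrailedOutput; the agent
    # only surfaces `decision` when the model itself marked it approved.
    result = structured_llm.invoke(f"Answer and self-assess:\n{task}")
    if not result.approved:
        return f"[Withheld] {result.reason}"
    return result.decision

class StubLLM:
    """Stand-in for llm.with_structured_output(GuardrailedOutput)."""
    def invoke(self, prompt: str) -> GuardrailedOutput:
        unsafe = "password" in prompt.lower()
        return GuardrailedOutput(
            decision="" if unsafe else "(the answer)",
            reason="Asks for credentials" if unsafe else "OK",
            approved=not unsafe,
        )
```

The key design choice is that approval is part of the schema, so it cannot be skipped: every response carries its own verdict.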

6. Hermes Agent: Concrete Implementation

Hermes Agent implements all patterns above in a production-ready system:

| Component       | Hermes Implementation                          |
| --------------- | ---------------------------------------------- |
| Agent Loop      | run_agent.py — AIAgent class                   |
| Tools           | 68+ tools in tools/ directory                  |
| Memory          | ~/.hermes/memories/ (JSON + SQLite)            |
| Skills          | ~/.hermes/skills/ (Markdown with frontmatter)  |
| Delegation      | spawn / delegate tools                         |
| Compression     | agent/compression.py                           |
| Gateway         | gateway/ (Telegram, Discord, Slack, etc.)      |
| Credential Pool | Multiple API keys with rotation                |

Hermes Tool Registration

# tools/terminal.py
registry.register(
    name="terminal",
    toolset="terminal",
    schema={
        "name": "terminal",
        "description": "Execute shell commands",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Command to execute"},
                "timeout": {"type": "integer", "description": "Timeout in seconds"},
            },
            "required": ["command"],
        },
    },
    handler=terminal_handler,
    check_fn=lambda: True,
)

7. Pattern Comparison: When to Use Which

| Pattern                  | Best For                            | Complexity  | Reliability |
| ------------------------ | ----------------------------------- | ----------- | ----------- |
| ReAct                    | QA, web search, open-domain tasks   | Low         | Medium      |
| Plan-and-Execute         | Long reports, complex decomposition | Medium      | High        |
| Reflection / Self-Refine | Code generation, creative writing   | Medium      | High        |
| Supervisor + Workers     | Multi-tool pipelines                | Medium-High | High        |
| Router                   | Multi-model cost optimization       | Low         | Medium      |

8. Applying These Patterns Across Frameworks

These architectural patterns transfer across frameworks:

| Pattern    | Hermes              | AutoGen/MAF           | CrewAI         | LangGraph         |
| ---------- | ------------------- | --------------------- | -------------- | ----------------- |
| Agent Loop | AIAgent             | ConversableAgent      | Agent          | Graph nodes       |
| Tools      | registry.register() | code_execution_config | tools=[]       | Tool nodes        |
| Memory     | JSON/SQLite         | Custom                | Built-in       | State dict        |
| Guardrails | Manual              | human_input_mode      | Tools          | Interrupt nodes   |
| Delegation | spawn tool          | Nested agents         | Crew hierarchy | Conditional edges |
| Routing    | Gateway             | GroupChat             | Process type   | should_continue   |

References