Agent Architectures¶
This page covers the core architectural patterns that underpin most LLM-based agents. We use Hermes Agent as a concrete case study, but these patterns apply broadly to AutoGen, CrewAI, LangGraph-based agents, and beyond.
System Overview¶
┌─────────────────────────────────────────────────────────────┐
│ User Interfaces │
│ (Terminal, Telegram, Discord, Slack, Web) │
└─────────────────────────┬───────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Core │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Agent Loop (Perceive → Plan → Act) │ │
│ │ Build Prompt → Call LLM → Process Response │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────┼────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Tools │ │ Memory │ │ Delegation │ │
│ │ (68+ built │ │(persistent │ │(subagents) │ │
│ │ -in) │ │ + session)│ │ │ │
│ └────────────┘ └────────────┘ └────────────┘ │
└─────────────────────────────────────────────────────────────┘
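The Agent Core loop in the diagram (Build Prompt → Call LLM → Process Response) can be sketched end to end. Everything here is a stand-in: `fake_llm`, the `echo` tool, and the message format are illustrative, not a real model API.

```python
def build_prompt(messages):
    # Build Prompt: a real agent also injects memory, skills, and tool schemas here
    return [{"role": "system", "content": "You are a helpful agent."}] + messages

def agent_loop(user_input, llm, tools, max_turns=10):
    """Perceive → Plan → Act: build prompt, call LLM, process response."""
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        response = llm(build_prompt(messages))             # Call LLM
        call = response.get("tool_call")                   # Process response
        if call is None:
            return response["content"]                     # Final answer ends the loop
        result = tools[call["name"]](**call["arguments"])  # Act: run the tool
        messages.append({"role": "tool", "content": str(result)})
    return "Max turns reached"

# Hypothetical stand-ins: a stubbed model that calls one tool, then answers
def fake_llm(messages):
    if messages[-1]["role"] == "tool":
        return {"content": f"Done: {messages[-1]['content']}", "tool_call": None}
    return {"content": "", "tool_call": {"name": "echo", "arguments": {"text": "hi"}}}

tools = {"echo": lambda text: text.upper()}
```

The five patterns below are all variations on this skeleton: they differ in what happens between the LLM call and the tool call.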
1. The Agent Loop: Five Design Patterns¶
The agent loop is the core execution engine. Recent surveys identify five dominant patterns:
Pattern 1: ReAct (Reasoning + Acting)¶
Interleaves reasoning traces with tool execution. The most widely used paradigm.
# ReAct loop: Thought → Action → Observation → repeat
messages = [{"role": "user", "content": user_input}]
for _ in range(max_turns):
    # 1. Call LLM (its reasoning trace is the "Thought")
    response = llm.invoke(messages)
    # Keep the assistant turn in history so the tool results have a referent
    messages.append(response)
    # 2. Parse response
    if response.tool_calls:
        for tc in response.tool_calls:
            # 3. Execute tool ("Action")
            result = tools[tc.name](**tc.arguments)
            # 4. Append result as an observation
            messages.append({
                "role": "tool",
                "content": f"Observation: {result}",
                "tool_call_id": tc.id,
            })
    else:
        return response.content  # Done
Pattern 2: Plan-and-Execute¶
Generates a full plan first, then executes it step by step. More stable than ReAct on long-horizon tasks.
def plan_and_execute(task: str, llm, tools):
    # Phase 1: Plan — decompose the task before touching any tools
    plan_response = llm.invoke(f"""
    Decompose this task into ordered steps:
    Task: {task}
    Return a numbered list of steps.
    """)
    steps = parse_steps(plan_response)
    # Phase 2: Execute each step, carrying earlier results forward as context
    results = []
    for step in steps:
        result = llm.invoke(f"Execute step: {step}\nContext: {results}")
        results.append(result)
    return summarize(results)
Pattern 3: Reflection / Self-Refine¶
Generates an output, then an internal critic evaluates it and triggers revision until a quality threshold is met.
from pydantic import BaseModel, Field

class Review(BaseModel):
    score: int = Field(..., description="1-10 quality score")
    feedback: str = Field(..., description="Improvement suggestions")

structured_llm = llm.with_structured_output(Review)

def self_refine(task: str, threshold: int = 8, max_iters: int = 5):
    draft = llm.invoke(f"Complete: {task}")
    for _ in range(max_iters):
        review: Review = structured_llm.invoke(f"Score and improve:\n{draft}")
        if review.score >= threshold:
            return draft
        draft = llm.invoke(f"Improve based on: {review.feedback}\nOriginal: {draft}")
    return draft
Pattern 4: Supervisor + Workers¶
A manager agent dispatches tasks to specialized workers in parallel or sequence.
def supervisor_router(task: str, workers: dict, llm):
    # Supervisor decides which worker handles the task
    decision = llm.invoke(f"""
    Task: {task}
    Available workers: {list(workers.keys())}
    Choose the best worker and explain why.
    """)
    worker_name = parse_worker_choice(decision)
    return workers[worker_name].execute(task)
Pattern 5: Router / Selector¶
A lightweight classifier routes input to different models, tools, or agent paths.
def route(task: str, llm):
    decision = llm.invoke(f"""
    Classify this task as: code | general | research | math
    Only output the category name.
    Task: {task}
    """).content.strip().lower()
    return {
        "code": code_agent,
        "general": general_agent,
        "research": research_agent,
        "math": math_agent,
    }.get(decision, general_agent).execute(task)
2. Tool System¶
Tool Definition¶
Each tool has three parts: a name, a natural-language description, and a JSON Schema for its parameters:
tool = {
    "name": "terminal",
    "description": "Execute shell commands in a terminal.",
    "parameters": {
        "type": "object",
        "properties": {
            "command": {
                "type": "string",
                "description": "The shell command to execute.",
            }
        },
        "required": ["command"],
    },
}
The LLM receives this schema in the system prompt and decides when and how to call the tool.
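One way that injection might look (a sketch; `render_tool_section` is a hypothetical helper, not part of any framework shown here):

```python
import json

def render_tool_section(tool_schemas: list[dict]) -> str:
    """Serialize tool schemas into the system-prompt section the model sees."""
    lines = ["## Tools Available",
             "Call a tool by emitting its name and JSON arguments."]
    for t in tool_schemas:
        lines.append(f"### {t['name']}\n{t['description']}\n"
                     f"Parameters: {json.dumps(t['parameters'])}")
    return "\n\n".join(lines)
```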
Tool Dispatch¶
from typing import Callable

class ToolRegistry:
    def __init__(self):
        self.tools: dict[str, dict] = {}

    def register(self, name: str, schema: dict, handler: Callable,
                 check_fn: Callable | None = None):
        self.tools[name] = {
            "schema": schema,
            "handler": handler,
            "check_fn": check_fn,
        }

    def dispatch(self, name: str, arguments: dict) -> str:
        tool = self.tools[name]
        if tool["check_fn"] and not tool["check_fn"]():
            return "Error: Tool requirements not met."
        try:
            return tool["handler"](arguments)
        except Exception as e:
            return f"Error: {e}"

# Key principle: tools never raise exceptions into the agent loop;
# failures come back as structured results the LLM can read and react to
Tool Categories¶
| Category | Tools | Example |
|---|---|---|
| Execution | shell, python, code | Run commands, execute code |
| File I/O | read, write, patch | Read/write files |
| Web | search, extract, fetch | Browse the web |
| Browser | navigate, click, screenshot | Web browser automation |
| Vision | describe, analyze, ocr | Image understanding |
| Memory | save, load, search | Persistent storage |
| Delegation | spawn, delegate | Create subagents |
| Scheduling | cron, schedule | Timed tasks |
| Communication | send_email, send_message | Send messages |
3. Memory System: From RAG to Agent Memory¶
Agent memory has evolved through three stages. Understanding this evolution is critical for building production agents.
Stage 1: Classic RAG (2020–2023)¶
Retrieval-Augmented Generation: pre-index documents in a vector database, retrieve relevant chunks at query time.
Offline: Document → Chunk → Embed → Store in VectorDB
Online: Query → Embed → Retrieve top-k chunks → Concatenate → LLM → Response
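The two phases can be sketched with a toy bag-of-words scorer standing in for a real embedding model and vector database (the chunks and helpers below are illustrative):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words vector (a real system uses a neural encoder)
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Offline: Document → Chunk → Embed → Store
index = [(chunk, embed(chunk)) for chunk in [
    "Paris is the capital of France",
    "The agent loop interleaves reasoning and acting",
]]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Online: Query → Embed → Retrieve top-k (the caller concatenates into the prompt)
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```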
Limitation: Read-only. Cannot learn from interactions. "A static encyclopedia."
Stage 2: Agentic RAG (2023–2024)¶
RAG becomes a tool that the agent decides when and how to use.
Agent decides:
- Should I retrieve? From which source?
- Is this context relevant?
- Should I try a different search strategy?
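Concretely, retrieval becomes just another entry in the tool registry. A hypothetical schema (the name `search_knowledge_base` and the `source` enum are illustrative):

```python
# Retrieval exposed as a tool: the agent decides when and how to call it
retrieval_tool = {
    "name": "search_knowledge_base",
    "description": "Retrieve passages relevant to a query. "
                   "Use when you lack the facts needed to answer.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
            "source": {"type": "string", "enum": ["docs", "wiki", "code"],
                       "description": "Which corpus to search"},
        },
        "required": ["query"],
    },
}
```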
Limitation: Still read-only. No learning from interactions.
Stage 3: True Agent Memory (2024+)¶
The agent can create, update, and delete memories — learning from every interaction.
┌─────────────────────────────────────────────────────────────┐
│ Agent Memory Architecture (2024+) │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Memory Formation → Evolution → Retrieval │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Forms (HOW memory is stored) │ │
│ │ ┌──────────────────┬──────────────────┐ │ │
│ │ │ Token-level │ Parametric │ │ │
│ │ │ (explicit, │ (encoded in │ │ │
│ │ │ searchable) │ model weights) │ │ │
│ │ └──────────────────┴──────────────────┘ │ │
│ │ │ │
│ │ Functions (WHAT memory is used for) │ │
│ │ ┌──────────────────┬──────────────────┐ │ │
│ │ │ Factual Memory │ Experiential │ │ │
│ │ │ (user facts, │ Memory (successes,│ │ │
│ │ │ environment) │ skills, cases) │ │ │
│ │ └──────────────────┴──────────────────┘ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
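A minimal read-write store illustrates the create/update/delete half of this picture (a sketch; the class and file layout are illustrative, not Hermes's actual memory code):

```python
import json
from pathlib import Path

class MemoryStore:
    """Read-write memory: the agent can create, update, and delete entries."""

    def __init__(self, path: str):
        self.path = Path(path)
        self.entries: dict[str, str] = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    def save(self, key: str, value: str) -> str:
        # Create or update an entry, then persist to disk
        self.entries[key] = value
        self.path.write_text(json.dumps(self.entries, indent=2))
        return f"Saved '{key}'"

    def delete(self, key: str) -> str:
        self.entries.pop(key, None)
        self.path.write_text(json.dumps(self.entries, indent=2))
        return f"Deleted '{key}'"

    def search(self, query: str) -> list[str]:
        # Naive substring retrieval; a real store would use embeddings
        return [v for k, v in self.entries.items()
                if query.lower() in (k + v).lower()]
```

Each method is exposed to the LLM as a tool, so every interaction can leave a durable trace.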
Memory Implementation in Hermes Agent¶
# ~/.hermes/memories/
# ├── user.json        # User profile (factual)
# ├── memory.json      # Environment facts (factual)
# └── sessions/        # Session transcripts (experiential)

import json

def inject_memory(agent: AIAgent) -> str:
    sections = []
    # User profile
    profile = load_json("~/.hermes/memories/user.json")
    if profile:
        sections.append(f"## User Profile\n{json.dumps(profile, indent=2)}")
    # Memory facts
    facts = load_json("~/.hermes/memories/memory.json")
    if facts:
        sections.append("## Memory\n" + "\n".join(f"- {f}" for f in facts))
    return "\n\n".join(sections)
Context Compression¶
When conversations grow long, Hermes compresses history:
def compress_context(messages: list, llm, protect_last_n: int = 10) -> list:
    """Compress long conversation history (~80% reduction), keeping recent turns verbatim."""
    protected = messages[-protect_last_n:]      # Keep recent context
    compressible = messages[:-protect_last_n]
    summary = llm.summarize(compressible)
    return [
        {"role": "system", "content": f"Previous context summary: {summary}"},
        *protected,
    ]
4. Prompt Construction¶
The system prompt is assembled from multiple components:
def build_system_prompt(agent: AIAgent) -> str:
    sections = []
    # 1. Base persona (SOUL.md)
    sections.append(load_file("~/.hermes/SOUL.md"))
    # 2. Memory injection
    sections.append(inject_memory(agent))
    # 3. Relevant skills
    relevant_skills = load_relevant_skills(agent.user_input)
    for skill in relevant_skills:
        sections.append(f"## Skill: {skill.name}\n{skill.content}")
    # 4. Tool schemas
    tool_schemas = [t["schema"] for t in agent.tools.values()]
    sections.append(f"## Tools Available\n{json.dumps(tool_schemas)}")
    # 5. Context files (AGENTS.md, CLAUDE.md, etc.)
    sections.append(load_context_files())
    return "\n\n---\n\n".join(sections)
5. Security: Guardrails and Approval¶
Since agents can execute arbitrary commands, safety checks are critical:
Command Approval¶
import re

DANGEROUS_PATTERNS = [
    r'rm\s+-rf\s+/',           # Recursive root delete
    r'git\s+push\s+--force',   # Force push
    r'dd\s+if=',               # Direct disk write
    r'mkfs',                   # Format filesystem
    r'curl\s+.*\|\s*bash',     # Pipe to bash
]

def should_require_approval(command: str) -> bool:
    return any(re.search(pattern, command) for pattern in DANGEROUS_PATTERNS)
Output Guardrails¶
from pydantic import BaseModel, Field

class GuardrailedOutput(BaseModel):
    decision: str = Field(..., description="The decision or answer")
    reason: str = Field(..., description="Explanation")
    approved: bool = Field(..., description="Whether this should be shown to user")

structured_llm = llm.with_structured_output(GuardrailedOutput)
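One way to enforce the gate, sketched with a stub critic standing in for the structured model call (the function names and fallback text are illustrative):

```python
from types import SimpleNamespace

def guarded_answer(task: str, critic, fallback: str = "Response withheld.") -> str:
    """Only surface output the critic marked approved; otherwise return a safe fallback."""
    result = critic(f"Answer and self-assess:\n{task}")  # e.g. structured_llm.invoke(...)
    return result.decision if result.approved else fallback

# Stub critics standing in for structured_llm.invoke
def approve_all(prompt):
    return SimpleNamespace(decision="42", reason="ok", approved=True)

def reject_all(prompt):
    return SimpleNamespace(decision="42", reason="unsafe", approved=False)
```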
6. Hermes Agent: Concrete Implementation¶
Hermes Agent implements all patterns above in a production-ready system:
| Component | Hermes Implementation |
|---|---|
| Agent Loop | run_agent.py — AIAgent class |
| Tools | 68+ tools in tools/ directory |
| Memory | ~/.hermes/memories/ (JSON + SQLite) |
| Skills | ~/.hermes/skills/ (Markdown with frontmatter) |
| Delegation | spawn / delegate tools |
| Compression | agent/compression.py |
| Gateway | gateway/ (Telegram, Discord, Slack, etc.) |
| Credential Pool | Multiple API keys with rotation |
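The rotation idea behind the credential pool can be sketched as follows (illustrative only, not Hermes's actual implementation):

```python
import itertools

class CredentialPool:
    """Round-robin over API keys, skipping keys marked failed (e.g. rate-limited)."""

    def __init__(self, keys: list[str]):
        self.cycle = itertools.cycle(keys)
        self.failed: set[str] = set()
        self.size = len(keys)

    def next_key(self) -> str:
        # Try each key at most once per call before giving up
        for _ in range(self.size):
            key = next(self.cycle)
            if key not in self.failed:
                return key
        raise RuntimeError("All credentials exhausted")

    def mark_failed(self, key: str) -> None:
        self.failed.add(key)
```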
Hermes Tool Registration¶
# tools/terminal.py
registry.register(
    name="terminal",
    toolset="terminal",
    schema={
        "name": "terminal",
        "description": "Execute shell commands",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Command to execute"},
                "timeout": {"type": "integer", "description": "Timeout in seconds"},
            },
            "required": ["command"],
        },
    },
    handler=terminal_handler,
    check_fn=lambda: True,
)
7. Pattern Comparison: When to Use Which¶
| Pattern | Best For | Complexity | Reliability |
|---|---|---|---|
| ReAct | QA, web search, open-domain tasks | Low | Medium |
| Plan-then-Execute | Long reports, complex decomposition | Medium | High |
| Self-Refine | Code generation, creative writing | Medium | High |
| Supervisor + Workers | Multi-tool pipelines | Medium-High | High |
| Router | Multi-model cost optimization | Low | Medium |
8. Applying These Patterns Across Frameworks¶
These architectural patterns transfer across frameworks:
| Pattern | Hermes | AutoGen/MAF | CrewAI | LangGraph |
|---|---|---|---|---|
| Agent Loop | AIAgent | ConversableAgent | Agent | Graph nodes |
| Tools | registry.register() | code_execution_config | tools=[] | Tool nodes |
| Memory | JSON/SQLite | Custom | Built-in | State dict |
| Guardrails | Manual | human_input_mode | Tools | Interrupt nodes |
| Delegation | spawn tool | Nested agents | Crew hierarchy | Conditional edges |
| Routing | Gateway | GroupChat | Process type | should_continue |
References¶
- ReAct Paper — Reasoning + Acting paradigm
- Memory in the Age of AI Agents: A Survey — Comprehensive memory taxonomy (Forms–Functions–Dynamics)
- Building Effective Agents (Anthropic) — Production patterns
- Hermes Agent Architecture — Concrete implementation
- LangGraph Documentation — State graph patterns
- LLM Agent Common Design Patterns — Pattern taxonomy