Agents Based on Large Language Models
Large language models (LLMs) have evolved from "text generators" into the "brain" of autonomous agents. These LLM-based agents can perceive their environment, plan multi-step actions, use tools, learn from experience, and collaborate with other agents to accomplish complex tasks.
What Is an LLM-Based Agent?
An LLM-based agent is a system where an LLM serves as the core controller — planning, reasoning, and deciding what to do next. The agent is typically augmented with:
- Tools — API calls, code execution, web search, file I/O, browser automation, etc.
- Memory — Persistent storage of user preferences, learned facts, and session history
- Agents / Roles — Multiple specialized agents collaborating (multi-agent systems)
- Environment interaction — Perceiving and acting upon the world through available interfaces
┌─────────────────────────────────────────────────────────────┐
│                       LLM-Based Agent                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │                      Core LLM                       │   │
│   │           (Planning, Reasoning, Decision)           │   │
│   └─────────────────────────────────────────────────────┘   │
│                              │                              │
│              ┌───────────────┼───────────────┐              │
│              ▼               ▼               ▼              │
│        ┌────────────┐  ┌────────────┐  ┌────────────┐       │
│        │   Tools    │  │   Memory   │  │   Agents   │       │
│        │ web/code/  │  │ persistent │  │ multi-agent│       │
│        │ file I/O   │  │ + session  │  │ routing    │       │
│        └────────────┘  └────────────┘  └────────────┘       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Why 2024–2025 Is the "Agent Era"
The convergence of several factors has made LLM agents mainstream:
- Stronger LLMs — GPT-4o, Claude 3.5, Gemini, DeepSeek-V3, and o1/o3 reasoning models provide better reasoning and planning
- Tool use — OpenAI's function calling, Anthropic's tool use, and native API support made tool integration standard
- Memory systems — RAG, vector databases, and agent-specific memory designs matured; recent surveys organize them by forms, functions, and dynamics
- Multi-agent orchestration — Frameworks like AutoGen, CrewAI, LangGraph, and Swarm democratized multi-agent development
- Enterprise adoption — ~90% of high-growth AI startups are actively deploying or experimenting with agents (Iconiq Capital 2025 AI Report)
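The Agent Loop: How Agents Think and Act
Most agents run a core execution cycle. The most influential paradigm is ReAct (Reasoning + Acting), in which the model alternates between reasoning about the task, calling a tool, and reading the result until it can answer: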
┌─────────────────────────────────────────────────────────────┐
│                      ReAct Agent Loop                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│                         User Input                          │
│                              │                              │
│                              ▼                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ ┌───────────┐     ┌───────────┐     ┌───────────────┐ │  │
│  │ │   Think   │  →  │    Act    │  →  │    Observe    │ │  │
│  │ │ (Reason)  │     │(Tool call)│     │ (Get result)  │ │  │
│  │ └───────────┘     └───────────┘     └───────────────┘ │  │
│  │       ↑                                     │         │  │
│  │       └─────────────────────────────────────┘         │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│                              ▼                              │
│                       Final Response                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
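The loop in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from the ReAct paper: `scripted_llm` is a hypothetical stand-in for a real model call, and the `word_count` tool and the step dictionary format (`thought`, `action`, `input`, `final`) are invented for the example.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> callable taking and returning a string.
TOOLS: dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
}


def scripted_llm(history: list[str]) -> dict:
    """Stand-in for a real LLM call (assumption): chooses the next step from history."""
    observations = [h for h in history if h.startswith("Observation: ")]
    if not observations:
        return {"thought": "I should count the words with a tool.",
                "action": "word_count", "input": "the quick brown fox"}
    answer = observations[-1][len("Observation: "):]
    return {"thought": "I have the count.", "final": answer}


def react_loop(question: str,
               llm: Callable[[list[str]], dict],
               max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm(history)                        # Think
        history.append(f"Thought: {step['thought']}")
        if "final" in step:                        # the model decided it is done
            return step["final"]
        tool = TOOLS.get(step["action"])           # Act
        if tool is None:
            result = f"unknown tool {step['action']!r}"
        else:
            result = tool(step["input"])
        history.append(f"Observation: {result}")   # Observe
    return "Stopped: step limit reached."


answer = react_loop("How many words are in 'the quick brown fox'?", scripted_llm)
print(answer)
```

The `max_steps` cap is a design choice worth keeping in any real loop: without it, a model that never emits a final answer would call tools indefinitely.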
ReAct is one of several common loop patterns:
| Pattern | Description | Best For |
|---|---|---|
| ReAct | Interleave reasoning + acting | Open-domain QA, external information |
| Plan-and-Execute | Plan first, then execute step by step | Long-horizon tasks, report generation |
| Reflection / Self-Refine | Generate → Evaluate → Revise iteratively | High-quality text/code generation |
| Supervisor + Workers | Manager dispatches to specialists | Complex multi-step pipelines |
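Two of the patterns in the table, Plan-and-Execute and Reflection / Self-Refine, differ mainly in control flow, which a short sketch can make concrete. Everything named `toy_*` below is a hypothetical stub standing in for an LLM call; the function names and signatures are assumptions for illustration, not part of any framework.

```python
from typing import Callable


def plan_and_execute(task: str,
                     planner: Callable[[str], list[str]],
                     executor: Callable[[str], str]) -> list[str]:
    """Plan once up front, then run each step in order."""
    return [executor(step) for step in planner(task)]


def self_refine(draft: str,
                critic: Callable[[str], str],
                reviser: Callable[[str, str], str],
                max_rounds: int = 3) -> str:
    """Generate -> Evaluate -> Revise until the critic returns no feedback."""
    for _ in range(max_rounds):
        feedback = critic(draft)
        if not feedback:              # empty feedback means "good enough"
            break
        draft = reviser(draft, feedback)
    return draft


# Hypothetical stubs standing in for LLM calls.
def toy_planner(task: str) -> list[str]:
    return [f"research {task}", f"outline {task}", f"write {task}"]


def toy_executor(step: str) -> str:
    return f"done: {step}"


def toy_critic(text: str) -> str:
    return "" if text.endswith(".") else "add a final period"


def toy_reviser(text: str, feedback: str) -> str:
    return text + "."


results = plan_and_execute("report", toy_planner, toy_executor)
refined = self_refine("Agents use tools", toy_critic, toy_reviser)
print(results)
print(refined)
```

Plan-and-Execute commits to the full step list before acting, while the refinement loop only decides after each evaluation whether another round is needed.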
Core Components
1. Tool System
Agents gain real-world capabilities through tools. Each tool has:
- Name and description — so the LLM knows when to use it
- Parameters — JSON schema that the LLM fills in
- Handler — The actual code that runs the tool
Common tool categories: shell execution, file I/O, web search, browser automation, code execution, API calls, database queries.
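The parts of a tool listed above (name and description, parameter schema, handler) map naturally onto a small data structure. The sketch below is one possible layout, not any provider's API: the `Tool` class, `REGISTRY`, `dispatch`, the stub handler, and the `{"name": ..., "arguments": {...}}` call format are all assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str               # tells the LLM when to use the tool
    parameters: dict[str, Any]     # JSON Schema the LLM fills in
    handler: Callable[..., Any]    # code that actually runs


def read_file_stub(path: str) -> str:
    """Hypothetical handler; a real one would read from disk."""
    return f"<contents of {path}>"


REGISTRY = {
    "read_file": Tool(
        name="read_file",
        description="Read a text file and return its contents.",
        parameters={
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
        handler=read_file_stub,
    )
}


def dispatch(tool_call_json: str) -> Any:
    """Run a tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    tool = REGISTRY.get(call["name"])
    if tool is None:
        return f"error: unknown tool {call['name']!r}"
    args = call.get("arguments", {})
    # Only checks required keys; full JSON Schema validation is out of scope here.
    missing = [k for k in tool.parameters.get("required", []) if k not in args]
    if missing:
        return f"error: missing arguments {missing}"
    return tool.handler(**args)


result = dispatch('{"name": "read_file", "arguments": {"path": "notes.txt"}}')
print(result)
```

Returning errors as strings rather than raising lets the agent loop feed the problem back to the model as an observation, so it can correct the call on the next step.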
2. Memory System
Agents persist information across sessions. According to recent surveys, agent memory has evolved through three stages:
| Era | Approach | What it can and cannot do |
|---|---|---|
| 2020–2023 | Classic RAG (read-only) | Cannot learn from interactions |
| 2023–2024 | Agentic RAG (smart retrieval) | Still read-only; no learning |
| 2024+ | True Agent Memory (read + write) | Can create, update, and delete entries; learns from experience |
Agent memory can be classified by form (token-level, parametric, latent) and by function (factual, experiential, working). See Agent Architectures for details.
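The difference between read-only retrieval and read + write memory is easiest to see in code. The following is a toy sketch, assuming a plain dictionary as the store and keyword overlap as the retrieval method; `MemoryStore` and its method names are hypothetical, and real systems typically use embeddings and a vector database instead.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy read + write memory keyed by string; not a production design."""
    facts: dict[str, str] = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        self.facts[key] = value            # create or update

    def delete(self, key: str) -> None:
        self.facts.pop(key, None)          # ignore keys that do not exist

    def search(self, query: str) -> list[str]:
        """Naive keyword match between the query and each key/value pair."""
        words = set(query.lower().split())
        return [value for key, value in self.facts.items()
                if words & set(f"{key} {value}".lower().split())]


memory = MemoryStore()
memory.write("user_language", "user prefers Python")
memory.write("user_language", "user prefers Rust")    # update after a new interaction
memory.write("project", "project targets a robot arm")
memory.delete("project")                               # forget an outdated fact
hits = memory.search("which language does the user like")
print(hits)
```

A classic RAG pipeline would expose only `search`; the `write` and `delete` calls are what allow the agent to change what it knows between sessions.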
3. Multi-Agent Collaboration
When a single agent is insufficient, multiple specialized agents collaborate:
- Supervisor / Router — Decides which agent handles a request
- Specialist agents — Each excels at one domain (coding, research, writing)
- Communication protocol — Agents pass results to each other
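The three roles above can be sketched as a routing function plus a table of specialists. This is a simplified illustration: the specialist functions are hypothetical stubs, keyword matching stands in for an LLM-based router, and the "communication protocol" is reduced to passing plain strings.

```python
from typing import Callable


# Hypothetical specialists; each would wrap its own LLM prompt and tools.
def coding_agent(request: str) -> str:
    return f"[coder] patch drafted for: {request}"


def research_agent(request: str) -> str:
    return f"[researcher] sources gathered for: {request}"


def writing_agent(request: str) -> str:
    return f"[writer] text drafted for: {request}"


SPECIALISTS: dict[str, Callable[[str], str]] = {
    "coding": coding_agent,
    "research": research_agent,
    "writing": writing_agent,
}

# Keyword routing stands in for an LLM-based router (assumption).
KEYWORDS = {
    "coding": ("bug", "code", "function"),
    "research": ("find", "compare", "survey"),
}


def route(request: str) -> str:
    """Return the name of the specialist that should handle the request."""
    text = request.lower()
    for domain, words in KEYWORDS.items():
        if any(word in text for word in words):
            return domain
    return "writing"    # fallback specialist


def supervisor(request: str) -> str:
    """Pick a specialist, then pass its result back as a plain string."""
    return SPECIALISTS[route(request)](request)


reply = supervisor("Fix the bug in the parser")
print(reply)
```

Keeping `route` separate from `supervisor` makes the routing decision testable on its own and easy to swap for a model-based classifier later.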
Chapter Roadmap
This section covers three interconnected topics:
| Chapter | File | Content |
|---|---|---|
| Agent Architectures | agent_architectures.md | Core patterns: tool use, memory (RAG → Agent Memory), reasoning loops, planning, security, with Hermes Agent as a concrete example |
| Multi-Agent Systems | multi_agent_systems.md | OpenAI Swarm, AutoGen/MAF, Stanford Generative Agents, CrewAI, LangGraph — with architecture comparisons and code examples |
| LLM Basics | llm_basics.md | API calling (OpenAI, Anthropic, Gemini, DeepSeek, Qwen), local deployment (llama.cpp, Ollama, vLLM, TGI), coding assistants |
Why Agents Matter for Robotics
Agents are becoming a key interface layer for robotic systems:
User Command ("Pick up the red cube")
         │
         ▼
┌─────────────────┐
│   Agent (LLM)   │  ← Plans: detect cube → plan grasp → execute
└────────┬────────┘
         │ tool calls
         ▼
┌─────────────────┐
│   Perception    │  ← Camera → YOLO → find red cube
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Manipulation   │  ← Grasp planning → arm control
└────────┬────────┘
         │
         ▼
    Robot Action
An LLM-based agent serves as the task planner for a robot — translating high-level natural language commands into executable robot actions, reasoning about failures, and adapting plans dynamically.
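A rough sketch of this pipeline, with failure handling, might look as follows. The plan an LLM would produce (detect, then grasp) is hard-coded here, and `detect`, `grasp`, the scene dictionary, and its coordinates are all made-up stubs; a real system would call an object detector and a motion planner instead.

```python
from typing import Callable, Optional

Position = tuple[float, float, float]


# Hypothetical perception stub: looks the object up in a made-up scene.
def detect(object_name: str) -> Optional[Position]:
    scene = {"red cube": (0.42, -0.10, 0.03)}    # invented positions in metres
    return scene.get(object_name)


# Hypothetical manipulation stub: pretends grasps above the table succeed.
def grasp(position: Position) -> bool:
    return position[2] >= 0.0


def run_pick(object_name: str,
             detect_fn: Callable[[str], Optional[Position]] = detect,
             grasp_fn: Callable[[Position], bool] = grasp,
             max_attempts: int = 2) -> str:
    """Fixed plan (detect -> grasp) that re-detects and retries after a failed grasp."""
    for attempt in range(1, max_attempts + 1):
        position = detect_fn(object_name)
        if position is None:
            return f"failed: {object_name} not found"
        if grasp_fn(position):
            return f"picked {object_name} on attempt {attempt}"
    return f"failed: could not grasp {object_name}"


status = run_pick("red cube")
print(status)
```

The returned status strings are what an agent would read as observations, letting it decide whether to retry, ask the user for help, or change the plan.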
References
- ReAct: Synergizing Reasoning and Acting in Language Models — Core agent loop paradigm
- Generative Agents: Interactive Simulacra of Human Behavior — Stanford Smallville
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation — Microsoft Research
- Memory in the Age of AI Agents: A Survey — NUS, Renmin University, Fudan (comprehensive memory taxonomy)
- OpenAI Swarm — Educational multi-agent framework
- CrewAI Documentation — Role-based multi-agent framework
- LangGraph Documentation — Graph-based agent workflow
- Iconiq Capital: 2025 State of AI — Enterprise AI adoption data
- Hermes Agent (Nous Research) — Self-improving single agent
- Building Effective Agents (Anthropic) — Production agent patterns