
Agents Based on Large Language Models

Large language models (LLMs) have evolved from "text generators" into the "brain" of autonomous agents. These LLM-based agents can perceive their environment, plan multi-step actions, use tools, learn from experience, and collaborate with other agents to accomplish complex tasks.

What Is an LLM-Based Agent?

An LLM-based agent is a system where an LLM serves as the core controller — planning, reasoning, and deciding what to do next. The agent is typically augmented with:

  • Tools — API calls, code execution, web search, file I/O, browser automation, etc.
  • Memory — Persistent storage of user preferences, learned facts, and session history
  • Agents / Roles — Multiple specialized agents collaborating (multi-agent systems)
  • Environment interaction — Perceiving and acting upon the world through available interfaces
┌─────────────────────────────────────────────────────────────┐
│                     LLM-Based Agent                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌──────────────────────────────────────────────────────┐  │
│   │                   Core LLM                             │  │
│   │          (Planning, Reasoning, Decision)              │  │
│   └──────────────────────────────────────────────────────┘  │
│                          │                                   │
│          ┌───────────────┼───────────────┐                  │
│          ▼               ▼               ▼                   │
│   ┌────────────┐  ┌────────────┐  ┌────────────┐          │
│   │   Tools    │  │   Memory   │  │  Agents    │          │
│   │ web/code/  │  │ persistent │  │ multi-agent│          │
│   │  file I/O  │  │  + session │  │  routing   │          │
│   └────────────┘  └────────────┘  └────────────┘          │
│                                                             │
└─────────────────────────────────────────────────────────────┘
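As a rough sketch of this composition (plain Python, with illustrative Tool, Agent, and llm placeholders rather than any particular framework's API):

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tool:
    """One capability the LLM can invoke."""
    name: str
    description: str             # tells the model when to use the tool
    handler: Callable[..., str]  # the code that actually runs


@dataclass
class Agent:
    """Core LLM plus the augmentations listed above."""
    llm: Callable[[str], str]                              # planning / reasoning / decision
    tools: dict[str, Tool] = field(default_factory=dict)  # web, code, file I/O, ...
    memory: list[str] = field(default_factory=list)       # persistent facts + session history

    def call_tool(self, name: str, **kwargs) -> str:
        return self.tools[name].handler(**kwargs)

    def remember(self, fact: str) -> None:
        self.memory.append(fact)
```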

Why 2024–2025 Is the "Agent Era"

The convergence of several factors has made LLM agents mainstream:

  • Stronger LLMs — GPT-4o, Claude 3.5, Gemini, DeepSeek-V3, and o1/o3 reasoning models provide better reasoning and planning
  • Tool use — OpenAI's function calling, Anthropic's tool use, and native API support made tool integration standard
  • Memory systems — RAG, vector databases, and agent-specific memory architectures (Forms–Functions–Dynamics) matured
  • Multi-agent orchestration — Frameworks like AutoGen, CrewAI, LangGraph, and Swarm democratized multi-agent development
  • Enterprise adoption — ~90% of high-growth AI startups are actively deploying or experimenting with agents (Iconiq Capital 2025 AI Report)

The Agent Loop: How Agents Think and Act

Every agent follows a core execution cycle. The most influential paradigm is ReAct (Reasoning + Acting):

┌─────────────────────────────────────────────────────────────┐
│                  ReAct Agent Loop                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  User Input                                                 │
│      │                                                      │
│      ▼                                                      │
│  ┌─────────────────────────────────────────────────────┐   │
│  │   ┌──────────┐  ┌──────────┐  ┌───────────────┐   │   │
│  │   │  Think   │→ │  Act     │→ │  Observe      │   │   │
│  │   │(Reason)  │  │(Tool call│  │(Get result)   │   │   │
│  │   └──────────┘  └──────────┘  └───────────────┘   │   │
│  │         ↑                             │           │   │
│  │         └─────────────────────────────┘           │   │
│  └─────────────────────────────────────────────────────┘   │
│      │                                                      │
│      ▼                                                      │
│  Final Response                                             │
│                                                             │
└─────────────────────────────────────────────────────────────┘
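A minimal sketch of this loop in Python. Everything in it is illustrative: llm stands in for any chat-completion call whose output has been parsed into a structured Step, and tools is a plain dict of Python callables.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One parsed model turn: a thought, plus either a tool request or a final answer."""
    thought: str
    tool_name: str | None = None
    tool_args: dict | None = None
    final_answer: str | None = None


def react_loop(llm, tools, user_input, max_steps=10):
    """Minimal ReAct loop: Think -> Act -> Observe until the model produces an answer."""
    transcript = f"Question: {user_input}\n"
    for _ in range(max_steps):
        step = llm(transcript)                      # Think: reason about what to do next
        transcript += f"Thought: {step.thought}\n"
        if step.final_answer is not None:           # the model decided it can answer
            return step.final_answer
        observation = tools[step.tool_name](**step.tool_args)   # Act: run the tool
        transcript += (f"Action: {step.tool_name}({step.tool_args})\n"
                       f"Observation: {observation}\n")          # Observe: feed result back
    return "Stopped after reaching the step limit."
```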

Other loop patterns include:

| Pattern | Description | Best For |
|---------|-------------|----------|
| ReAct | Interleave reasoning + acting | Open-domain QA, external information |
| Plan-and-Execute | Plan first, then execute step by step | Long-horizon tasks, report generation |
| Reflection / Self-Refine | Generate → Evaluate → Revise iteratively | High-quality text/code generation |
| Supervisor + Workers | Manager dispatches to specialists | Complex multi-step pipelines |
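For contrast with ReAct, here is a similarly hedged sketch of Plan-and-Execute, again assuming a generic llm callable (string in, string out) and a dict of tools; no specific framework is implied.

```python
def plan_and_execute(llm, tools, task):
    """Plan first, then execute each step; useful for long-horizon tasks."""
    # Planning pass: ask the model for concrete steps, one per line
    plan = llm(f"Break this task into short, concrete steps, one per line:\n{task}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    results = []
    for step in steps:
        # Execution pass: for each step, the model picks a tool by name and its input
        choice = llm(f"Step: {step}\nAvailable tools: {list(tools)}\n"
                     "Reply as '<tool_name>: <input>'.")
        tool_name, _, tool_input = choice.partition(":")
        results.append(tools[tool_name.strip()](tool_input.strip()))

    # Final synthesis over all intermediate results
    return llm(f"Task: {task}\nStep results: {results}\nWrite the final answer.")
```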

Core Components

1. Tool System

Agents gain real-world capabilities through tools. Each tool has:

  • Name and description — so the LLM knows when to use it
  • Parameters — JSON schema that the LLM fills in
  • Handler — the actual code that runs the tool

Common tool categories: shell execution, file I/O, web search, browser automation, code execution, API calls, database queries.
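For illustration, a single tool in the widely used OpenAI-style function-calling shape. The get_weather tool and its handler are invented for this example; only the schema layout follows the convention.

```python
import json

# Name, description, and JSON-schema parameters: what the LLM sees when deciding to call the tool
weather_tool_spec = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Berlin'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# Handler: the actual code that runs when the model emits a call to this tool
def get_weather(city: str, unit: str = "celsius") -> str:
    # A real handler would query a weather API; this stub just echoes its arguments.
    return json.dumps({"city": city, "unit": unit, "temperature": 21})

# At runtime the agent matches the model's tool call to the handler by name
handlers = {"get_weather": get_weather}
```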

2. Memory System

Agents persist information across sessions. According to recent surveys, agent memory has evolved through three stages:

| Era | Approach | Limitation / Capability |
|-----|----------|-------------------------|
| 2020–2023 | Classic RAG (read-only) | Cannot learn from interactions |
| 2023–2024 | Agentic RAG (smart retrieval) | Still read-only; no learning |
| 2024+ | True Agent Memory (read + write) | Can create, update, delete; learns from experience |

Agent memory can be classified by form (token-level, parametric, latent) and by function (factual, experiential, working). See Agent Architectures for details.
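A toy sketch of the read + write stage from the table above: unlike read-only RAG, the agent can create, update, and delete entries as it learns. The AgentMemory class and its keyword search are illustrative only; a production system would use embeddings and a vector store.

```python
class AgentMemory:
    """Minimal read/write memory: the agent can create, update, delete, and search facts."""

    def __init__(self):
        self._facts: dict[str, str] = {}

    def write(self, key: str, fact: str) -> None:
        """Create or update a fact learned from an interaction."""
        self._facts[key] = fact

    def delete(self, key: str) -> None:
        self._facts.pop(key, None)

    def search(self, query: str) -> list[str]:
        """Naive keyword retrieval; real systems use embedding similarity."""
        words = query.lower().split()
        return [fact for fact in self._facts.values()
                if any(w in fact.lower() for w in words)]


# Example: the agent stores a user preference during one session and recalls it later
memory = AgentMemory()
memory.write("user_units", "The user prefers temperatures in celsius.")
print(memory.search("what units does the user prefer"))
```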

3. Multi-Agent Collaboration

When a single agent is insufficient, multiple specialized agents collaborate:

  • Supervisor / Router — Decides which agent handles a request
  • Specialist agents — Each excels at one domain (coding, research, writing)
  • Communication protocol — Agents pass results to each other
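A minimal routing sketch under the same illustrative assumptions (a generic llm callable and made-up specialist names); real frameworks such as LangGraph, AutoGen, or CrewAI provide much richer orchestration.

```python
def supervisor(llm, specialists: dict, request: str) -> str:
    """Supervisor / Router: the LLM picks a specialist, which then handles the request."""
    choice = llm(
        f"Specialists: {', '.join(specialists)}\n"
        f"Request: {request}\n"
        "Reply with exactly one specialist name."
    ).strip()
    handler = specialists.get(choice, specialists["generalist"])  # fall back if unsure
    return handler(request)  # the specialist's result is passed back as the response


# Each specialist is just another agent (or a plain function) behind a common interface
specialists = {
    "coder": lambda req: f"[coder] wrote code for: {req}",
    "researcher": lambda req: f"[researcher] gathered sources on: {req}",
    "writer": lambda req: f"[writer] drafted text for: {req}",
    "generalist": lambda req: f"[generalist] handled: {req}",
}
```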

Chapter Roadmap

This section covers three interconnected topics:

| Chapter | File | Content |
|---------|------|---------|
| Agent Architectures | agent_architectures.md | Core patterns: tool use, memory (RAG → Agent Memory), reasoning loops, planning, security, with Hermes Agent as a concrete example |
| Multi-Agent Systems | multi_agent_systems.md | OpenAI Swarm, AutoGen/MAF, Stanford Generative Agents, CrewAI, LangGraph — with architecture comparisons and code examples |
| LLM Basics | llm_basics.md | API calling (OpenAI, Anthropic, Gemini, DeepSeek, Qwen), local deployment (llama.cpp, Ollama, vLLM, TGI), coding assistants |

Why Agents Matter for Robotics

Agents are becoming the key interface layer for robotic systems:

User Command ("Pick up the red cube")
┌─────────────────┐
│  Agent (LLM)   │ ← Plans: detect cube → plan grasp → execute
└────────┬────────┘
         │ tool calls
┌─────────────────┐
│  Perception     │ ← Camera → YOLO → find red cube
└────────┬────────┘
┌─────────────────┐
│  Manipulation   │ ← Grasp planning → arm control
└────────┬────────┘
    Robot Action

An LLM-based agent serves as the task planner for a robot — translating high-level natural language commands into executable robot actions, reasoning about failures, and adapting plans dynamically.
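A sketch of that interface layer, with invented tool names (detect_object, plan_grasp, execute_grasp) standing in for a real perception and manipulation stack:

```python
def pick_up(agent_llm, robot_tools, command: str) -> str:
    """Translate a natural-language command into a sequence of robot tool calls."""
    # The LLM extracts the goal; the pipeline is fixed here for clarity
    target = agent_llm(f"Extract the object to pick up from: '{command}'")  # e.g. "red cube"

    pose = robot_tools["detect_object"](target)       # Perception: camera + detector
    if pose is None:                                   # reason about failures, adapt the plan
        return agent_llm(f"Could not find '{target}'. Suggest a recovery step.")

    grasp = robot_tools["plan_grasp"](pose)            # Manipulation: grasp planning
    robot_tools["execute_grasp"](grasp)                # Arm control
    return f"Picked up {target}."
```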

References