
Agents Based on Large Language Models

Large language models (LLMs) have evolved from "text generators" into the "brain" of autonomous agents. These LLM-based agents can perceive their environment, plan multi-step actions, use tools, learn from experience, and collaborate with other agents to accomplish complex tasks.

What Is an LLM-Based Agent?

An LLM-based agent is a system where an LLM serves as the core controller — planning, reasoning, and deciding what to do next. The agent is typically augmented with:

  • Tools — API calls, code execution, web search, file I/O, browser automation, etc.
  • Memory — Persistent storage of user preferences, learned facts, and session history
  • Agents / Roles — Multiple specialized agents collaborating (multi-agent systems)
  • Environment interaction — Perceiving and acting upon the world through available interfaces
┌─────────────────────────────────────────────────────────────┐
│                     LLM-Based Agent                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌──────────────────────────────────────────────────────┐  │
│   │                   Core LLM                             │  │
│   │          (Planning, Reasoning, Decision)              │  │
│   └──────────────────────────────────────────────────────┘  │
│                          │                                   │
│          ┌───────────────┼───────────────┐                  │
│          ▼               ▼               ▼                   │
│   ┌────────────┐  ┌────────────┐  ┌────────────┐          │
│   │   Tools    │  │   Memory   │  │  Agents    │          │
│   │ web/code/  │  │ persistent │  │ multi-agent│          │
│   │  file I/O  │  │  + session │  │  routing   │          │
│   └────────────┘  └────────────┘  └────────────┘          │
│                                                             │
└─────────────────────────────────────────────────────────────┘
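As a rough sketch of this composition (plain Python, with illustrative Tool, Agent, and llm placeholders rather than any particular framework's API):

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tool:
    """One capability the LLM can invoke."""
    name: str
    description: str             # tells the model when to use the tool
    handler: Callable[..., str]  # the code that actually runs


@dataclass
class Agent:
    """Core LLM plus the augmentations listed above."""
    llm: Callable[[str], str]                              # planning / reasoning / decision
    tools: dict[str, Tool] = field(default_factory=dict)  # web, code, file I/O, ...
    memory: list[str] = field(default_factory=list)       # persistent facts + session history

    def call_tool(self, name: str, **kwargs) -> str:
        return self.tools[name].handler(**kwargs)

    def remember(self, fact: str) -> None:
        self.memory.append(fact)
```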

Why 2024–2025 Is the "Agent Era"

The convergence of several factors has made LLM agents mainstream:

  • Stronger LLMs — GPT-4o, Claude 3.5, Gemini, DeepSeek-V3, and o1/o3 reasoning models provide better reasoning and planning
  • Tool use — OpenAI's function calling, Anthropic's tool use, and native API support made tool integration standard
  • Memory systems — RAG, vector databases, and agent-specific memory architectures (Forms–Functions–Dynamics) matured
  • Multi-agent orchestration — Frameworks like AutoGen, CrewAI, LangGraph, and Swarm democratized multi-agent development
  • Enterprise adoption — ~90% of high-growth AI startups are actively deploying or experimenting with agents (Iconiq Capital 2025 AI Report)

The Agent Loop: How Agents Think and Act

Every agent follows a core execution cycle. The most influential paradigm is ReAct (Reasoning + Acting):

┌─────────────────────────────────────────────────────────────┐
│                  ReAct Agent Loop                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  User Input                                                 │
│      │                                                      │
│      ▼                                                      │
│  ┌─────────────────────────────────────────────────────┐   │
│  │   ┌──────────┐  ┌──────────┐  ┌───────────────┐   │   │
│  │   │  Think   │→ │  Act     │→ │  Observe      │   │   │
│  │   │(Reason)  │  │(Tool call│  │(Get result)   │   │   │
│  │   └──────────┘  └──────────┘  └───────────────┘   │   │
│  │         ↑                             │           │   │
│  │         └─────────────────────────────┘           │   │
│  └─────────────────────────────────────────────────────┘   │
│      │                                                      │
│      ▼                                                      │
│  Final Response                                             │
│                                                             │
└─────────────────────────────────────────────────────────────┘
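A minimal sketch of this loop in Python. Everything in it is illustrative: llm stands in for any chat-completion call whose output has been parsed into a structured Step, and tools is a plain dict of Python callables.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One parsed model turn: a thought, plus either a tool request or a final answer."""
    thought: str
    tool_name: str | None = None
    tool_args: dict | None = None
    final_answer: str | None = None


def react_loop(llm, tools, user_input, max_steps=10):
    """Minimal ReAct loop: Think -> Act -> Observe until the model produces an answer."""
    transcript = f"Question: {user_input}\n"
    for _ in range(max_steps):
        step = llm(transcript)                      # Think: reason about what to do next
        transcript += f"Thought: {step.thought}\n"
        if step.final_answer is not None:           # the model decided it can answer
            return step.final_answer
        observation = tools[step.tool_name](**step.tool_args)   # Act: run the tool
        transcript += (f"Action: {step.tool_name}({step.tool_args})\n"
                       f"Observation: {observation}\n")          # Observe: feed result back
    return "Stopped after reaching the step limit."
```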

Other loop patterns include:

| Pattern | Description | Best For |
|---------|-------------|----------|
| ReAct | Interleave reasoning + acting | Open-domain QA, external information |
| Plan-and-Execute | Plan first, then execute step by step | Long-horizon tasks, report generation |
| Reflection / Self-Refine | Generate → Evaluate → Revise iteratively | High-quality text/code generation |
| Supervisor + Workers | Manager dispatches to specialists | Complex multi-step pipelines |
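For contrast with ReAct, here is a similarly hedged sketch of Plan-and-Execute, again assuming a generic llm callable (string in, string out) and a dict of tools; no specific framework is implied.

```python
def plan_and_execute(llm, tools, task):
    """Plan first, then execute each step; useful for long-horizon tasks."""
    # Planning pass: ask the model for concrete steps, one per line
    plan = llm(f"Break this task into short, concrete steps, one per line:\n{task}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    results = []
    for step in steps:
        # Execution pass: for each step, the model picks a tool by name and its input
        choice = llm(f"Step: {step}\nAvailable tools: {list(tools)}\n"
                     "Reply as '<tool_name>: <input>'.")
        tool_name, _, tool_input = choice.partition(":")
        results.append(tools[tool_name.strip()](tool_input.strip()))

    # Final synthesis over all intermediate results
    return llm(f"Task: {task}\nStep results: {results}\nWrite the final answer.")
```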

Core Components

1. Tool System

Agents gain real-world capabilities through tools. Each tool has:

  • Name and description — so the LLM knows when to use it
  • Parameters — JSON schema that the LLM fills in
  • Handler — the actual code that runs the tool

Common tool categories: shell execution, file I/O, web search, browser automation, code execution, API calls, database queries.
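For illustration, a single tool in the widely used OpenAI-style function-calling shape. The get_weather tool and its handler are invented for this example; only the schema layout follows the convention.

```python
import json

# Name, description, and JSON-schema parameters: what the LLM sees when deciding to call the tool
weather_tool_spec = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Berlin'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# Handler: the actual code that runs when the model emits a call to this tool
def get_weather(city: str, unit: str = "celsius") -> str:
    # A real handler would query a weather API; this stub just echoes its arguments.
    return json.dumps({"city": city, "unit": unit, "temperature": 21})

# At runtime the agent matches the model's tool call to the handler by name
handlers = {"get_weather": get_weather}
```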

2. Memory System

Agents persist information across sessions. According to recent surveys, agent memory has evolved through three stages:

| Era | Approach | Limitation / Capability |
|-----|----------|-------------------------|
| 2020–2023 | Classic RAG (read-only) | Cannot learn from interactions |
| 2023–2024 | Agentic RAG (smart retrieval) | Still read-only; no learning |
| 2024+ | True Agent Memory (read + write) | Can create, update, delete; learns from experience |

Agent memory can be classified by form (token-level, parametric, latent) and by function (factual, experiential, working). See Agent Architectures for details.
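A toy sketch of the read + write stage from the table above: unlike read-only RAG, the agent can create, update, and delete entries as it learns. The AgentMemory class and its keyword search are illustrative only; a production system would use embeddings and a vector store.

```python
class AgentMemory:
    """Minimal read/write memory: the agent can create, update, delete, and search facts."""

    def __init__(self):
        self._facts: dict[str, str] = {}

    def write(self, key: str, fact: str) -> None:
        """Create or update a fact learned from an interaction."""
        self._facts[key] = fact

    def delete(self, key: str) -> None:
        self._facts.pop(key, None)

    def search(self, query: str) -> list[str]:
        """Naive keyword retrieval; real systems use embedding similarity."""
        words = query.lower().split()
        return [fact for fact in self._facts.values()
                if any(w in fact.lower() for w in words)]


# Example: the agent stores a user preference during one session and recalls it later
memory = AgentMemory()
memory.write("user_units", "The user prefers temperatures in celsius.")
print(memory.search("what units does the user prefer"))
```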

3. Multi-Agent Collaboration

When a single agent is insufficient, multiple specialized agents collaborate:

  • Supervisor / Router — Decides which agent handles a request
  • Specialist agents — Each excels at one domain (coding, research, writing)
  • Communication protocol — Agents pass results to each other
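A minimal routing sketch under the same illustrative assumptions (a generic llm callable and made-up specialist names); real frameworks such as LangGraph, AutoGen, or CrewAI provide much richer orchestration.

```python
def supervisor(llm, specialists: dict, request: str) -> str:
    """Supervisor / Router: the LLM picks a specialist, which then handles the request."""
    choice = llm(
        f"Specialists: {', '.join(specialists)}\n"
        f"Request: {request}\n"
        "Reply with exactly one specialist name."
    ).strip()
    handler = specialists.get(choice, specialists["generalist"])  # fall back if unsure
    return handler(request)  # the specialist's result is passed back as the response


# Each specialist is just another agent (or a plain function) behind a common interface
specialists = {
    "coder": lambda req: f"[coder] wrote code for: {req}",
    "researcher": lambda req: f"[researcher] gathered sources on: {req}",
    "writer": lambda req: f"[writer] drafted text for: {req}",
    "generalist": lambda req: f"[generalist] handled: {req}",
}
```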

Chapter Roadmap

This section covers three interconnected topics:

| Chapter | File | Content |
|---------|------|---------|
| Agent Architectures | agent_architectures.md | Core patterns: tool use, memory (RAG → Agent Memory), reasoning loops, planning, security, with Hermes Agent as a concrete example |
| Multi-Agent Systems | multi_agent_systems.md | OpenAI Swarm, AutoGen/MAF, Stanford Generative Agents, CrewAI, LangGraph — with architecture comparisons and code examples |
| LLM Basics | llm_basics.md | API calling (OpenAI, Anthropic, Gemini, DeepSeek, Qwen), local deployment (llama.cpp, Ollama, vLLM, TGI), coding assistants |

Why Agents Matter for Robotics

Agents are becoming the key interface layer for robotic systems:

User Command ("Pick up the red cube")
┌─────────────────┐
│  Agent (LLM)   │ ← Plans: detect cube → plan grasp → execute
└────────┬────────┘
         │ tool calls
┌─────────────────┐
│  Perception     │ ← Camera → YOLO → find red cube
└────────┬────────┘
┌─────────────────┐
│  Manipulation   │ ← Grasp planning → arm control
└────────┬────────┘
    Robot Action

An LLM-based agent serves as the task planner for a robot — translating high-level natural language commands into executable robot actions, reasoning about failures, and adapting plans dynamically.
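A sketch of that interface layer, with invented tool names (detect_object, plan_grasp, execute_grasp) standing in for a real perception and manipulation stack:

```python
def pick_up(agent_llm, robot_tools, command: str) -> str:
    """Translate a natural-language command into a sequence of robot tool calls."""
    # The LLM extracts the goal; the pipeline is fixed here for clarity
    target = agent_llm(f"Extract the object to pick up from: '{command}'")  # e.g. "red cube"

    pose = robot_tools["detect_object"](target)       # Perception: camera + detector
    if pose is None:                                   # reason about failures, adapt the plan
        return agent_llm(f"Could not find '{target}'. Suggest a recovery step.")

    grasp = robot_tools["plan_grasp"](pose)            # Manipulation: grasp planning
    robot_tools["execute_grasp"](grasp)                # Arm control
    return f"Picked up {target}."
```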

References