
Multi-Robot Formation Control

Project Type: Multi-Robot | Difficulty: ★★☆☆☆ to ★★★★☆ (approach-dependent) | Estimated Time: 2–4 weekends


1. Project Overview

Multi-robot formation control is the problem of coordinating a group of autonomous robots to move as a cohesive unit while maintaining a specified geometric shape (line, triangle, diamond, etc.). This project covers three progressively more sophisticated control paradigms:

  ┌──────────────────────────────────────────────────────────────┐
  │                   FORMATION CONTROL                           │
  │                                                              │
  │    ◆ Leader                                                 │
  │    │                                                        │
  │    ├─ ◆ Follower 1   (offset: -Δx, +Δy)                    │
  │    │                                                        │
  │    └─ ◆ Follower 2   (offset: -2Δx, 0)                     │
  │                                                              │
  │    Shape: Arrow / Line / Triangle / Diamond                 │
  │                                                              │
  │    All robots track desired positions relative to leader     │
  └──────────────────────────────────────────────────────────────┘
| Tier | Approach | Key Idea | Communication |
|---|---|---|---|
| Traditional | Leader-Follower | Leader trajectory defines the group path; followers maintain relative poses | One-way (leader → followers) |
| Intermediate | Consensus / Virtual Structure | Graph Laplacian drives distributed agreement on formation geometry | Bidirectional |
| Modern | MARL (MAPPO / QMIX) | Neural network learns a cooperative formation policy via CTDE | Partial / no comms |

2. Hardware & Software Requirements

Hardware

| Component | Specification | Notes |
|---|---|---|
| Robots | 3–5 mobile robots (differential drive) | Any platform (TurtleBot, custom kit) |
| Computing | Onboard computer per robot (Raspberry Pi 4 / Jetson Nano) | Or use simulation-only |
| Communication | WiFi router / Ethernet switch | Required for multi-agent networking |
| LiDAR | 2D LiDAR per robot (optional) | RPLIDAR A1 or similar |
| Camera | Front-facing camera (optional) | For vision-based relative pose estimation |

Software

| Package | Version | Purpose |
|---|---|---|
| Python | ≥ 3.8 | Core language |
| NumPy / SciPy | ≥ 1.20 / ≥ 1.7 | Matrix ops, graph theory |
| ROS 2 | Foxy or newer (Humble recommended) | Multi-robot framework with namespaces |
| NetworkX | ≥ 2.6 | Graph representation |
| PyTorch | ≥ 1.10 | Neural networks for MARL |
| Ray (RLlib) | ≥ 2.0 | Multi-agent RL training |
| Gymnasium | ≥ 0.26 | RL environment interface |
| Matplotlib | ≥ 3.5 | Visualization |
pip install numpy scipy networkx torch gymnasium matplotlib
# For multi-agent RL:
pip install "ray[rllib]" pettingzoo

3. Formation Geometry Basics

3.1 Rigid Formations

A formation of \(n\) robots with positions \(p_1, \dots, p_n \in \mathbb{R}^2\) is rigid if the inter-robot distances uniquely determine the shape (up to rotation/translation). The minimum number of distance constraints for a rigid formation in \(\mathbb{R}^2\) is:

\[ |E_{\min}| = 2n - 3 \]

where \(|E_{\min}|\) is the number of edges of a minimally rigid graph (Laman's theorem). For example, three robots need \(2 \cdot 3 - 3 = 3\) distance constraints — a triangle.

3.2 Common Formation Shapes

  Line Formation             Triangle Formation         Diamond Formation

  ◆ ─ ◆ ─ ◆ ─ ◆                  ◆ (leader)                   ◆
  spacing = Δx                 ◆       ◆                   ◆     ◆
                               (followers)                    ◆
| Shape | Robots | Edges (Laman) | Description |
|---|---|---|---|
| Line | n | 2n − 3 | Equal spacing, same heading |
| Triangle | 3 | 3 | Equilateral, leader at apex |
| Diamond | 4 | 5 | Leader + 3 followers at vertices |

4. Approach A — Traditional: Leader-Follower Formation Control

4.1 Concept

The Leader-Follower approach designates one robot as the leader and defines desired relative positions for all followers. The leader follows a pre-planned or teleoperated trajectory; each follower tracks its target position relative to the leader.

  Leader trajectory
  ◆ ───────────────▶ (Leader)
  │  Δx, Δy
  ◆ (Follower tracks relative position)

4.2 Relative Pose Control Law

Let \((x_L, y_L, \theta_L)\) be the leader's pose and \((x_i, y_i, \theta_i)\) the \(i\)-th follower's pose. The desired position of follower \(i\) is:

\[ \begin{bmatrix} x_i^d \\ y_i^d \end{bmatrix} = \begin{bmatrix} x_L \\ y_L \end{bmatrix} + R(\theta_L) \cdot \begin{bmatrix} \Delta x_i \\ \Delta y_i \end{bmatrix} \]

where \(R(\theta_L)\) is the 2D rotation matrix and \((\Delta x_i, \Delta y_i)\) are the constant offsets in the leader's body frame.

The follower uses a proportional controller to reach \((x_i^d, y_i^d)\):

\[ \begin{aligned} e_x &= x_i^d - x_i, \qquad e_y = y_i^d - y_i \\ v_i &= k_v \sqrt{e_x^2 + e_y^2} \\ \omega_i &= k_\omega \cdot \operatorname{wrap}\!\left(\operatorname{atan2}(e_y, e_x) - \theta_i\right) \end{aligned} \]

where \(\operatorname{wrap}(\cdot)\) normalizes the heading error to \([-\pi, \pi]\); this matches `normalize_angle` in the code below.

4.3 Complete Python Code

"""
Approach A: Leader-Follower Formation Control
==============================================
Simulates N robots in formation following a leader's trajectory.
Uses proportional position control for each follower.
"""

import numpy as np
import matplotlib.pyplot as plt

# ─── Parameters ────────────────────────────────────────────────────
DT = 0.05          # Time step (s)
T = 30.0           # Total simulation time (s)
K_POS = 1.2        # Position gain (proportional controller)
K_HEAD = 2.0      # Heading gain
ROBOT_RADIUS = 0.2  # For visualization

# ─── Formation Definitions ─────────────────────────────────────────
# Each entry: (Δx_rel, Δy_rel) in leader's body frame
FORMATION_LINE = [
    (-1.0,  0.0),
    (-2.0,  0.0),
    (-3.0,  0.0),
]

FORMATION_TRIANGLE = [
    (-1.0, -1.0),
    (-1.0,  1.0),
]

# Choose formation here:
DESIRED_OFFSETS = FORMATION_LINE


def rot2(theta: float) -> np.ndarray:
    """2D rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def normalize_angle(a: float) -> float:
    return (a + np.pi) % (2 * np.pi) - np.pi


def leader_trajectory(t: float) -> tuple:
    """
    Define the leader's path as a function of time.
    Here: a circular path of radius 3 m, traversed at angular rate 0.5 rad/s.
    """
    omega = 0.5  # rad/s
    x = 3.0 + 3.0 * np.cos(omega * t)
    y = 3.0 + 3.0 * np.sin(omega * t)
    theta = omega * t + np.pi / 2  # tangent direction
    return x, y, theta


def desired_position(leader_pose: tuple, offset: tuple) -> np.ndarray:
    """Compute desired world-frame position for a follower."""
    xL, yL, thetaL = leader_pose
    dx, dy = offset
    delta = rot2(thetaL) @ np.array([dx, dy])
    return np.array([xL + delta[0], yL + delta[1]])


def follower_control(
    current: np.ndarray,
    desired: np.ndarray,
    current_heading: float,
    k_pos: float = K_POS,
    k_head: float = K_HEAD,
) -> tuple:
    """
    Proportional controller for differential-drive follower.

    Returns: (v, omega) velocity commands
    """
    ex = desired[0] - current[0]
    ey = desired[1] - current[1]
    dist = np.sqrt(ex**2 + ey**2)

    # Desired heading toward target
    theta_d = np.arctan2(ey, ex)
    theta_e = normalize_angle(theta_d - current_heading)

    # Linear and angular velocity commands
    v = k_pos * dist
    omega = k_head * theta_e

    # Clamp velocities
    v = max(0.0, min(2.0, v))
    omega = max(-3.0, min(3.0, omega))
    return v, omega


def simulate(n_steps: int, formation_offsets: list) -> dict:
    """Run the formation simulation."""
    n_agents = 1 + len(formation_offsets)  # 1 leader + followers

    # State: [x, y, theta] per agent
    state = np.zeros((n_agents, 3))
    state[0] = [6.0, 3.0, np.pi / 2]  # leader starts on its circle, heading tangent

    trajectory = [state.copy()]

    for step in range(n_steps):
        t = step * DT

        # ── Leader moves along trajectory ──
        xL, yL, thetaL = leader_trajectory(t)
        state[0, :2] = [xL, yL]
        state[0, 2] = thetaL

        leader_pose = (xL, yL, thetaL)

        # ── Each follower tracks its desired position ──
        for i, offset in enumerate(formation_offsets):
            agent_idx = i + 1
            desired_pos = desired_position(leader_pose, offset)
            v, omega = follower_control(
                current=state[agent_idx, :2],
                desired=desired_pos,
                current_heading=state[agent_idx, 2],
            )

            # Differential drive update
            state[agent_idx, 0] += v * np.cos(state[agent_idx, 2]) * DT
            state[agent_idx, 1] += v * np.sin(state[agent_idx, 2]) * DT
            state[agent_idx, 2] += omega * DT
            state[agent_idx, 2] = normalize_angle(state[agent_idx, 2])

        trajectory.append(state.copy())

    return {"trajectory": trajectory, "offsets": formation_offsets}


def plot_formation(trajectory: np.ndarray, offsets: list, save_path: str = None):
    """Plot the formation at the final time step."""
    n_agents = trajectory.shape[1]
    n_followers = len(offsets)

    fig, ax = plt.subplots(figsize=(8, 8))

    # Plot leader path
    lx = trajectory[:, 0, 0]
    ly = trajectory[:, 0, 1]
    ax.plot(lx, ly, 'b--', alpha=0.4, label="Leader path")

    # Plot agents at final time
    final = trajectory[-1]

    # Leader
    ax.plot(final[0, 0], final[0, 1], 'b^', markersize=15, label="Leader")
    ax.annotate("Leader",
                (final[0, 0], final[0, 1]),
                textcoords="offset points", xytext=(8, 8), fontsize=9)

    # Followers
    colors = ['red', 'green', 'orange', 'purple']
    for i in range(n_followers):
        idx = i + 1
        ax.plot(final[idx, 0], final[idx, 1], 'o',
                color=colors[i % len(colors)], markersize=12)
        ax.annotate(f"F{i+1}",
                    (final[idx, 0], final[idx, 1]),
                    textcoords="offset points", xytext=(6, 6), fontsize=9)

        # Draw formation line to leader
        ax.plot([final[0, 0], final[idx, 0]],
                [final[0, 1], final[idx, 1]],
                '--', color='gray', alpha=0.5)

    ax.set_xlabel("X (m)")
    ax.set_ylabel("Y (m)")
    ax.set_title(f"Leader-Follower Formation — {len(offsets)}+1 robots")
    ax.legend()
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    if save_path:
        plt.savefig(save_path, dpi=150)
    plt.show()


def main():
    n_steps = int(T / DT)
    result = simulate(n_steps, DESIRED_OFFSETS)
    trajectory = np.array(result["trajectory"])
    print(f"Simulation complete: {len(trajectory)} steps, "
          f"{trajectory.shape[1]} agents")
    print(f"Final positions:")
    for i, s in enumerate(trajectory[-1]):
        label = "Leader" if i == 0 else f"Follower {i}"
        print(f"  {label}: x={s[0]:.2f}, y={s[1]:.2f}, θ={np.degrees(s[2]):.1f}°")
    plot_formation(trajectory, DESIRED_OFFSETS)


if __name__ == "__main__":
    main()

4.4 ROS 2 Multi-Robot Setup

<!-- launch/multi_robot_formation.launch.xml (ROS 2 XML launch) -->
<launch>
  <!-- Robot 1 (Leader) -->
  <group>
    <push-ros-namespace namespace="robot1"/>
    <include file="$(find-pkg-share my_robot)/launch/robot.launch.xml"/>
    <!-- Leader-specific: publishes the formation leader pose -->
    <node pkg="formation_control" exec="leader_node" name="leader"/>
  </group>

  <!-- Robots 2..N (Followers) -->
  <include file="$(find-pkg-share my_robot)/launch/follower.launch.xml">
    <arg name="robot_name" value="robot2"/>
    <arg name="offset_x" value="-1.0"/>
    <arg name="offset_y" value="0.0"/>
  </include>

  <include file="$(find-pkg-share my_robot)/launch/follower.launch.xml">
    <arg name="robot_name" value="robot3"/>
    <arg name="offset_x" value="-2.0"/>
    <arg name="offset_y" value="0.0"/>
  </include>
</launch>

Topic remapping for isolation:

| Topic | Remapped Topic | Notes |
|---|---|---|
| /cmd_vel | /robot1/cmd_vel | Namespaced per robot |
| /odom | /robot1/odom | Namespaced per robot |
| /scan | /robot1/scan | Namespaced per robot |

5. Approach B — Intermediate: Consensus-Based Formation

5.1 Concept

Consensus-based formation uses graph theory to distribute the formation geometry across all robots. No single leader exists — instead, each robot exchanges state information with its neighbors and collectively agrees on the formation shape through local interactions.

  Graph Topology                    Consensus Dynamics

       ◆ ─ ◆                        Each robot i:
       │ ╲ │                        u_i = Σ_j a_ij (x_j − x_i)
       ◆ ─ ◆                        convergence rate set by λ₂(L)

5.2 Graph Laplacian

Represent the communication graph with an adjacency matrix \(A = [a_{ij}]\) and degree matrix \(D = \operatorname{diag}(d_i)\) where \(d_i = \sum_j a_{ij}\). The graph Laplacian is:

\[ L = D - A \]

For a formation with desired offsets \(\delta_i\), the consensus protocol is:

\[ \dot{x}_i = \sum_{j \in \mathcal{N}_i} a_{ij} \left[ (x_j - \delta_j) - (x_i - \delta_i) \right] \]

In matrix form:

\[ \dot{\mathbf{x}} = -\left(L \otimes I_2\right) (\mathbf{x} - \boldsymbol{\delta}) \]

where \(\mathbf{x}, \boldsymbol{\delta} \in \mathbb{R}^{2n}\) stack the planar positions and offsets of all robots.

The formation converges if the graph is connected (algebraic connectivity \(\lambda_2(L) > 0\)).
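
The condition is easy to check numerically. Below is a standalone sketch (the helper name is ours, not a library function) comparing a connected path graph against a disconnected one:

import numpy as np


def algebraic_connectivity(A: np.ndarray) -> float:
    """λ₂: second-smallest eigenvalue of the Laplacian L = D − A."""
    L = np.diag(A.sum(axis=1)) - A
    return float(np.sort(np.linalg.eigvalsh(L))[1])

path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)   # 0—1—2 (connected)
split = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], float)  # 0—1, 2 isolated
print(algebraic_connectivity(path))   # 1.0  → consensus converges
print(algebraic_connectivity(split))  # 0.0  → no convergence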

5.3 Virtual Structure Approach

Treat the entire formation as a rigid virtual structure. All robots track target points on this structure as it moves:

\[ p_i^d(t) = p_{\text{center}}(t) + R(t) \cdot d_i \]

where \(p_{\text{center}}(t)\) and \(R(t)\) are the structure's center position and rotation matrix.
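
A minimal sketch of this computation (the function name and the square example are illustrative):

import numpy as np


def virtual_structure_targets(center: np.ndarray, theta: float,
                              structure_offsets: np.ndarray) -> np.ndarray:
    """Target point p_i^d = p_center + R(θ) d_i for each robot on the structure."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return center + structure_offsets @ R.T  # rotate each d_i, then translate

# Example: a square structure translated to (3, 1) and rotated by 45°
d = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
print(virtual_structure_targets(np.array([3.0, 1.0]), np.pi / 4, d))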

5.4 Complete Python Code — Consensus Formation

"""
Approach B: Consensus-Based Formation Control
==============================================
Distributed formation control using graph Laplacian and consensus protocol.
No leader — all robots exchange information with neighbors.
"""

import numpy as np
import matplotlib.pyplot as plt
import networkx as nx

# ─── Parameters ────────────────────────────────────────────────────
DT = 0.02           # Time step (s)
T = 20.0            # Total time (s)
N_ROBOTS = 5        # Number of robots
K_CONSENSUS = 1.5   # Consensus gain

# ─── Formation Offsets (in world frame, static) ────────────────────
# Define desired relative positions (relative to formation center)
OFFSETS = np.array([
    [0.0,  0.0],   # Robot 0 (formation center reference)
    [1.0,  0.0],   # Robot 1
    [-1.0, 0.0],   # Robot 2
    [0.0,  1.0],   # Robot 3
    [0.0, -1.0],   # Robot 4
], dtype=float)

# ─── Communication Graph ───────────────────────────────────────────
# Define who talks to whom
ADJACENCY = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
], dtype=float)

# Build Laplacian: L = D - A
DEGREE = np.diag(ADJACENCY.sum(axis=1))
LAPLACIAN = DEGREE - ADJACENCY

print("Graph Laplacian:")
print(LAPLACIAN)

# Algebraic connectivity: second-smallest Laplacian eigenvalue (λ₂ > 0 iff connected)
G_nx = nx.from_numpy_array(ADJACENCY)
eigenvalues = sorted(nx.laplacian_spectrum(G_nx))
algebraic_conn = eigenvalues[1]
print(f"Algebraic connectivity λ₂ = {algebraic_conn:.3f}")


def consensus_control(positions: np.ndarray, offsets: np.ndarray,
                      laplacian: np.ndarray) -> np.ndarray:
    """
    Consensus protocol: each robot moves toward agreement on the formation.

    u_i = k * Σ_j a_ij * [(x_j - δ_j) - (x_i - δ_i)]  =  -k * (L @ (x - δ))_i
    """
    return -K_CONSENSUS * (laplacian @ (positions - offsets))


def formation_center(t: float, omega: float = 0.5) -> tuple:
    """
    Reference motion of the formation center (circular path) and its velocity.
    The velocity is used as a feedforward term shared by all robots.
    """
    center = np.array([3.0 * np.cos(omega * t), 3.0 * np.sin(omega * t)])
    center_vel = np.array([-3.0 * omega * np.sin(omega * t),
                            3.0 * omega * np.cos(omega * t)])
    return center, center_vel


def simulate_consensus(n_steps: int, initial_pos: np.ndarray) -> dict:
    """Simulate consensus-based formation with a moving center."""
    positions = initial_pos.copy()
    positions_history = [positions.copy()]

    for step in range(n_steps):
        t = step * DT
        _, center_vel = formation_center(t)

        # The consensus term drives shape agreement; a common translation
        # cancels out of (x_j - δ_j) - (x_i - δ_i), so the group motion is
        # supplied by the shared feedforward velocity of the reference center.
        u = consensus_control(positions, OFFSETS, LAPLACIAN) + center_vel
        positions += u * DT

        positions_history.append(positions.copy())

    return {"positions": np.array(positions_history), "offsets": OFFSETS}


def plot_consensus(positions_history: np.ndarray, offsets: np.ndarray,
                   save_path: str = None):
    """Visualize the consensus formation over time."""
    fig, ax = plt.subplots(figsize=(8, 8))

    n_robots = positions_history.shape[1]
    colors = plt.cm.tab10(np.linspace(0, 1, n_robots))

    # Plot trajectories
    for i in range(n_robots):
        traj = positions_history[:, i, :]
        ax.plot(traj[:, 0], traj[:, 1], '-', color=colors[i], alpha=0.5, linewidth=1)
        ax.plot(traj[-1, 0], traj[-1, 1], 'o', color=colors[i],
                markersize=12, label=f"Robot {i}")
        ax.annotate(f"R{i}", (traj[-1, 0], traj[-1, 1]),
                     textcoords="offset points", xytext=(5, 5), fontsize=9)

    # Plot desired formation positions (anchored at robot 0's final position)
    anchor = positions_history[-1, 0] - offsets[0]
    for offset in offsets:
        ax.plot(anchor[0] + offset[0], anchor[1] + offset[1],
                's', color='gray', markersize=6, alpha=0.4)

    ax.set_xlabel("X (m)")
    ax.set_ylabel("Y (m)")
    ax.set_title(f"Consensus Formation — {n_robots} robots, λ₂={algebraic_conn:.2f}")
    ax.legend()
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    if save_path:
        plt.savefig(save_path, dpi=150)
    plt.show()


def main():
    n_steps = int(T / DT)

    # Random initial positions
    np.random.seed(42)
    initial_pos = np.random.uniform(0, 6, size=(N_ROBOTS, 2)).astype(float)

    result = simulate_consensus(n_steps, initial_pos)
    positions_history = result["positions"]

    print(f"Consensus simulation: {positions_history.shape[0]} steps, "
          f"{positions_history.shape[1]} robots")
    print(f"Algebraic connectivity λ₂ = {algebraic_conn:.3f} (graph is connected: {algebraic_conn > 0})")

    # Final formation errors
    final_pos = positions_history[-1]
    center = final_pos[0]
    for i in range(N_ROBOTS):
        error = np.linalg.norm(final_pos[i] - (center + OFFSETS[i]))
        print(f"  Robot {i}: pos={final_pos[i]}, formation error={error:.3f}m")

    plot_consensus(positions_history, OFFSETS)


if __name__ == "__main__":
    main()

6. Approach C — Modern: Multi-Agent Reinforcement Learning

6.1 Concept

MARL for formation control trains neural policies under which multiple robots learn to cooperate without explicit communication at run time. We use Centralized Training with Decentralized Execution (CTDE):

  Centralized Training (with global info)
  ┌─────────────────────────────────────┐
  │  All robots' observations +         │
  │  critic has access to full state    │
  └────────────────┬────────────────────┘
                   │ train
  ┌────────────────▼────────────────────┐
  │  Decentralized Execution (at test) │
  │  Each actor only sees local obs    │
  │  No inter-robot communication      │
  └─────────────────────────────────────┘

6.2 QMIX — Value Decomposition

QMIX decomposes the joint action-value function \(Q_{\text{tot}}\) into per-agent functions \(Q_a\) combined by a mixing network that satisfies the Individual-Global-Max (IGM) condition:

\[ Q_{\text{tot}}(\boldsymbol{\tau}, \boldsymbol{u}) = f_{\text{mix}}\left(q_1, q_2, \ldots, q_n; \boldsymbol{\phi}\right) \]

The mixing network uses hypernetworks with monotonicity constraints:

\[ \frac{\partial Q_{\text{tot}}}{\partial q_a} \geq 0 \quad \forall a \]

This guarantees that maximizing \(Q_{\text{tot}}\) is equivalent to each agent independently maximizing its own \(Q_a\).
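
To make the constraint concrete, here is a minimal PyTorch sketch of a monotonic mixing network in the style of QMIX — the class name, layer sizes, and ELU nonlinearity are illustrative choices, not RLlib internals:

import torch
import torch.nn as nn
import torch.nn.functional as F


class QMixer(nn.Module):
    """Mixes per-agent Q-values into Q_tot. Hypernetworks conditioned on the
    global state produce the mixing weights; abs() enforces ∂Q_tot/∂q_a ≥ 0."""

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)  # Q_tot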

6.3 MAPPO — Multi-Agent PPO

MAPPO applies PPO to the CTDE framework. Each agent \(i\) has a policy \(\pi_{\theta_i}\) trained to maximize:

\[ J(\theta_i) = \mathbb{E}\left[ \min\left( r_t(\theta_i)\, A_t,\; \operatorname{clip}\!\left(r_t(\theta_i),\, 1-\epsilon,\, 1+\epsilon\right) A_t \right) \right], \qquad r_t(\theta_i) = \frac{\pi_{\theta_i}(a_t \mid o_t)}{\pi_{\theta_i^{\text{old}}}(a_t \mid o_t)} \]

where \(A_t\) is the advantage estimated with the centralized critic.

The centralized critic \(V_\phi(s)\) observes the full state \(s\); the decentralized actor \(\pi_{\theta_i}(o_i)\) only sees local observation \(o_i\).
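
The clipped surrogate is only a few lines of PyTorch. This sketch (function name and tensor layout are ours) returns a loss to minimize, so gradient descent maximizes the objective above:

import torch


def ppo_clip_loss(log_p_new: torch.Tensor, log_p_old: torch.Tensor,
                  advantages: torch.Tensor, epsilon: float = 0.2) -> torch.Tensor:
    """Clipped surrogate objective used by (MA)PPO, negated into a loss."""
    ratio = torch.exp(log_p_new - log_p_old)                     # r_t(θ)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    return -torch.min(unclipped, clipped).mean()                 # maximize J ⇒ minimize -J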

6.4 PettingZoo / rllib Code — MARL Formation

"""
Approach C: Multi-Agent RL Formation Control
=============================================
Uses RLlib (Ray) with a custom multi-agent environment for formation control.
Trains 5 robots to maintain relative positions with MADDPG (centralized
critics, decentralized actors). Note: MADDPG availability varies across Ray
releases (it moved out of the core library in recent versions); pin a release
that ships it, or substitute another multi-agent algorithm such as PPO.
"""

import numpy as np
from gymnasium import spaces
from ray.rllib.env.multi_agent_env import MultiAgentEnv
from ray.tune.registry import register_env
from ray import tune


class FormationEnv(MultiAgentEnv):
    """
    Multi-agent formation environment.

    Each agent observes: [self_pos_x, self_pos_y, rel_pos_to_others, target_offset_x, target_offset_y]
    Actions: [v_x_cmd, v_y_cmd] (velocity commands)
    Reward: negative formation error + movement penalty
    """

    metadata = {"render_modes": ["human"]}

    def __init__(self, config=None):
        super().__init__()
        config = config or {}
        self.n_agents = config.get("n_agents", 5)
        self.dt = 0.1
        self.max_steps = 500

        # Observation: self pos (2) + relative to others (2*(n-1)) + target offset (2)
        obs_dim = 2 + 2 * (self.n_agents - 1) + 2
        self.observation_space = spaces.Box(
            low=-10.0, high=10.0, shape=(obs_dim,), dtype=np.float32
        )
        # Action: 2D velocity command
        self.action_space = spaces.Box(
            low=-1.0, high=1.0, shape=(2,), dtype=np.float32
        )

        self.positions = None
        self.target_offsets = None
        self.step_count = 0

    def _compute_offsets(self) -> np.ndarray:
        """Desired formation offsets (line formation)."""
        offsets = np.zeros((self.n_agents, 2))
        for i in range(1, self.n_agents):
            offsets[i] = [-1.0 * i, 0.0]
        return offsets

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.step_count = 0

        # Randomize starting positions
        rng = np.random.default_rng(seed)
        self.positions = rng.uniform(0, 5, size=(self.n_agents, 2)).astype(np.float32)

        # Formation offsets (leader at front, others behind)
        self.target_offsets = self._compute_offsets()

        return self._get_obs_as_dict(), {}  # (obs_dict, info_dict)

    def _get_obs(self) -> np.ndarray:
        """Return flat observation array (used by MADDPG)."""
        obs_list = []
        for i in range(self.n_agents):
            obs_i = [self.positions[i, 0], self.positions[i, 1]]
            # Relative positions to other agents
            for j in range(self.n_agents):
                if j != i:
                    obs_i.extend([
                        self.positions[j, 0] - self.positions[i, 0],
                        self.positions[j, 1] - self.positions[i, 1],
                    ])
            # Target offset from formation center
            center = np.mean(self.positions, axis=0)
            obs_i.extend([
                center[0] + self.target_offsets[i, 0] - self.positions[i, 0],
                center[1] + self.target_offsets[i, 1] - self.positions[i, 1],
            ])
            obs_list.append(np.array(obs_i, dtype=np.float32))
        return np.array(obs_list)

    def _get_obs_as_dict(self) -> dict:
        """Return dict observation (for rllib multi-agent API)."""
        obs = self._get_obs()
        return {f"agent_{i}": obs[i] for i in range(self.n_agents)}

    def step(self, actions):
        self.step_count += 1

        # Convert dict or array actions
        if isinstance(actions, dict):
            actions_list = [actions[f"agent_{i}"] for i in range(self.n_agents)]
        else:
            actions_list = actions

        actions_np = np.array(actions_list, dtype=np.float32)

        # Apply velocity commands
        for i in range(self.n_agents):
            self.positions[i] += actions_np[i] * 0.5 * self.dt
            # Clamp positions
            self.positions[i] = np.clip(self.positions[i], -10, 10)

        # ── Reward: formation maintenance ──
        center = np.mean(self.positions, axis=0)
        rewards = []
        for i in range(self.n_agents):
            desired = center + self.target_offsets[i]
            error = np.linalg.norm(self.positions[i] - desired)
            # Reward: high when error is low, slight penalty for large movements
            r = -error + 0.1 * (1.0 - np.linalg.norm(actions_np[i]))
            rewards.append(float(r))

        rewards_dict = {f"agent_{i}": rewards[i] for i in range(self.n_agents)}

        # Termination
        terminated = {f"agent_{i}": False for i in range(self.n_agents)}
        terminated["__all__"] = self.step_count >= self.max_steps
        truncated = {f"agent_{i}": False for i in range(self.n_agents)}
        truncated["__all__"] = False

        obs = self._get_obs_as_dict()
        info = {}

        return obs, rewards_dict, terminated, truncated, info


# ─── Registration & Training ────────────────────────────────────────
register_env("formation_env", lambda cfg: FormationEnv(cfg))


def train_maddpg(n_timesteps: int = 200_000):
    """Train MADDPG agents on the formation task."""
    n_agents = 5
    config = {
        "env": "formation_env",
        "env_config": {"n_agents": n_agents},
        "num_workers": 2,
        "gamma": 0.99,
        "actor_lr": 1e-4,
        "critic_lr": 1e-3,
        "train_batch_size": 1024,
        "model": {
            "fcnet_hiddens": [128, 128],
        },
        "framework": "torch",
        # One policy per agent, mapped by agent id (agent_i -> policy agent_i)
        "multiagent": {
            "policies": {f"agent_{i}" for i in range(n_agents)},
            "policy_mapping_fn": lambda agent_id, *args, **kwargs: agent_id,
        },
    }

    print(f"[INFO] Training MADDPG for {n_timesteps} timesteps...")
    results = tune.run(
        "MADDPG",
        config=config,
        stop={"timesteps_total": n_timesteps},
        checkpoint_at_end=True,
    )

    best_trial = results.get_best_trial(metric="episode_reward_mean", mode="max")
    best_ckpt = results.get_best_checkpoint(
        best_trial, metric="episode_reward_mean", mode="max")
    print(f"[INFO] Best checkpoint: {best_ckpt}")
    return best_ckpt


def evaluate_agent(checkpoint_path: str, n_episodes: int = 5):
    """Evaluate a trained formation policy."""
    from ray.rllib.algorithms.algorithm import Algorithm  # Ray 2.x

    env = FormationEnv({"n_agents": 5})
    algo = Algorithm.from_checkpoint(checkpoint_path)

    for ep in range(n_episodes):
        obs, _ = env.reset()
        done = False
        ep_reward = 0.0

        while not done:
            actions = {
                f"agent_{i}": algo.compute_single_action(
                    obs[f"agent_{i}"], policy_id=f"agent_{i}")
                for i in range(env.n_agents)
            }
            obs, rewards, terminated, truncated, info = env.step(actions)
            ep_reward += sum(rewards.values())
            done = terminated["__all__"] or truncated["__all__"]

        print(f"  Episode {ep+1}: reward={ep_reward:.1f}")


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1 and sys.argv[1] == "eval":
        if len(sys.argv) < 3:
            sys.exit("Usage: python formation_rl.py eval <checkpoint_path>")
        evaluate_agent(sys.argv[2])
    else:
        train_maddpg()

7. Comparison of Formation Approaches

| Criterion | Leader-Follower | Consensus / Virtual Structure | MARL (MADDPG / QMIX) |
|---|---|---|---|
| Scalability | ★★★☆☆ (single point of failure) | ★★★★☆ | ★★★★★ |
| Fault tolerance | ★★☆☆☆ | ★★★★☆ | ★★★★☆ |
| Formation rigidity | ✅ Explicit (rigid offsets) | ✅ Graph-theoretic rigidity | ✅ Emerges from reward |
| Communication | One-way (leader → followers) | Bidirectional (neighbor-to-neighbor) | None / partial (CTDE) |
| Convergence speed | Fast (leader-driven) | Moderate (distributed) | Slow (needs training) |
| Handles obstacles | ❌ Requires replanning | ❌ Needs collision avoidance | ✅ Learns reactive behaviors |
| Setup complexity | Low | Medium | High |
| Computation cost | Very low | Low | High (training + inference) |
| Adaptability | Fixed formations only | Reconfigurable (change graph) | Learns new formations |
| Best for | Simple, small groups | Structured environments | Complex, large groups |

8. Step-by-Step Implementation Guide

Phase 1 — Single-Robot Teleoperation (All Approaches)

  1. Set up each robot with ROS 2 and verify cmd_vel / odom topics work.
  2. Test teleoperation (teleop_twist_keyboard) for each robot individually.
  3. Set up the WiFi network and verify inter-robot ping connectivity.

Phase 2 — ROS 2 Namespace Setup

# Launch robots with separate namespaces
ros2 launch multi_robot_bringup robot_multi.launch.py num_robots:=4

# Verify topics are namespaced:
ros2 topic list | grep robot
# Expected: /robot1/cmd_vel, /robot2/cmd_vel, ...

Phase 3 — Leader-Follower (Approach A)

  1. Implement the leader trajectory publisher.
  2. Add the follower controller node that subscribes to /robot1/pose (a minimal sketch follows this list).
  3. Configure offsets per robot in the parameter file.
  4. Run in simulation first, then on hardware.
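
A minimal rclpy sketch of the follower node from step 2 — the Pose2D message choice, topic names, and gains mirror Section 4.3 but are assumptions to adapt to your robot stack:

# follower_node.py — minimal rclpy follower sketch. Assumes the leader
# publishes geometry_msgs/Pose2D on /robot1/pose and each follower knows
# its own pose on a namespaced "pose" topic.
import math
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Twist, Pose2D


class FollowerNode(Node):
    def __init__(self):
        super().__init__('follower')
        self.declare_parameter('offset_x', -1.0)   # body-frame offset Δx
        self.declare_parameter('offset_y', 0.0)    # body-frame offset Δy
        self.pose = None                           # this robot's (x, y, θ)
        self.create_subscription(Pose2D, '/robot1/pose', self.leader_cb, 10)
        self.create_subscription(Pose2D, 'pose', self.self_cb, 10)
        self.cmd_pub = self.create_publisher(Twist, 'cmd_vel', 10)

    def self_cb(self, msg):
        self.pose = (msg.x, msg.y, msg.theta)

    def leader_cb(self, msg):
        if self.pose is None:
            return
        dx = self.get_parameter('offset_x').value
        dy = self.get_parameter('offset_y').value
        # Desired position: leader position + offset rotated into world frame
        xd = msg.x + math.cos(msg.theta) * dx - math.sin(msg.theta) * dy
        yd = msg.y + math.sin(msg.theta) * dx + math.cos(msg.theta) * dy
        ex, ey = xd - self.pose[0], yd - self.pose[1]
        err = math.atan2(ey, ex) - self.pose[2]
        err = (err + math.pi) % (2 * math.pi) - math.pi  # wrap to [-π, π]
        cmd = Twist()
        cmd.linear.x = min(1.2 * math.hypot(ex, ey), 2.0)   # K_POS, v_max
        cmd.angular.z = max(-3.0, min(3.0, 2.0 * err))      # K_HEAD, ω limits
        self.cmd_pub.publish(cmd)


def main():
    rclpy.init()
    rclpy.spin(FollowerNode())


if __name__ == '__main__':
    main()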

Phase 4 — Consensus (Approach B)

  1. Define the communication topology (mesh, chain, star).
  2. Implement the consensus protocol using networkx Laplacian.
  3. Add inter-robot communication via ROS 2 topics (/robot{i}/state).
  4. Test with a static formation, then add a moving target.

Phase 5 — MARL (Approach C)

  1. Install Ray: pip install "ray[rllib]" (the quotes keep the shell from expanding the brackets).
  2. Run the formation environment in simulation: python formation_rl.py
  3. Train for ~200k timesteps (or use pre-trained checkpoints).
  4. Deploy trained policies to real robots via ROS 2.

9. Extensions and Variations

9.1 Formation Reconfiguration

Switch between formation shapes dynamically by changing offsets:

n = 4  # robots in the line shape; offsets include the leader's own (0, 0)

SHAPES = {
    "line":      [(-i, 0) for i in range(n)],
    "triangle":  [(0, 0), (-1, -1), (-1, 1)],
    "diamond":   [(0, 0), (0, -1), (1, 0), (0, 1)],
}
# Trigger a shape change via a ROS parameter or service call

9.2 Obstacle Avoidance in Formation

Add a potential field layer to the consensus protocol:

\[ \dot{x}_i = \underbrace{\sum_{j \in \mathcal{N}_i} a_{ij}\left[(x_j - \delta_j) - (x_i - \delta_i)\right]}_{\text{formation (consensus)}} + \underbrace{\sum_{o \in \text{obstacles}} k_o\, \frac{x_i - o}{\|x_i - o\|^3}}_{\text{obstacle repulsion}} \]
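
A sketch of the repulsion term that can be added to the consensus input of Section 5.4 (the gain K_REP and the distance floor are tuning assumptions):

import numpy as np

K_REP = 0.5  # repulsion gain — a tuning assumption, not from the text


def obstacle_repulsion(positions: np.ndarray, obstacles: np.ndarray,
                       k_rep: float = K_REP) -> np.ndarray:
    """Inverse-square repulsive velocity, one 2D vector per robot."""
    u_rep = np.zeros_like(positions)
    for o in obstacles:
        diff = positions - o                                # obstacle → robot vectors
        dist = np.linalg.norm(diff, axis=1, keepdims=True)
        u_rep += k_rep * diff / np.maximum(dist, 1e-6) ** 3
    return u_rep

# Combined input for the Section 5.4 simulation:
# u = consensus_control(positions, OFFSETS, LAPLACIAN) + obstacle_repulsion(positions, obstacles)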

9.3 Dynamic Formation Sizes

Use graph-theoretic rigidity to verify that a formation of \(n\) robots with \(m\) edges is minimally rigid:

\[ \text{Laman condition: } |E| = 2n - 3 \quad \text{and} \quad |E_H| \leq 2|V_H| - 3 \;\; \text{for every subgraph } H \text{ with } |V_H| \geq 2 \]
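
For the small n typical of formations, the Laman condition can be checked by brute force with NetworkX; is_minimally_rigid below is our illustrative helper, not a library function:

from itertools import combinations
import networkx as nx


def is_minimally_rigid(G: nx.Graph) -> bool:
    """Brute-force Laman check — fine for the small n used in formations."""
    n = G.number_of_nodes()
    if G.number_of_edges() != 2 * n - 3:
        return False
    for k in range(2, n + 1):                      # every subgraph with ≥ 2 vertices
        for nodes in combinations(G.nodes, k):
            if G.subgraph(nodes).number_of_edges() > 2 * k - 3:
                return False
    return True

print(is_minimally_rigid(nx.complete_graph(3)))   # True — a triangle is minimally rigid
print(is_minimally_rigid(nx.cycle_graph(4)))      # False — a 4-cycle has 4 < 5 edges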

10. References

  1. Ren, W., & Beard, R.W. (2005). Consensus Seeking in Multiagent Systems Under Dynamically Changing Interaction Topologies. IEEE Transactions on Automatic Control, 50(5), 655–661.
  2. Jadbabaie, A., Lin, J., & Morse, A.S. (2003). Coordination of Groups of Mobile Autonomous Agents Using Nearest Neighbor Rules. IEEE Transactions on Automatic Control, 48(6), 988–1001.
  3. Olfati-Saber, R., & Murray, R.M. (2004). Consensus Problems in Networks of Agents With Switching Topology and Time-Delays. IEEE Transactions on Automatic Control, 49(9), 1520–1533.
  4. Desai, J.P., Ostrowski, J.P., & Kumar, V. (2001). Modeling and Control of Formations of Nonholonomic Mobile Robots. IEEE Transactions on Robotics and Automation, 17(6), 905–908.
  5. Low, C.B. (2019). A Flexible Hybrid Formation Control for Mobile Robots Using Reinforcement Learning. IEEE Robotics and Automation Letters.
  6. Lowe, R., Wu, Y., et al. (2017). Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. NeurIPS.
  7. Rashid, T., Samvelyan, M., et al. (2018). QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. ICML.
  8. Yu, C., et al. (2022). The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games. NeurIPS.
  9. Peterson, F., et al. (2023). Networked Multi-Robot Systems. ACM Computing Surveys.
  10. Ray/rllib Documentation — Multi-agent RL framework
  11. NetworkX Laplacian Documentation — Graph theory tools
  12. ROS 2 Multi-Robot Documentation — Namespace and remapping