Multi-Robot Formation Control¶
Project Type: Multi-Robot | Difficulty: ★★☆☆☆ to ★★★★☆ (approach-dependent) | Estimated Time: 2–4 weekends
1. Project Overview¶
Multi-robot formation control is the problem of coordinating a group of autonomous robots to move as a cohesive unit while maintaining a specified geometric shape (line, triangle, diamond, etc.). This project covers three progressively more sophisticated control paradigms:
```text
┌──────────────────────────────────────────────────────────────┐
│                      FORMATION CONTROL                       │
│                                                              │
│      ◆ Leader                                                │
│      │                                                       │
│      ├─ ◆ Follower 1 (offset: -Δx, +Δy)                      │
│      │                                                       │
│      └─ ◆ Follower 2 (offset: -2Δx, 0)                       │
│                                                              │
│      Shape: Arrow / Line / Triangle / Diamond                │
│                                                              │
│      All robots track desired positions relative to leader   │
└──────────────────────────────────────────────────────────────┘
```
| Tier | Approach | Key Idea | Communication |
|---|---|---|---|
| Traditional | Leader-Follower | Leader trajectory defines the group path; followers maintain relative poses | One-way (leader → followers) |
| Intermediate | Consensus / Virtual Structure | Graph Laplacian drives distributed agreement on formation geometry | Bidirectional |
| Modern | MARL (MAPPO / QMIX) | Neural network learns cooperative formation policy via CTDE | Partial / no comms |
2. Hardware & Software Requirements¶
Hardware¶
| Component | Specification | Notes |
|---|---|---|
| Robots | 3–5 mobile robots (differential drive) | Any platform (TurtleBot, custom kit) |
| Computing | Onboard computer per robot (Raspberry Pi 4 / Jetson Nano) | Or use simulation-only |
| Communication | WiFi router / Ethernet switch | Required for multi-agent networking |
| LiDAR | 2D LiDAR per robot (optional) | RPLIDAR A1 or similar |
| Camera | Front-facing camera (optional) | For vision-based relative pose estimation |
Software¶
| Package | Version | Purpose |
|---|---|---|
| Python | ≥ 3.8 | Core language |
| NumPy / SciPy | ≥ 1.20 / ≥ 1.7 | Matrix ops, graph theory |
| ROS 2 (Humble / Foxy) | ≥ Foxy | Multi-robot framework with namespace |
| NetworkX | ≥ 2.6 | Graph representation |
| PyTorch | ≥ 1.10 | Neural networks for MARL |
| Ray [rllib] | ≥ 2.0 | Multi-agent RL training |
| Gymnasium | ≥ 0.26 | RL environment interface |
| matplotlib | ≥ 3.5 | Visualization |
```bash
pip install numpy scipy networkx torch gymnasium matplotlib

# For multi-agent RL:
pip install "ray[rllib]" pettingzoo
```
3. Formation Geometry Basics¶
3.1 Rigid Formations¶
A formation of \(n\) robots with positions \(p_1, \dots, p_n \in \mathbb{R}^2\) is rigid if the inter-robot distances uniquely determine the shape (up to rotation/translation). The minimum number of distance constraints for a rigid formation in \(\mathbb{R}^2\) is:

\[
|E_{\min}| = 2n - 3
\]

where \(|E_{\min}|\) is the minimum number of edges in a minimally rigid graph (Laman's theorem).
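These counting conditions can be checked directly with NetworkX. The sketch below is illustrative (the `laman_count_ok` helper is our own, not a library function); it brute-forces the subgraph condition, so it is only practical for small teams:

```python
from itertools import combinations

import networkx as nx


def laman_count_ok(G: nx.Graph) -> bool:
    """Laman's conditions for minimal rigidity in 2D:
    |E| = 2n - 3, and every induced subgraph on k >= 2 vertices
    has at most 2k - 3 edges."""
    n = G.number_of_nodes()
    if G.number_of_edges() != 2 * n - 3:
        return False
    for k in range(2, n):
        for nodes in combinations(G.nodes, k):
            if G.subgraph(nodes).number_of_edges() > 2 * k - 3:
                return False
    return True


triangle = nx.complete_graph(3)   # 3 robots, 3 edges = 2*3 - 3
print(laman_count_ok(triangle))   # True
```

A complete graph on 4 robots fails the check (6 edges, but 2·4 − 3 = 5): it is rigid, just not *minimally* rigid.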
3.2 Common Formation Shapes¶
```text
Line Formation         Triangle Formation       Diamond Formation

◆ ─ ◆ ─ ◆ ─ ◆              ◆  (leader)                 ◆
                          ╱ ╲                          │
spacing = Δx             ◆   ◆                     ◆ ─ ◆
                       (followers)                     │
                                                       ◆
```
| Shape | Robots | Edges (Laman) | Description |
|---|---|---|---|
| Line | n | 2n − 3 | Equal spacing, same heading |
| Triangle | 3 | 3 | Equilateral, leader at apex |
| Diamond | 4 | 5 | Leader + 3 followers at vertices |
4. Approach A — Traditional: Leader-Follower Formation Control¶
4.1 Concept¶
The Leader-Follower approach designates one robot as the leader and defines desired relative positions for all followers. The leader follows a pre-planned or teleoperated trajectory; each follower tracks its target position relative to the leader.
```text
Leader trajectory
      │
      ▼
      ◆ ───────────────▶  (Leader)
      │
      │ Δx, Δy
      ▼
      ◆  (Follower tracks relative position)
```
4.2 Relative Pose Control Law¶
Let \((x_L, y_L, \theta_L)\) be the leader's pose and \((x_i, y_i, \theta_i)\) the \(i\)-th follower's pose. The desired position of follower \(i\) is:

\[
\begin{bmatrix} x_i^d \\ y_i^d \end{bmatrix}
=
\begin{bmatrix} x_L \\ y_L \end{bmatrix}
+ R(\theta_L)
\begin{bmatrix} \Delta x_i \\ \Delta y_i \end{bmatrix}
\]

where \(R(\theta_L)\) is the 2D rotation matrix and \((\Delta x_i, \Delta y_i)\) are the constant offsets in the leader's body frame.

The follower uses a proportional controller to reach \((x_i^d, y_i^d)\):

\[
v_i = k_p \sqrt{(x_i^d - x_i)^2 + (y_i^d - y_i)^2},
\qquad
\omega_i = k_\theta \big(\operatorname{atan2}(y_i^d - y_i,\; x_i^d - x_i) - \theta_i\big)
\]

with the heading error wrapped to \((-\pi, \pi]\) and both commands saturated.
4.3 Complete Python Code¶
"""
Approach A: Leader-Follower Formation Control
==============================================
Simulates N robots in formation following a leader's trajectory.
Uses proportional position control for each follower.
"""
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch
import matplotlib.animation as animation
# ─── Parameters ────────────────────────────────────────────────────
DT = 0.05 # Time step (s)
T = 30.0 # Total simulation time (s)
K_POS = 1.2 # Position gain (proportional controller)
K_HEAD = 2.0 # Heading gain
ROBOT_RADIUS = 0.2 # For visualization
# ─── Formation Definitions ─────────────────────────────────────────
# Each entry: (Δx_rel, Δy_rel) in leader's body frame
FORMATION_LINE = [
(-1.0, 0.0),
(-2.0, 0.0),
(-3.0, 0.0),
]
FORMATION_TRIANGLE = [
(-1.0, -1.0),
(-1.0, 1.0),
]
# Choose formation here:
DESIRED_OFFSETS = FORMATION_LINE
def rot2(theta: float) -> np.ndarray:
"""2D rotation matrix."""
c, s = np.cos(theta), np.sin(theta)
return np.array([[c, -s], [s, c]])
def normalize_angle(a: float) -> float:
return (a + np.pi) % (2 * np.pi) - np.pi
def leader_trajectory(t: float) -> tuple:
"""
Define the leader's path as a function of time.
Here: a circular path of radius 3m at speed 0.5 rad/s.
"""
omega = 0.5 # rad/s
x = 3.0 + 3.0 * np.cos(omega * t)
y = 3.0 + 3.0 * np.sin(omega * t)
theta = omega * t + np.pi / 2 # tangent direction
return x, y, theta
def desired_position(leader_pose: tuple, offset: tuple) -> np.ndarray:
"""Compute desired world-frame position for a follower."""
xL, yL, thetaL = leader_pose
dx, dy = offset
delta = rot2(thetaL) @ np.array([dx, dy])
return np.array([xL + delta[0], yL + delta[1]])
def follower_control(
current: np.ndarray,
desired: np.ndarray,
current_heading: float,
k_pos: float = K_POS,
k_head: float = K_HEAD,
) -> tuple:
"""
Proportional controller for differential-drive follower.
Returns: (v, omega) velocity commands
"""
ex = desired[0] - current[0]
ey = desired[1] - current[1]
dist = np.sqrt(ex**2 + ey**2)
# Desired heading toward target
theta_d = np.arctan2(ey, ex)
theta_e = normalize_angle(theta_d - current_heading)
# Linear and angular velocity commands
v = k_pos * dist
omega = k_head * theta_e
# Clamp velocities
v = max(0.0, min(2.0, v))
omega = max(-3.0, min(3.0, omega))
return v, omega
def simulate(n_steps: int, formation_offsets: list) -> dict:
"""Run the formation simulation."""
n_agents = 1 + len(formation_offsets) # 1 leader + followers
n_followers = len(formation_offsets)
# State: [x, y, theta] per agent
state = np.zeros((n_agents, 3))
state[0] = [3.0 + 3.0, 3.0, 0.0] # leader initial
trajectory = [state.copy()]
for step in range(n_steps):
t = step * DT
# ── Leader moves along trajectory ──
xL, yL, thetaL = leader_trajectory(t)
state[0, :2] = [xL, yL]
state[0, 2] = thetaL
leader_pose = (xL, yL, thetaL)
# ── Each follower tracks its desired position ──
for i, offset in enumerate(formation_offsets):
agent_idx = i + 1
desired_pos = desired_position(leader_pose, offset)
v, omega = follower_control(
current=state[agent_idx, :2],
desired=desired_pos,
current_heading=state[agent_idx, 2],
)
# Differential drive update
state[agent_idx, 0] += v * np.cos(state[agent_idx, 2]) * DT
state[agent_idx, 1] += v * np.sin(state[agent_idx, 2]) * DT
state[agent_idx, 2] += omega * DT
state[agent_idx, 2] = normalize_angle(state[agent_idx, 2])
trajectory.append(state.copy())
return {"trajectory": trajectory, "offsets": formation_offsets}
def plot_formation(trajectory: np.ndarray, offsets: list, save_path: str = None):
"""Plot the formation at the final time step."""
n_agents = trajectory.shape[1]
n_followers = len(offsets)
fig, ax = plt.subplots(figsize=(8, 8))
# Plot leader path
lx = trajectory[:, 0, 0]
ly = trajectory[:, 0, 1]
ax.plot(lx, ly, 'b--', alpha=0.4, label="Leader path")
# Plot agents at final time
final = trajectory[-1]
# Leader
ax.plot(final[0, 0], final[0, 1], 'b^', markersize=15, label="Leader")
ax.annotate("Leader",
(final[0, 0], final[0, 1]),
textcoords="offset points", xytext=(8, 8), fontsize=9)
# Followers
colors = ['red', 'green', 'orange', 'purple']
for i in range(n_followers):
idx = i + 1
ax.plot(final[idx, 0], final[idx, 1], 'o',
color=colors[i % len(colors)], markersize=12)
ax.annotate(f"F{i+1}",
(final[idx, 0], final[idx, 1]),
textcoords="offset points", xytext=(6, 6), fontsize=9)
# Draw formation line to leader
ax.plot([final[0, 0], final[idx, 0]],
[final[0, 1], final[idx, 1]],
'--', color='gray', alpha=0.5)
ax.set_xlabel("X (m)")
ax.set_ylabel("Y (m)")
ax.set_title(f"Leader-Follower Formation — {len(offsets)}+1 robots")
ax.legend()
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
plt.tight_layout()
if save_path:
plt.savefig(save_path, dpi=150)
plt.show()
def main():
n_steps = int(T / DT)
result = simulate(n_steps, DESIRED_OFFSETS)
trajectory = np.array(result["trajectory"])
print(f"Simulation complete: {len(trajectory)} steps, "
f"{trajectory.shape[1]} agents")
print(f"Final positions:")
for i, s in enumerate(trajectory[-1]):
label = "Leader" if i == 0 else f"Follower {i}"
print(f" {label}: x={s[0]:.2f}, y={s[1]:.2f}, θ={np.degrees(s[2]):.1f}°")
plot_formation(trajectory, DESIRED_OFFSETS)
if __name__ == "__main__":
main()
4.4 ROS 2 Multi-Robot Setup¶
```xml
<!-- launch/multi_robot_formation.launch.xml (ROS 2 XML launch frontend) -->
<launch>
  <!-- Launch N robot instances with namespaced topics -->
  <arg name="num_robots" default="4"/>

  <!-- Robot 1 (Leader) -->
  <group>
    <push-ros-namespace namespace="robot1"/>
    <include file="$(find-pkg-share my_robot)/launch/robot.launch.xml"/>
    <!-- Leader-specific: publishes formation leader pose -->
    <node pkg="formation_control" exec="leader_node" name="leader"/>
  </group>

  <!-- Robots 2..N (Followers) -->
  <include file="$(find-pkg-share my_robot)/launch/follower.launch.xml">
    <arg name="robot_name" value="robot2"/>
    <arg name="offset_x" value="-1.0"/>
    <arg name="offset_y" value="0.0"/>
  </include>
  <include file="$(find-pkg-share my_robot)/launch/follower.launch.xml">
    <arg name="robot_name" value="robot3"/>
    <arg name="offset_x" value="-2.0"/>
    <arg name="offset_y" value="0.0"/>
  </include>
</launch>
```
Topic remapping for isolation:
| Topic | Remapped Topic | Notes |
|---|---|---|
| `/cmd_vel` | `/robot1/cmd_vel` | Namespaced per robot |
| `/odom` | `/robot1/odom` | Namespaced per robot |
| `/scan` | `/robot1/scan` | Namespaced per robot |
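The same isolation can be applied at run time with command-line remapping. A hedged example (the `my_robot` package and `robot_node` executable are placeholders, not real packages):

```shell
# Push the whole node under a namespace so its topics become /robot1/...
ros2 run my_robot robot_node --ros-args -r __ns:=/robot1

# Or remap a single topic explicitly
ros2 run my_robot robot_node --ros-args -r cmd_vel:=/robot1/cmd_vel
```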
5. Approach B — Intermediate: Consensus-Based Formation¶
5.1 Concept¶
Consensus-based formation uses graph theory to distribute the formation geometry across all robots. No single leader exists — instead, each robot exchanges state information with its neighbors and collectively agrees on the formation shape through local interactions.
```text
Graph Topology              Consensus Dynamics

◆ ─ ◆                       Each robot i applies:
│ ╲ │                         u_i = Σ_j a_ij (x_j − x_i)
◆ ─ ◆
                            Convergence rate is governed by the
                            algebraic connectivity λ₂(L)
```
5.2 Graph Laplacian¶
Represent the communication graph with an adjacency matrix \(A = [a_{ij}]\) and degree matrix \(D = \operatorname{diag}(d_i)\) where \(d_i = \sum_j a_{ij}\). The graph Laplacian is:

\[
L = D - A
\]
For a formation with desired offsets \(\delta_i\), the consensus protocol is:

\[
u_i = -k \sum_{j} a_{ij}\,\big[(p_i - \delta_i) - (p_j - \delta_j)\big]
\]

In matrix form:

\[
\dot{p} = -k\,L\,(p - \delta)
\]

The formation converges if the graph is connected (algebraic connectivity \(\lambda_2(L) > 0\)).
5.3 Virtual Structure Approach¶
Treat the entire formation as a rigid virtual structure. All robots track target points on this structure as it moves:

\[
p_i^d(t) = p_{\text{center}}(t) + R(t)\,\delta_i
\]

where \(p_{\text{center}}(t)\) and \(R(t)\) are the structure's center position and rotation matrix.
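Computing the target points is a single matrix operation per time step. A minimal NumPy sketch (the function name and the triangle offsets are illustrative):

```python
import numpy as np


def virtual_structure_targets(center: np.ndarray, theta: float,
                              offsets: np.ndarray) -> np.ndarray:
    """Target for each robot: p_i^d = p_center + R(theta) @ delta_i."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    # offsets @ R.T applies R to every row of the (n, 2) offsets array
    return center + offsets @ R.T


# Triangle offsets in the structure frame; structure rotated by 90°
offsets = np.array([[0.0, 0.0], [-1.0, 1.0], [-1.0, -1.0]])
targets = virtual_structure_targets(np.array([3.0, 2.0]), np.pi / 2, offsets)
print(targets)
```

Each robot then tracks its own row of `targets` with any point-tracking controller (e.g. the proportional law from Approach A).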
5.4 Complete Python Code — Consensus Formation¶
"""
Approach B: Consensus-Based Formation Control
==============================================
Distributed formation control using graph Laplacian and consensus protocol.
No leader — all robots exchange information with neighbors.
"""
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
# ─── Parameters ────────────────────────────────────────────────────
DT = 0.02 # Time step (s)
T = 20.0 # Total time (s)
N_ROBOTS = 5 # Number of robots
K_CONSENSUS = 1.5 # Consensus gain
# ─── Formation Offsets (in world frame, static) ────────────────────
# Define desired relative positions (relative to formation center)
OFFSETS = np.array([
[0.0, 0.0], # Robot 0 (formation center reference)
[1.0, 0.0], # Robot 1
[-1.0, 0.0], # Robot 2
[0.0, 1.0], # Robot 3
[0.0, -1.0], # Robot 4
], dtype=float)
# ─── Communication Graph ───────────────────────────────────────────
# Define who talks to whom
ADJACENCY = np.array([
[0, 1, 1, 0, 0],
[1, 0, 1, 1, 0],
[1, 1, 0, 0, 1],
[0, 1, 0, 0, 1],
[0, 0, 1, 1, 0],
], dtype=float)
# Build Laplacian: L = D - A
DEGREE = np.diag(ADJACENCY.sum(axis=1))
LAPLACIAN = DEGREE - ADJACENCY
print("Graph Laplacian:")
print(LAPLACIAN)
# Algebraic connectivity (smallest non-zero eigenvalue)
G_nx = nx.from_numpy_array(ADJACENCY)
eigenvalues = sorted(nx.laplacian_spectrum(G_nx))
algebraic_conn = eigenvalues[1] # First non-zero eigenvalue
print(f"Algebraic connectivity λ₂ = {algebraic_conn:.3f}")
def normalize_angle(a: float) -> float:
return (a + np.pi) % (2 * np.pi) - np.pi
def consensus_control(positions: np.ndarray, offsets: np.ndarray,
laplacian: np.ndarray) -> np.ndarray:
"""
Consensus protocol: each robot moves toward agreement on formation.
u_i = k * Σ a_ij * [(x_j - δ_j) - (x_i - δ_i)]
"""
n = positions.shape[0]
desired = positions - offsets # Desired if in formation
# Distributed disagreement: L @ (pos - offsets)
disagreement = laplacian @ (positions - offsets)
return K_CONSENSUS * disagreement
def moving_target(positions: np.ndarray, t: float, omega: float = 0.5) -> np.ndarray:
"""
Apply a common motion to all desired positions (formation moves as a whole).
Here: circular motion of the formation center.
"""
center = np.array([3.0 * np.cos(omega * t), 3.0 * np.sin(omega * t)])
return OFFSETS + center # Desired positions in world frame
def simulate_consensus(n_steps: int, initial_pos: np.ndarray) -> dict:
"""Simulate consensus-based formation."""
positions = initial_pos.copy()
positions_history = [positions.copy()]
for step in range(n_steps):
t = step * DT
desired_world = moving_target(positions, t)
offsets_dynamic = desired_world - OFFSETS + OFFSETS # recompute desired
desired_world = OFFSETS + np.array([
3.0 * np.cos(0.5 * t), 3.0 * np.sin(0.5 * t)
])
# Consensus input
u = consensus_control(positions, OFFSETS, LAPLACIAN)
positions += u * DT
positions_history.append(positions.copy())
return {"positions": np.array(positions_history), "offsets": OFFSETS}
def plot_consensus(positions_history: np.ndarray, offsets: np.ndarray,
save_path: str = None):
"""Visualize the consensus formation over time."""
fig, ax = plt.subplots(figsize=(8, 8))
n_robots = positions_history.shape[1]
colors = plt.cm.tab10(np.linspace(0, 1, n_robots))
# Plot trajectories
for i in range(n_robots):
traj = positions_history[:, i, :]
ax.plot(traj[:, 0], traj[:, 1], '-', color=colors[i], alpha=0.5, linewidth=1)
ax.plot(traj[-1, 0], traj[-1, 1], 'o', color=colors[i],
markersize=12, label=f"Robot {i}")
ax.annotate(f"R{i}", (traj[-1, 0], traj[-1, 1]),
textcoords="offset points", xytext=(5, 5), fontsize=9)
# Plot desired formation centers (at final time)
center = positions_history[-1, 0] # Robot 0 as reference
for i, offset in enumerate(offsets):
dx = offset[0] + (positions_history[-1, 0, 0] - offsets[0, 0])
dy = offset[1] + (positions_history[-1, 0, 1] - offsets[0, 1])
ax.plot(dx, dy, 's', color='gray', markersize=6, alpha=0.4)
ax.set_xlabel("X (m)")
ax.set_ylabel("Y (m)")
ax.set_title(f"Consensus Formation — {n_robots} robots, λ₂={algebraic_conn:.2f}")
ax.legend()
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
plt.tight_layout()
if save_path:
plt.savefig(save_path, dpi=150)
plt.show()
def main():
n_steps = int(T / DT)
# Random initial positions
np.random.seed(42)
initial_pos = np.random.uniform(0, 6, size=(N_ROBOTS, 2)).astype(float)
result = simulate_consensus(n_steps, initial_pos)
positions_history = result["positions"]
print(f"Consensus simulation: {positions_history.shape[0]} steps, "
f"{positions_history.shape[1]} robots")
print(f"Algebraic connectivity λ₂ = {algebraic_conn:.3f} (graph is connected: {algebraic_conn > 0})")
# Final formation errors
final_pos = positions_history[-1]
center = final_pos[0]
for i in range(N_ROBOTS):
error = np.linalg.norm(final_pos[i] - (center + OFFSETS[i]))
print(f" Robot {i}: pos={final_pos[i]}, formation error={error:.3f}m")
plot_consensus(positions_history, OFFSETS)
if __name__ == "__main__":
main()
6. Approach C — Modern: Multi-Agent Reinforcement Learning¶
6.1 Concept¶
MARL for formation control trains neural-network policies with which multiple robots learn to cooperate without explicit communication. We use Centralized Training with Decentralized Execution (CTDE):
```text
Centralized Training (with global info)
┌─────────────────────────────────────┐
│  All robots' observations pooled;   │
│  critic has access to full state    │
└────────────────┬────────────────────┘
                 │ train
┌────────────────▼────────────────────┐
│ Decentralized Execution (at test)   │
│  Each actor only sees local obs     │
│  No inter-robot communication       │
└─────────────────────────────────────┘
```
6.2 QMIX — Value Decomposition¶
QMIX decomposes the joint action-value function \(Q_{\text{tot}}\) into per-agent functions \(Q_a\) combined by a mixing network that enforces the Individual-Global-Max (IGM) condition:

\[
Q_{\text{tot}}(\boldsymbol{\tau}, \mathbf{u}) = f_{\text{mix}}\big(Q_1(\tau_1, u_1), \dots, Q_n(\tau_n, u_n);\, s\big)
\]

The mixing network uses hypernetworks with a monotonicity constraint:

\[
\frac{\partial Q_{\text{tot}}}{\partial Q_a} \ge 0 \quad \forall a
\]

This guarantees that maximizing \(Q_{\text{tot}}\) is equivalent to each agent independently maximizing its own \(Q_a\).
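The monotonicity constraint is easy to see in code. Below is a deliberately minimal PyTorch sketch of a QMIX-style mixer (a simplified illustration with a single hypernetwork layer, not the full architecture from the paper):

```python
import torch
import torch.nn as nn


class MonotonicMixer(nn.Module):
    """Minimal QMIX-style mixer: a hypernetwork maps the global state to
    NON-NEGATIVE mixing weights, so dQ_tot/dQ_a >= 0 for every agent a."""

    def __init__(self, n_agents: int, state_dim: int):
        super().__init__()
        self.hyper_w = nn.Linear(state_dim, n_agents)  # weights for each Q_a
        self.hyper_b = nn.Linear(state_dim, 1)         # state-dependent bias

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        w = torch.abs(self.hyper_w(state))  # abs() enforces monotonicity
        b = self.hyper_b(state)
        return (w * agent_qs).sum(dim=-1, keepdim=True) + b


mixer = MonotonicMixer(n_agents=5, state_dim=10)
q_tot = mixer(torch.randn(4, 5), torch.randn(4, 10))  # batch of 4 transitions
print(q_tot.shape)  # torch.Size([4, 1])
```

Because the weights are non-negative, raising any single agent's \(Q_a\) can never lower \(Q_{\text{tot}}\), which is exactly the IGM property.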
6.3 MAPPO — Multi-Agent PPO¶
MAPPO applies PPO to the CTDE framework. Each agent \(i\) has a policy \(\pi_{\theta_i}\) trained to maximize the clipped surrogate objective:

\[
L(\theta_i) = \mathbb{E}_t\Big[\min\big(r_t(\theta_i)\,\hat{A}_t,\;
\operatorname{clip}(r_t(\theta_i),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t\big)\Big],
\qquad
r_t(\theta_i) = \frac{\pi_{\theta_i}(a_t^i \mid o_t^i)}{\pi_{\theta_i^{\text{old}}}(a_t^i \mid o_t^i)}
\]
The centralized critic \(V_\phi(s)\) observes the full state \(s\); the decentralized actor \(\pi_{\theta_i}(o_i)\) only sees local observation \(o_i\).
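The clipped surrogate itself is only a few lines of NumPy. A small illustrative helper (our own, not part of any library):

```python
import numpy as np


def ppo_clip_objective(ratio: np.ndarray, advantage: np.ndarray,
                       eps: float = 0.2) -> np.ndarray:
    """Per-sample clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A).

    Taking the min makes the objective a pessimistic bound: the policy
    gains nothing by pushing the ratio outside [1-eps, 1+eps].
    """
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)


ratios = np.array([0.5, 1.0, 1.5])
advantages = np.ones(3)
print(ppo_clip_objective(ratios, advantages))  # -> [0.5 1.  1.2]
```

Note the asymmetry: with a positive advantage the objective is capped at \((1+\epsilon)\hat{A}\), while with a negative advantage the `min` keeps the *unclipped* (worse) value whenever the ratio has drifted low, discouraging large policy updates in both directions.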
6.4 RLlib Code — MARL Formation¶
"""
Approach C: Multi-Agent RL Formation Control
=============================================
Uses rllib (Ray) with a custom multi-agent environment for formation control.
Trains 5 robots to maintain relative distances using QMIX-style value decomposition.
"""
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from ray.rllib.algorithms.maddpg import MADDPGAgent
from ray.tune import register_env
from ray import tune
class FormationEnv(gym.Env):
"""
Multi-agent formation environment.
Each agent observes: [self_pos_x, self_pos_y, rel_pos_to_others, target_offset_x, target_offset_y]
Actions: [v_x_cmd, v_y_cmd] (velocity commands)
Reward: negative formation error + movement penalty
"""
metadata = {"render_modes": ["human"]}
def __init__(self, config=None):
super().__init__()
self.n_agents = config.get("n_agents", 5)
self.dt = 0.1
self.max_steps = 500
# Observation: self pos (2) + relative to others (2*(n-1)) + target offset (2)
obs_dim = 2 + 2 * (self.n_agents - 1) + 2
self.observation_space = spaces.Box(
low=-10.0, high=10.0, shape=(obs_dim,), dtype=np.float32
)
# Action: 2D velocity command
self.action_space = spaces.Box(
low=-1.0, high=1.0, shape=(2,), dtype=np.float32
)
self.positions = None
self.target_offsets = None
self.step_count = 0
def _compute_offsets(self) -> np.ndarray:
"""Desired formation offsets (line formation)."""
offsets = np.zeros((self.n_agents, 2))
for i in range(1, self.n_agents):
offsets[i] = [-1.0 * i, 0.0]
return offsets
def reset(self, seed=None, options=None):
super().reset(seed=seed)
self.step_count = 0
# Randomize starting positions
rng = np.random.default_rng(seed)
self.positions = rng.uniform(0, 5, size=(self.n_agents, 2)).astype(np.float32)
# Formation offsets (leader at front, others behind)
self.target_offsets = self._compute_offsets()
return self._get_obs(), self._get_obs_as_dict()
def _get_obs(self) -> np.ndarray:
"""Return flat observation array (used by MADDPG)."""
obs_list = []
for i in range(self.n_agents):
obs_i = [self.positions[i, 0], self.positions[i, 1]]
# Relative positions to other agents
for j in range(self.n_agents):
if j != i:
obs_i.extend([
self.positions[j, 0] - self.positions[i, 0],
self.positions[j, 1] - self.positions[i, 1],
])
# Target offset from formation center
center = np.mean(self.positions, axis=0)
obs_i.extend([
center[0] + self.target_offsets[i, 0] - self.positions[i, 0],
center[1] + self.target_offsets[i, 1] - self.positions[i, 1],
])
obs_list.append(np.array(obs_i, dtype=np.float32))
return np.array(obs_list)
def _get_obs_as_dict(self) -> dict:
"""Return dict observation (for rllib multi-agent API)."""
obs = self._get_obs()
return {f"agent_{i}": obs[i] for i in range(self.n_agents)}
def step(self, actions):
self.step_count += 1
# Convert dict or array actions
if isinstance(actions, dict):
actions_list = [actions[f"agent_{i}"] for i in range(self.n_agents)]
else:
actions_list = actions
actions_np = np.array(actions_list, dtype=np.float32)
# Apply velocity commands
for i in range(self.n_agents):
self.positions[i] += actions_np[i] * 0.5 * self.dt
# Clamp positions
self.positions[i] = np.clip(self.positions[i], -10, 10)
# ── Reward: formation maintenance ──
center = np.mean(self.positions, axis=0)
rewards = []
for i in range(self.n_agents):
desired = center + self.target_offsets[i]
error = np.linalg.norm(self.positions[i] - desired)
# Reward: high when error is low, slight penalty for large movements
r = -error + 0.1 * (1.0 - np.linalg.norm(actions_np[i]))
rewards.append(float(r))
rewards_dict = {f"agent_{i}": rewards[i] for i in range(self.n_agents)}
# Termination
terminated = {f"agent_{i}": False for i in range(self.n_agents)}
terminated["__all__"] = self.step_count >= self.max_steps
truncated = {f"agent_{i}": False for i in range(self.n_agents)}
truncated["__all__"] = False
obs = self._get_obs_as_dict()
info = {}
return obs, rewards_dict, terminated, truncated, info
# ─── Registration & Training ────────────────────────────────────────
register_env("formation_env", lambda cfg: FormationEnv(cfg))
def train_maddpg(n_timesteps: int = 200_000):
"""Train MADDPG agents on the formation task."""
config = {
"env": "formation_env",
"env_config": {"n_agents": 5},
"num_workers": 2,
"gamma": 0.99,
"actor_lr": 1e-4,
"critic_lr": 1e-3,
"train_batch_size": 1024,
"model": {
"fcnet_hiddens": [128, 128],
},
"framework": "torch",
}
print(f"[INFO] Training MADDPG for {n_timesteps} timesteps...")
results = tune.run(
"MADDPG",
config=config,
stop={"timesteps_total": n_timesteps},
checkpoint_at_end=True,
)
best = results.get_best_result(metric="episode_reward_mean", mode="max")
print(f"[INFO] Best checkpoint: {best.checkpoint.path}")
return best
def evaluate_agent(checkpoint_path: str, n_episodes: int = 5):
"""Evaluate a trained formation policy."""
from ray.rllib.algorithms.maddpg import MADDPGAgent
env = FormationEnv({"n_agents": 5})
agent = MADDPGAgent(config={"env_config": {"n_agents": 5}, "framework": "torch"})
agent.restore(checkpoint_path)
for ep in range(n_episodes):
obs, _ = env.reset()
done = False
ep_reward = 0.0
while not done:
actions = {}
for i in range(env.n_agents):
action, _ = agent.compute_action(obs[f"agent_{i}"], policy_id=f"agent_{i}")
actions[f"agent_{i}"] = action
obs, rewards, terminated, truncated, info = env.step(actions)
ep_reward += sum(rewards.values())
done = terminated["__all__"] or truncated["__all__"]
print(f" Episode {ep+1}: reward={ep_reward:.1f}")
if __name__ == "__main__":
import sys
if len(sys.argv) > 1 and sys.argv[1] == "eval":
evaluate_agent(sys.argv[2] if len(sys.argv) > 2 else None)
else:
train_maddpg()
7. Comparison of Formation Approaches¶
| Criteria | Leader-Follower | Consensus / Virtual Structure | MARL (MADDPG/QMIX) |
|---|---|---|---|
| Scalability | ★★★☆☆ (single point of failure) | ★★★★☆ | ★★★★★ |
| Fault Tolerance | ★★☆☆☆ | ★★★★☆ | ★★★★☆ |
| Formation Rigidity | ✅ Explicit (rigid offsets) | ✅ Graph-theoretic rigidity | ✅ Emerges from reward |
| Communication | One-way (leader → followers) | Bidirectional (all-to-all) | None / partial (CTDE) |
| Convergence Speed | Fast (leader-driven) | Moderate (distributed) | Slow (needs training) |
| Handles Obstacles | ❌ Requires replanning | ❌ Needs collision avoidance | ✅ Learns reactive behaviors |
| Setup Complexity | Low | Medium | High |
| Computation Cost | Very low | Low | High (training + inference) |
| Adaptability | Fixed formations only | Reconfigurable (change graph) | Learns new formations |
| Best for | Simple, small groups | Structured environments | Complex, large groups |
8. Step-by-Step Implementation Guide¶
Phase 1 — Single-Robot Teleoperation (All Approaches)¶
- Set up each robot with ROS 2 and verify that its `cmd_vel`/`odom` topics work.
- Test teleoperation (`teleop_twist_keyboard`) for each robot individually.
- Set up the WiFi network and verify inter-robot ping connectivity.
Phase 2 — ROS 2 Namespace Setup¶
```bash
# Launch robots with separate namespaces
ros2 launch multi_robot_bringup robot_multi.launch.py num_robots:=4

# Verify topics are namespaced:
ros2 topic list | grep robot
# Expected: /robot1/cmd_vel, /robot2/cmd_vel, ...
```
Phase 3 — Leader-Follower (Approach A)¶
- Implement the leader trajectory publisher.
- Add the follower controller node that subscribes to `/robot1/pose`.
- Configure offsets per robot in the parameter file.
- Run in simulation first, then on hardware.
Phase 4 — Consensus (Approach B)¶
- Define the communication topology (mesh, chain, star).
- Implement the consensus protocol using the `networkx` Laplacian.
- Add inter-robot communication via ROS 2 topics (`/robot{i}/state`).
- Test with a static formation, then add a moving target.
Phase 5 — MARL (Approach C)¶
- Install Ray: `pip install "ray[rllib]"`
- Run the formation environment in simulation: `python formation_rl.py`
- Train for ~200k timesteps (or use pre-trained checkpoints).
- Deploy trained policies to real robots via ROS 2.
9. Extensions and Variations¶
9.1 Formation Reconfiguration¶
Switch between formation shapes dynamically by changing offsets:
```python
n = 4  # number of robots
SHAPES = {
    "line": [(-i, 0) for i in range(n)],
    "triangle": [(0, 0), (-1, -1), (-1, 1)],
    "diamond": [(0, 0), (0, -1), (1, 0), (0, 1)],
}
# Trigger a shape change via a ROS parameter or service call
```
9.2 Obstacle Avoidance in Formation¶
Add a potential-field repulsion term to the consensus protocol:

\[
u_i = -k \sum_{j} a_{ij}\,\big[(p_i - \delta_i) - (p_j - \delta_j)\big]
\;+\; k_{\text{rep}} \sum_{o \in \mathcal{O}} \frac{p_i - p_o}{\|p_i - p_o\|^3}
\]

where \(\mathcal{O}\) is the set of obstacle positions within the robot's sensing radius.
9.3 Dynamic Formation Sizes¶
Use graph-theoretic rigidity to verify that a formation of \(n\) robots with \(m\) edges is minimally rigid (Laman's conditions):

\[
m = 2n - 3, \qquad m' \le 2n' - 3 \;\;\text{for every subgraph with } n' \ge 2 \text{ vertices and } m' \text{ edges}
\]
10. References¶
- Ren, W., & Beard, R.W. (2005). Consensus Seeking in Multiagent Systems Under Dynamically Changing Interaction Topologies. IEEE Transactions on Automatic Control, 50(5), 655–661.
- Jadbabaie, A., Lin, J., & Morse, A.S. (2003). Coordination of Groups of Mobile Autonomous Agents Using Nearest Neighbor Rules. IEEE Transactions on Automatic Control, 48(6), 988–1001.
- Olfati-Saber, R., & Murray, R.M. (2004). Consensus Problems in Networks of Agents With Switching Topology and Time-Delays. IEEE Transactions on Automatic Control, 49(9), 1520–1533.
- Desai, J.P., Ostrowski, J.P., & Kumar, V. (2001). Modeling and Control of Formations of Nonholonomic Mobile Robots. IEEE Transactions on Robotics and Automation, 17(6), 905–908.
- Low, C.B. (2019). A Flexible Hybrid Formation Control for Mobile Robots Using Reinforcement Learning. IEEE Robotics and Automation Letters.
- Lowe, R., Wu, Y., et al. (2017). Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. NeurIPS.
- Rashid, T., Samvelyan, M., et al. (2018). QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. ICML.
- Yu, C., et al. (2022). The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games. NeurIPS.
- Peterson, F., et al. (2023). Networked Multi-Robot Systems. ACM Computing Surveys.
- Ray/rllib Documentation — Multi-agent RL framework
- NetworkX Laplacian Documentation — Graph theory tools
- ROS 2 Multi-Robot Documentation — Namespace and remapping