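Line Following Robot

Project type: Navigation | Difficulty: ★★☆☆☆ to ★★★★☆ (depending on the approach) | Estimated time: 1–3 weekends

1. Project Overview

The line following robot is one of the classic entry-level challenges in mobile robotics. The goal is simple: build a robot that detects a line on the floor (usually a black strip on a white surface, or the reverse) and follows it while driving forward.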

  ┌──────────────────────────────────────────────┐
  │               Track (top view)               │
  │                                              │
  │     ┌───┐                                    │
  │     │ R │──►                                 │
  │     └───┘    \                               │
  │               ╲  ─────────────────────       │
  │                ╱                             │
  │     ┌───┐   /                                │
  │     │   │◄─┘                                 │
  │     └───┘                                    │
  │   The robot follows the black line           │
  └──────────────────────────────────────────────┘

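In this project you will explore three progressively more advanced approaches:

Approach	Method	Sensor
Classical	PID control + thresholding	Camera (monocular)
Intermediate	Stanley / Pure Pursuit controller	Camera or LiDAR
Modern	Deep reinforcement learning (DQN / PPO)	Camera or simulated sensor array

Each approach builds on the previous one and introduces new concepts, covering control theory, path tracking, and learning-based methods.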

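2. Hardware and Software Requirements

Hardware

Component	Specification	Notes
Chassis	Two-wheel differential drive	Any kit (e.g. an Arduino car kit)
Motors	2× geared DC motors (with encoders)	Encoders are optional but recommended
Microcontroller (MCU)	Arduino Uno / Mega or Raspberry Pi	The camera-based approaches need a Raspberry Pi
Camera	USB webcam or Pi Camera	Required for the vision and RL approaches
(Alternative) IR array	5–8 channel IR reflectance sensors	Simpler; no vision needed
Motor driver	L298N / TB6612	H-bridge driver module
Battery	7.4 V LiPo or 8× AA	Powers the motors and electronics
Track	Black electrical tape on a white floor	Minimum curve radius 40 cm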
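For the IR-array alternative listed in the table, the error signal can be estimated as a weighted average of the sensor positions. The sketch below is one common way to do this, not a prescribed method: it assumes readings already normalised to 0.0 (floor) through 1.0 (line), and the function name, the sensor spacing, and the "line lost" threshold are hypothetical values chosen for illustration.

```python
from typing import Optional, Sequence


def ir_line_position(readings: Sequence[float],
                     spacing_m: float = 0.01,
                     min_total: float = 0.2) -> Optional[float]:
    """Estimate the line's lateral offset from the array centre, in metres.

    readings  : one value per sensor, left to right, 0.0 (floor) to 1.0 (line)
    spacing_m : distance between adjacent sensors (assumed value)
    min_total : below this summed response the line is treated as lost (assumed)
    """
    total = sum(readings)
    if total < min_total:
        return None  # no sensor sees the line clearly enough

    centre = (len(readings) - 1) / 2.0
    # Weighted mean of the sensor indices, shifted so the array centre is zero
    weighted_index = sum(i * r for i, r in enumerate(readings)) / total
    return (weighted_index - centre) * spacing_m
```

A positive result means the line lies towards the right-hand end of the array, which matches the sign convention of the camera-based error used later in this project.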

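Software

Package	Version	Purpose
Python	≥ 3.8	Core language
OpenCV	≥ 4.5	Image processing and line detection
NumPy	≥ 1.20	Numerical computation
PySerial	≥ 3.5	Serial communication with the MCU
Matplotlib	any recent	Plotting the Approach B simulation
PyTorch	≥ 1.13	Deep reinforcement learning backend (Approach C)
Stable-Baselines3	≥ 2.0	PPO implementation (2.0 is the first release built on Gymnasium)
Gymnasium	≥ 0.28	Reinforcement learning environment interface
TensorBoard	any recent	Needed because the training script sets tensorboard_log

pip install opencv-python numpy pyserial matplotlib stable-baselines3 gymnasium tensorboard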

3. Control Loop: High-Level Architecture

All three approaches share the same sense → compute → act loop:

  ┌────────────┐     ┌──────────────┐     ┌──────────────┐
  │  Sensor    │────▶│  Controller  │────▶│  Actuator    │
  │  (Camera/  │     │  (PID/Stanley│     │  (motor PWM) │
  │   IR/LiDAR)│     │   /RL Agent) │     │              │
  └──────┬─────┘     └──────────────┘     └──────┬───────┘
         │                                       │
         │          ┌──────────────┐             │
         │          │ Environment  │             │
         └──────────│ (line track) │◀────────────┘
                    └──────────────┘

The error signal is the horizontal offset between the robot's current position and the centre of the detected line (in pixels or metres).

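4. Approach A: Classical (PID Control + Thresholding)

4.1 Concept

This is the simplest and most widely used method. We detect the line in the camera frame by thresholding, compute the offset between the line centre and the frame centre, and feed that error to a PID controller, which outputs the steering command.

4.2 PID Controller

The PID (Proportional–Integral–Derivative) control law is:

\[ u(t) = K_p \, e(t) \;+\; K_i \int_0^t e(\tau)\,d\tau \;+\; K_d \frac{de(t)}{dt} \]

where:

\(e(t)\): lateral error (line centre − frame centre), in pixels
\(K_p\): proportional gain, corrects the current error
\(K_i\): integral gain, removes steady-state offset
\(K_d\): derivative gain, damps oscillation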
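On a robot the controller runs once per camera frame, so the continuous law is evaluated in discrete form. A sketch of that discretisation, assuming a rectangular-sum integral and a backward-difference derivative, with \(e_k\) the error at frame \(k\) and \(\Delta t_k\) the time since the previous frame (both symbols introduced here for illustration):

\[ u_k = K_p \, e_k \;+\; K_i \sum_{j=0}^{k} e_j \, \Delta t_j \;+\; K_d \, \frac{e_k - e_{k-1}}{\Delta t_k} \]

The running sum plays the role of the integral and the difference quotient replaces the derivative; this is the form the `PIDController.update` method in section 4.4 implements.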

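4.3 Line Detection Pipeline

  Camera frame (BGR)
        │
        ▼
  Convert to grayscale
        │
        ▼
  Gaussian blur (5×5)
        │
        ▼
  Crop the lower half (region of interest, ROI)
        │
        ▼
  Binary threshold (THRESH_BINARY_INV)
        │
        ▼
  Find contours / compute centroid
        │
        ▼
  error = centroid_x − frame_center_x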

4.4 Complete Python Code

"""
Approach A: PID line following with a camera and OpenCV
=======================================================
Dependencies: opencv-python, numpy, pyserial (only when driving real hardware)
"""

import cv2
import numpy as np
import time

# ─── Configuration ─────────────────────────────────────────────
CAMERA_INDEX = 0            # 0 is the default camera
FRAME_WIDTH = 640
FRAME_HEIGHT = 480
ROI_TOP_RATIO = 0.5         # use the bottom 50% of the frame as the ROI
THRESHOLD_VALUE = 80        # binary threshold (tune for your lighting)
BASE_SPEED = 150            # base motor PWM (0–255)

# PID gains (tune for your robot)
KP = 0.4
KI = 0.01
KD = 0.3


class PIDController:
    """Discrete PID controller."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp = kp
        self.ki = ki
        self.kd = kd
        self.prev_error = 0.0
        self.integral = 0.0

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        self.prev_error = error
        return output

    def reset(self):
        self.prev_error = 0.0
        self.integral = 0.0


def detect_line(frame: np.ndarray, roi_top_ratio: float = 0.5,
                thresh_val: int = 80) -> tuple:
    """
    Detect the line in a camera frame; return (centroid x, error, mask).

    Parameters
    ----------
    frame : camera frame in BGR format
    roi_top_ratio : fraction cropped from the top (the bottom part is kept)
    thresh_val : binary threshold

    Returns
    -------
    centroid_x : x coordinate of the line centroid (-1 if no line is found)
    error : horizontal offset from the frame centre (pixels)
    masked : thresholded ROI image, for visualisation
    """
    h, w = frame.shape[:2]

    # 1. Grayscale + blur
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # 2. Region of interest (ROI): bottom part of the frame
    roi_top = int(h * roi_top_ratio)
    roi = blurred[roi_top:h, :]

    # 3. Binary threshold (the line is dark, so invert)
    _, thresh = cv2.threshold(roi, thresh_val, 255, cv2.THRESH_BINARY_INV)

    # 4. Find contours
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    centroid_x = -1
    error = 0.0

    if contours:
        # Take the largest contour (assumed to be the line)
        largest = max(contours, key=cv2.contourArea)
        M = cv2.moments(largest)
        if M["m00"] > 0:
            centroid_x = int(M["m10"] / M["m00"])
            error = centroid_x - w // 2  # positive means the line is to the right

    return centroid_x, error, thresh


def send_motor_command(left_speed: int, right_speed: int):
    """
    Send motor speeds to the hardware.
    Replace this with the serial/GPIO communication you actually use.
    """
    # Example Arduino serial protocol:
    # ser.write(f"M,{left_speed},{right_speed}\n".encode())
    print(f"  Motors -> L:{left_speed:4d}  R:{right_speed:4d}")


def main():
    cap = cv2.VideoCapture(CAMERA_INDEX)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, FRAME_WIDTH)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, FRAME_HEIGHT)

    if not cap.isOpened():
        print("[ERROR] Cannot open the camera")
        return

    pid = PIDController(KP, KI, KD)
    prev_time = time.time()

    print("[INFO] PID line following started. Press 'q' to quit.")

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        current_time = time.time()
        dt = current_time - prev_time
        prev_time = current_time

        # ── Sense ──
        centroid_x, error, thresh = detect_line(frame, ROI_TOP_RATIO, THRESHOLD_VALUE)

        if centroid_x == -1:
            # Line lost: spin in place to search. Reset the PID state so the
            # placeholder error of 0 does not corrupt the integral and
            # derivative terms when the line is found again.
            print("  [WARN] Line lost! Searching...")
            pid.reset()
            steering = 0.0
            send_motor_command(-80, 80)
        else:
            # ── Compute ──
            steering = pid.update(error, dt)
            # Saturate the steering command
            max_steer = BASE_SPEED
            steering = max(-max_steer, min(max_steer, steering))

            # ── Act ──
            left_speed = int(BASE_SPEED + steering)
            right_speed = int(BASE_SPEED - steering)
            left_speed = max(0, min(255, left_speed))
            right_speed = max(0, min(255, right_speed))
            send_motor_command(left_speed, right_speed)

        # ── Visualise ──
        # Use the real frame size: the camera may ignore the requested resolution
        frame_h, frame_w = frame.shape[:2]
        roi_top = int(frame_h * ROI_TOP_RATIO)
        cv2.rectangle(frame, (0, roi_top), (frame_w, frame_h), (0, 255, 0), 2)
        if centroid_x >= 0:
            cv2.circle(frame, (centroid_x, roi_top + 30), 10, (0, 0, 255), -1)
        cv2.putText(frame, f"Error: {error:.0f}  Steer: {steering:.1f}",
                    (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 0), 2)

        cv2.imshow("Line Following - PID", frame)
        cv2.imshow("Threshold", thresh)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()
    print("[INFO] Stopped.")


if __name__ == "__main__":
    main()

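4.5 PID Tuning

Step	Action
1	Set \(K_i = 0\) and \(K_d = 0\). Increase \(K_p\) until the robot oscillates noticeably.
2	Set \(K_p\) to about 60% of that value.
3	Increase \(K_d\) until the oscillation dies out.
4	Add a small \(K_i\) to remove steady-state offset on long straights.
5	Test on curves; retune if the robot cuts corners or overshoots.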
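When a small \(K_i\) is added in step 4, the integral term can keep growing while the robot is held off the line for a long time (integral windup). One simple remedy, shown here as a sketch rather than as part of the original controller, is to clamp the accumulated integral. The class name `ClampedPID` and the `integral_limit` parameter are hypothetical.

```python
class ClampedPID:
    """PID controller whose integral term is clamped (a simple anti-windup scheme)."""

    def __init__(self, kp: float, ki: float, kd: float, integral_limit: float):
        self.kp = kp
        self.ki = ki
        self.kd = kd
        self.integral_limit = abs(integral_limit)  # assumed tuning parameter
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error: float, dt: float) -> float:
        if dt <= 0:
            # No time has elapsed: skip integration and differentiation
            return self.kp * error + self.ki * self.integral

        # Accumulate, then clamp so the integral cannot grow without bound
        self.integral += error * dt
        self.integral = max(-self.integral_limit,
                            min(self.integral_limit, self.integral))

        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

A reasonable starting point for `integral_limit` is the value at which `ki * integral_limit` equals a modest fraction of the maximum steering command; the right fraction depends on the robot.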

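5. Approach B: Intermediate (Stanley Controller)

5.1 Concept

The Stanley controller was developed by Stanford University's team for the DARPA Grand Challenge, where its robot "Stanley" won the 2005 event. It is a geometric path-tracking controller that uses both the cross-track error \(e\) and the heading error \(\theta_e\). Compared with PID on the lateral error alone, it produces smoother trajectories on curved paths.

5.2 Control Law

\[ \delta = \theta_e + \arctan\!\left(\frac{k \, e}{v + \epsilon}\right) \]

where:

\(\delta\): steering angle command (radians), positive counter-clockwise
\(\theta_e\): heading error, the path tangent heading minus the robot heading
\(e\): cross-track error, positive when the robot is to the right of the path
\(v\): forward speed of the robot
\(k\): gain (controls how fast the robot converges to the path)
\(\epsilon\): small constant that prevents division by zero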

  Path tangent direction
  ──────────────────────►
            │  θ_e (heading error)
            │ /
            │/
      ┌─────┐
      │  R  │───── e (cross-track error) ──►
      └─────┘
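To get a feel for the control law, it helps to evaluate it for a couple of concrete cases. The helper below is a minimal sketch of the formula in section 5.2; the function name is hypothetical, and the default gain of 2.5 is the value used in the simulation code of section 5.4.

```python
import math


def stanley_steer(theta_e: float, e: float, v: float,
                  k: float = 2.5, eps: float = 1e-6) -> float:
    """Stanley steering angle in radians for heading error theta_e and cross-track error e."""
    return theta_e + math.atan2(k * e, v + eps)


# Robot already parallel to the path (theta_e = 0) but 0.2 m off it:
slow = stanley_steer(0.0, 0.2, v=1.0)   # arctan(0.5), about 0.46 rad
fast = stanley_steer(0.0, 0.2, v=2.0)   # arctan(0.25), about 0.24 rad
```

Because the speed sits in the denominator, the same offset produces a gentler correction at higher speed, which is what keeps the controller from over-steering when the robot moves fast.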

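5.3 Path Representation

For line following, we represent the detected line as a sequence of waypoints. In each control cycle:

Find the waypoint on the path closest to the robot's front axle.
Compute \(e\) (signed distance to the path) and \(\theta_e\) (heading difference).
Apply the Stanley formula.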

5.4 Complete Python Code

"""
Approach B: line following with the Stanley controller
======================================================
Simulates a robot following a parametric path.
No hardware needed; matplotlib is used for visualisation.
"""

import numpy as np
import matplotlib.pyplot as plt

# ─── Parameters ───────────────────────────────────────────────
DT = 0.05               # time step (s)
TOTAL_TIME = 30.0        # maximum simulated time (s)
V = 1.0                  # forward speed (m/s)
K_STANLEY = 2.5          # Stanley gain
WHEEL_BASE = 0.3         # wheelbase (m), used to convert steering angle to yaw rate

# ─── Path definition (wavy line) ──────────────────────────────
def path_points(n=1000):
    """Generate a smooth sinusoidal path."""
    s = np.linspace(0, 20, n)
    x = s
    y = 1.5 * np.sin(0.3 * s)
    return x, y

PATH_X, PATH_Y = path_points()


def closest_point_index(px: float, py: float) -> int:
    """Return the index of the path point closest to (px, py)."""
    dists = (PATH_X - px)**2 + (PATH_Y - py)**2
    return int(np.argmin(dists))


def path_heading(idx: int) -> float:
    """Tangent heading of the path at the given index (central difference)."""
    dx = PATH_X[min(idx + 1, len(PATH_X)-1)] - PATH_X[max(idx - 1, 0)]
    dy = PATH_Y[min(idx + 1, len(PATH_Y)-1)] - PATH_Y[max(idx - 1, 0)]
    return np.arctan2(dy, dx)


def normalize_angle(a: float) -> float:
    """Wrap an angle to [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi


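def stanley_control(x: float, y: float, heading: float, v: float) -> tuple:
    """
    Compute the steering angle with the Stanley controller.

    Returns (steer, e, theta_e, idx): steering angle δ in radians, signed
    cross-track error in metres, heading error in radians, and the index of
    the closest path point.
    """
    idx = closest_point_index(x, y)
    path_h = path_heading(idx)

    # Signed cross-track error: projection of (robot − path point) onto the
    # right-pointing path normal, so e > 0 when the robot is right of the path.
    # (With a left-pointing normal the correction term would push the robot
    # away from the path.)
    dx = x - PATH_X[idx]
    dy = y - PATH_Y[idx]
    tangent = np.array([np.cos(path_h), np.sin(path_h)])
    normal = np.array([tangent[1], -tangent[0]])
    e = dx * normal[0] + dy * normal[1]

    # Heading error
    theta_e = normalize_angle(path_h - heading)

    # Stanley law; positive steer is counter-clockwise, i.e. back towards the
    # path when e > 0
    steer = theta_e + np.arctan2(K_STANLEY * e, v + 1e-6)
    return steer, e, theta_e, idx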


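def main():
    # Initial state: slightly behind and to the left of the path start
    x, y, heading = PATH_X[0] - 0.5, PATH_Y[0] + 0.5, 0.3
    trajectory_x, trajectory_y = [x], [y]
    errors = []

    steps = int(TOTAL_TIME / DT)
    for step in range(steps):
        steer, e, theta_e, idx = stanley_control(x, y, heading, V)
        if idx >= len(PATH_X) - 1:
            break  # reached the end of the path; stop before the error grows

        # Clip the steering angle (±30° is an assumed limit) so tan() stays bounded
        steer = float(np.clip(steer, -np.radians(30), np.radians(30)))

        # Kinematic bicycle model: yaw rate ω = v * tan(δ) / L
        omega = V * np.tan(steer) / WHEEL_BASE

        # Update the state
        x += V * np.cos(heading) * DT
        y += V * np.sin(heading) * DT
        heading += omega * DT
        heading = normalize_angle(heading)

        trajectory_x.append(x)
        trajectory_y.append(y)
        errors.append(abs(e))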

    # ── Plot the results ──
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))

    ax1.plot(PATH_X, PATH_Y, 'b--', label='Path', linewidth=2)
    ax1.plot(trajectory_x, trajectory_y, 'r-', label='Robot trajectory', linewidth=1.5)
    ax1.set_xlabel('X (m)')
    ax1.set_ylabel('Y (m)')
    ax1.set_title('Stanley controller: line following')
    ax1.legend()
    ax1.set_aspect('equal')
    ax1.grid(True, alpha=0.3)

    ax2.plot(np.arange(len(errors)) * DT, errors, 'g-', linewidth=1)
    ax2.set_xlabel('Time (s)')
    ax2.set_ylabel('|cross-track error| (m)')
    ax2.set_title('Cross-track error over time')
    ax2.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig("stanley_line_following.png", dpi=150)
    plt.show()

    print(f"Mean error: {np.mean(errors):.4f} m")
    print(f"Max error:  {np.max(errors):.4f} m")


if __name__ == "__main__":
    main()

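5.5 Key Differences Between PID and Stanley

  PID controller                    Stanley controller
  ──────────────                    ──────────────────
  Uses cross-track error only       Uses cross-track + heading error
  No geometric model                Accounts for the path tangent
  Tends to oscillate in sharp turns Smoother through curves
  Simple to implement               Needs a path representation

       ╭───╮                              ╭───╮
      /  e  \  (overshoot)               /  e  \
  ───╱───────╲───                 ─────╱────────╲───
     robot path                     robot hugs the path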

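6. Approach C: Modern (Deep Reinforcement Learning)

6.1 Concept

Instead of hand-coding a controller, we train a neural network policy with reinforcement learning. The agent observes sensor readings (such as an array of IR values or a downsampled camera image) and learns to output motor commands that maximise the cumulative reward (stay on the line, keep moving forward).

We use the Proximal Policy Optimization (PPO) algorithm from Stable-Baselines3.

6.2 Environment Setup

  ┌──────────────────────────────────────────────┐
  │      Reinforcement learning environment      │
  │                                              │
  │  State:   [s1, s2, ..., s8]  (sensor array)  │
  │                                              │
  │  Action:  [left_speed, right_speed]          │
  │           continuous, each ∈ [-1, 1]         │
  │                                              │
  │  Reward:  +1.0       on the line             │
  │           +0.3×v     forward speed bonus     │
  │           -2.0       off the line            │
  │           -0.01×|e|  distance penalty        │
  │           -0.01      per step (time penalty) │
  └──────────────────────────────────────────────┘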

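6.3 Complete Code: Custom Gymnasium Environment + PPO Training

"""
Approach C: deep reinforcement learning line following with PPO
===============================================================
A custom Gymnasium environment simulates the line-following task.
A PPO agent is trained to follow the line using an 8-sensor array.
"""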

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env


class LineFollowEnv(gym.Env):
    """
    Simulated line-following environment.

    - Track: a sinusoidal line in the 2-D plane
    - Robot: differential drive with an 8-sensor array
    - Observation: 8 sensor readings ∈ [0, 1] (1 = directly over the line)
    - Action: [left_speed, right_speed] ∈ [-1, 1]
    """

    metadata = {"render_modes": ["human"]}

    def __init__(self, render_mode=None):
        super().__init__()
        self.render_mode = render_mode

        # Observation space: 8 sensor readings
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(8,), dtype=np.float32)
        # Action space: left and right wheel speeds
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)

        # Simulation parameters
        self.dt = 0.1
        self.max_steps = 500
        self.track_width = 0.5   # distance within which the robot counts as "on the line"
        self.sensor_spread = 0.3  # half-width of the sensor array

        self.robot_x = 0.0
        self.robot_y = 0.0
        self.robot_theta = 0.0
        self.step_count = 0

    def _track_y(self, x: float) -> float:
        """y coordinate of the line at position x."""
        return 2.0 * np.sin(0.3 * x)

    def _track_tangent(self, x: float) -> float:
        """Tangent heading of the line at position x."""
        dy_dx = 2.0 * 0.3 * np.cos(0.3 * x)
        return np.arctan2(dy_dx, 1.0)

    def _get_sensor_readings(self) -> np.ndarray:
        """
        Simulate 8 sensors spread across the width of the robot.
        Each sensor reports close to 1.0 near the line and close to 0.0 far
        from it, using a Gaussian falloff.
        """
        readings = np.zeros(8, dtype=np.float32)
        for i in range(8):
            # Sensor position in the robot frame
            offset = self.sensor_spread * (2 * i / 7 - 1)  # -0.3 to +0.3
            # Transform to the world frame
            sx = self.robot_x + offset * np.cos(self.robot_theta + np.pi / 2)
            sy = self.robot_y + offset * np.sin(self.robot_theta + np.pi / 2)
            # Distance to the line (measured along y)
            line_y = self._track_y(sx)
            dist = abs(sy - line_y)
            # Gaussian response
            readings[i] = np.exp(-(dist ** 2) / (2 * (self.track_width / 3) ** 2))
        return np.clip(readings, 0.0, 1.0)

    def _cross_track_error(self) -> float:
        """Signed offset of the robot from the line, measured along y
        (an approximation of the perpendicular distance)."""
        return self.robot_y - self._track_y(self.robot_x)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.robot_x = 0.0
        self.robot_y = self._track_y(0.0) + self.np_random.uniform(-0.3, 0.3)
        self.robot_theta = self._track_tangent(0.0) + self.np_random.uniform(-0.2, 0.2)
        self.step_count = 0
        obs = self._get_sensor_readings()
        return obs, {}

    def step(self, action):
        self.step_count += 1
        left_speed = float(action[0])
        right_speed = float(action[1])

        # Differential-drive kinematics
        v = 0.5 * (left_speed + right_speed)   # linear velocity
        omega = 1.5 * (right_speed - left_speed)  # angular velocity

        self.robot_x += v * np.cos(self.robot_theta) * self.dt
        self.robot_y += v * np.sin(self.robot_theta) * self.dt
        self.robot_theta += omega * self.dt

        # ── Reward ──
        cte = abs(self._cross_track_error())
        on_line = cte < self.track_width

        reward = 0.0
        if on_line:
            reward += 1.0                          # bonus for being on the line
            reward += 0.3 * max(v, 0)              # bonus for forward speed
        else:
            reward -= 2.0                          # penalty for leaving the line

        reward -= 0.01 * cte                       # distance penalty
        reward -= 0.01                             # time penalty

        # Termination conditions
        terminated = cte > 2.0                     # drifted too far from the line
        truncated = self.step_count >= self.max_steps

        obs = self._get_sensor_readings()
        info = {"cross_track_error": cte, "on_line": on_line}

        return obs, reward, terminated, truncated, info

    def render(self):
        pass  # a matplotlib visualisation could be added here


def train_ppo(total_timesteps: int = 100_000, save_path: str = "ppo_line_follower"):
    """Train a PPO agent in the line-following environment."""
    env = LineFollowEnv()
    check_env(env)  # validate the environment against the Gymnasium API

    model = PPO(
        policy="MlpPolicy",
        env=env,
        learning_rate=3e-4,
        n_steps=1024,
        batch_size=64,
        n_epochs=10,
        gamma=0.99,
        gae_lambda=0.95,
        clip_range=0.2,
        verbose=1,
        tensorboard_log="./tb_logs/",  # requires the tensorboard package
    )

    print(f"[INFO] Training PPO for {total_timesteps} timesteps...")
    model.learn(total_timesteps=total_timesteps)
    model.save(save_path)
    print(f"[INFO] Model saved to {save_path}")
    return model


def evaluate(model_path: str = "ppo_line_follower", episodes: int = 5):
    """Evaluate a trained model."""
    env = LineFollowEnv()
    model = PPO.load(model_path)

    for ep in range(episodes):
        obs, _ = env.reset()
        total_reward = 0.0
        done = False
        steps = 0

        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            total_reward += reward
            done = terminated or truncated
            steps += 1

        print(f"  Episode {ep+1}: reward={total_reward:.1f}, "
              f"steps={steps}, final_CTE={info['cross_track_error']:.3f}")


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1 and sys.argv[1] == "eval":
        evaluate()
    else:
        train_ppo()

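6.4 Training Tips

Tip	Description
Curriculum learning	Start with straight lines and gradually increase the curvature
Reward shaping	Avoid very heavy penalties early on; the agent may learn to stand still
Sensor noise	Add Gaussian noise to the sensor readings to improve robustness
Observation history	Stack the last 3–4 frames as input so the agent can infer its direction of motion
Hyperparameters	Use the values provided as a starting point; tune `learning_rate` and `clip_range`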
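The sensor-noise and observation-history tips can be combined in a small helper that sits between the environment and the agent. The following is a framework-independent sketch using only the standard library; the class name and default values are hypothetical. In a Stable-Baselines3 pipeline, frame stacking is more commonly done with its `VecFrameStack` wrapper, and the observation space would need to grow to match the stacked length.

```python
import random
from collections import deque
from typing import List, Optional, Sequence


class NoisyStackedObs:
    """Adds Gaussian noise to sensor readings and stacks the most recent frames."""

    def __init__(self, n_frames: int = 4, noise_std: float = 0.05,
                 seed: Optional[int] = None):
        self.n_frames = n_frames
        self.noise_std = noise_std
        self.rng = random.Random(seed)
        self.frames: deque = deque(maxlen=n_frames)

    def _add_noise(self, obs: Sequence[float]) -> List[float]:
        # Perturb each reading, then clip back into the valid [0, 1] range
        return [min(1.0, max(0.0, x + self.rng.gauss(0.0, self.noise_std)))
                for x in obs]

    def _flatten(self) -> List[float]:
        # Oldest frame first, newest frame last
        return [x for frame in self.frames for x in frame]

    def reset(self, first_obs: Sequence[float]) -> List[float]:
        """Start an episode: fill the whole history with the first noisy frame."""
        noisy = self._add_noise(first_obs)
        self.frames.clear()
        for _ in range(self.n_frames):
            self.frames.append(noisy)
        return self._flatten()

    def push(self, obs: Sequence[float]) -> List[float]:
        """Add a new frame, dropping the oldest one, and return the stacked vector."""
        self.frames.append(self._add_noise(obs))
        return self._flatten()
```

With 8 sensors and 4 frames the agent sees a 32-element vector, from which it can infer how the line is moving across the array between steps.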

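7. Step-by-Step Implementation Guide

Phase 1: Build the Track (common to all approaches)

Lay black electrical tape on a white, non-glossy or reflective floor.
Include straights, gentle curves (radius ≥ 40 cm), and intersections.
Keep the lighting even (avoid direct sunlight on the track).

Phase 2: Approach A (PID)

Mount the camera pointing forward and down (30–45° from vertical).
Calibrate the threshold with cv2.threshold on sample frames.
Run detect_line() on its own to verify centroid detection.
Connect the PID output to the motor commands (sent to the Arduino over serial).
Tune \(K_p\), \(K_d\), and \(K_i\) in that order (see section 4.5).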
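The step that connects the PID output to the motors comes down to two small functions: one that mixes the steering command into left and right wheel speeds, and one that formats the serial message. The sketch below reuses the `M,<left>,<right>` text protocol shown as an example in section 4.4; the function names are hypothetical, and the Arduino firmware is assumed to parse that format.

```python
from typing import Tuple


def mix_to_wheels(steering: float, base_speed: int = 150,
                  pwm_max: int = 255) -> Tuple[int, int]:
    """Convert a steering command into clamped (left, right) PWM values."""
    left = int(base_speed + steering)
    right = int(base_speed - steering)
    # Clamp both wheels to the valid PWM range
    left = max(0, min(pwm_max, left))
    right = max(0, min(pwm_max, right))
    return left, right


def format_motor_command(left: int, right: int) -> bytes:
    """Encode wheel speeds as one newline-terminated serial message."""
    return f"M,{left},{right}\n".encode()


# Positive steering (line to the right) speeds up the left wheel
left, right = mix_to_wheels(50.0)
message = format_motor_command(left, right)
```

With PySerial, `message` would then be passed to the `write` method of an open `serial.Serial` port.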

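Phase 3: Approach B (Stanley)

Collect waypoints along the line (drive manually and record odometry/GPS data).
Or generate waypoints from the line detected by the camera (pixel coordinates → world coordinates).
Implement stanley_control() and test it in simulation first.
Deploy to the robot using the same sensor pipeline as Approach A.
Tune the gain \(k\): start at \(k = 2.0\), increase it until oscillation appears, then back off.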
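The pixel → world conversion mentioned above is usually done with a ground-plane homography: a 3×3 matrix that maps image pixels to floor coordinates, obtained once by calibration. The sketch below only applies such a matrix; it assumes a flat floor, and the matrix `H` is a hypothetical calibration result, not something this project provides.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def pixels_to_ground(points_px: Sequence[Tuple[float, float]],
                     H: Matrix3) -> List[Tuple[float, float]]:
    """Map image points (u, v) to ground-plane points (x, y) using homography H."""
    waypoints = []
    for u, v in points_px:
        # Homogeneous transform: [x', y', w] = H · [u, v, 1]
        xh = H[0][0] * u + H[0][1] * v + H[0][2]
        yh = H[1][0] * u + H[1][1] * v + H[1][2]
        w = H[2][0] * u + H[2][1] * v + H[2][2]
        if abs(w) < 1e-12:
            continue  # point maps to infinity (on the horizon); skip it
        waypoints.append((xh / w, yh / w))
    return waypoints
```

The input points could be the line centroids of several horizontal image strips, which gives a short list of waypoints ahead of the robot in each frame.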

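Phase 4: Approach C (Deep RL)

Run the custom environment in simulation: python line_follow_rl.py
Train the PPO agent (~100k timesteps, roughly 5 minutes on a CPU, depending on the machine).
Evaluate: python line_follow_rl.py eval
(Optional) Deploy to the real robot: replace the simulated sensors with real IR/camera readings and fine-tune with a small amount of real-world data.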

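8. Comparison of the Three Approaches

Criterion	PID + thresholding	Stanley controller	Deep RL (PPO)
Accuracy on straights	★★★★☆	★★★★★	★★★★☆
Accuracy on curves	★★★☆☆	★★★★★	★★★★☆
Robustness to noise	★★★☆☆	★★★★☆	★★★★★
Robustness to lighting changes	★★☆☆☆	★★★☆☆	★★★★☆
Setup complexity	Low	Medium	High
Tuning difficulty	Medium (3 gains)	Low (1 gain)	High (many hyperparameters)
Compute cost	Very low	Low	High (training); low to moderate at inference
Needs training data	No	No	Yes (simulated or real)
Handling sharp turns	Poor	Good	Good (after training)
Best suited to	Simple tracks	Smooth racing lines	Complex or changing tracks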

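9. Extensions and Variants

9.1 Track Variants

Coloured lines: use HSV colour filtering to follow red, green, or blue lines.
Dashed lines: handle gaps in the line (remember the last known position).
Multiple lines / intersections: decide which way to turn at a fork.
QR codes / markers: place markers along the track for localisation.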
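For the dashed-line variant, "remember the last known position" can be as simple as holding the most recent error for a limited number of frames before declaring the line lost. The helper below is a minimal sketch; the class name and the default frame limit are hypothetical, and it expects `None` whenever the detector finds no line.

```python
from typing import Optional


class LastKnownLine:
    """Bridges short gaps in the line by reusing the last valid error."""

    def __init__(self, max_missing_frames: int = 15):
        self.max_missing_frames = max_missing_frames  # assumed gap tolerance
        self.last_error: Optional[float] = None
        self.missing = 0

    def update(self, error: Optional[float]) -> Optional[float]:
        """Return the error to steer on, or None if the line is truly lost."""
        if error is not None:
            # Fresh detection: store it and clear the gap counter
            self.last_error = error
            self.missing = 0
            return error

        self.missing += 1
        if self.last_error is not None and self.missing <= self.max_missing_frames:
            return self.last_error  # inside a gap: keep the previous heading
        return None  # gap too long: fall back to the search behaviour
```

The frame limit should cover the longest gap at the robot's speed: for example, a 5 cm gap at 0.5 m/s and 30 frames per second spans about 3 frames.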

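9.2 Sensor Improvements

LiDAR-based: use a 2D LiDAR to detect differences in floor reflectivity.
Stereo vision: estimate the 3D distance to the line.
Event camera: high-speed line detection with very low latency.

9.3 Algorithm Enhancements

Model Predictive Control (MPC): optimise over a prediction horizon for anticipatory steering.
Fuzzy logic control: use linguistic rules instead of PID gains.
Behaviour cloning: learn from human driving demonstrations.
Sim-to-real transfer: train the RL policy in a high-fidelity simulator (Isaac Sim, Gazebo), then deploy to hardware.

9.4 Competition Challenges

Speed challenge: minimise lap time (requires predictive control).
Endurance challenge: run continuously for 1 hour without human intervention.
Obstacle challenge: place obstacles on the track and combine line following with obstacle avoidance.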

10. References

Craig, J.J. (2005). Introduction to Robotics: Mechanics and Control. 3rd ed., Pearson.
Thrun, S., Montemerlo, M., et al. (2006). "Stanley: The Robot that Won the DARPA Grand Challenge." Journal of Field Robotics, 23(9), 661–692.
Hoffmann, G.M., Tomlin, C.J., Montemerlo, M., & Thrun, S. (2007). "Autonomous Automobile Trajectory Tracking for Off-Road Driving: Controller Design, Experimental Validation and Racing." Proceedings of the American Control Conference.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). "Proximal Policy Optimization Algorithms." arXiv:1707.06347.
Raffin, A., Hill, A., Gleave, A., et al. (2021). "Stable-Baselines3: Reliable Reinforcement Learning Implementations." JMLR, 22(268), 1–8.
OpenCV documentation: docs.opencv.org
Gymnasium documentation: gymnasium.farama.org