SLAM: Simultaneous Localization and Mapping

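SLAM is the problem of building a map of an unknown environment **and** localizing the robot within that map at the same time. It is one of the most fundamental problems in robotics: nearly every autonomous robot needs to answer "Where am I?" and "What does the environment around me look like?"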

For ROS implementation details, see the ROS SLAM tutorial.

The SLAM Problem

Formal Definition

Given:
  - Robot observations z_{1:t} (camera images, LiDAR scans, IMU readings)
  - Robot controls u_{1:t} (odometry, wheel encoders)

Estimate:
  - Robot trajectory x_{1:t} (where has the robot been?)
  - Map m (what does the environment look like?)

Jointly:
  p(x_{1:t}, m | z_{1:t}, u_{1:t})
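To make "jointly" concrete, here is a toy linear example built on assumptions of our own (it is not taken from any SLAM library): a robot moves along a line with poses x_0..x_T, there is a single landmark at position m, odometry reports u_k ≈ x_k − x_{k−1}, and a range sensor reports z_k ≈ m − x_k. Under Gaussian noise, the most likely trajectory and map under p(x, m | z, u) is the solution of one linear least-squares problem over all unknowns at once. The function name `solve_1d_slam` is hypothetical.

```python
import numpy as np

def solve_1d_slam(odometry, ranges, prior_x0=0.0):
    """Jointly estimate 1D poses and one landmark by linear least squares.

    Hypothetical toy model (illustration only):
      odometry[k-1] ~ x_k - x_{k-1}   for k = 1..T
      ranges[k]     ~ m - x_k         for k = 0..T
      a prior x_0 = prior_x0 fixes the otherwise free global offset.
    Unknown vector: [x_0, ..., x_T, m]
    """
    T = len(odometry)
    if len(ranges) != T + 1:
        raise ValueError("need one range measurement per pose")
    n = T + 2  # T+1 poses plus one landmark
    rows, rhs = [], []

    # Prior on the first pose anchors the whole solution.
    row = np.zeros(n)
    row[0] = 1.0
    rows.append(row)
    rhs.append(prior_x0)

    # Motion constraints between consecutive poses.
    for k, u in enumerate(odometry, start=1):
        row = np.zeros(n)
        row[k], row[k - 1] = 1.0, -1.0
        rows.append(row)
        rhs.append(u)

    # Observation constraints linking each pose to the landmark.
    for k, z in enumerate(ranges):
        row = np.zeros(n)
        row[-1], row[k] = 1.0, -1.0
        rows.append(row)
        rhs.append(z)

    A, b = np.vstack(rows), np.array(rhs)
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    return solution[:-1], solution[-1]  # trajectory, landmark position
```

With noise-free inputs such as `solve_1d_slam([1.0, 1.0, 1.0], [5.0, 4.0, 3.0, 2.0])` the poses come out as 0, 1, 2, 3 and the landmark at 5; with noisy inputs the range constraints pull drifting odometry back into agreement, which is the intuition behind estimating trajectory and map together.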

Why It Is Hard

SLAM challenges:
├── Chicken-and-egg — Need location to build map, need map to localize
├── Data association — Is this the same place I visited before? (loop closure)
├── Uncertainty — Sensors are noisy, odometry drifts
├── Scalability — Maps grow with exploration time
├── Dynamic objects — People, cars, doors change the environment
└── Multi-modal — Different sensors have different strengths

SLAM Demo with Loop Closure

Animation: SLAM mapping with loop closure

SLAM Variants

1. Visual SLAM (vSLAM)

Uses cameras as the primary sensor. It is the most popular approach because cameras are inexpensive and capture rich information.

Method	Year	Type	Key Feature	Reference
ORB-SLAM3	2021	Feature-based	Monocular, stereo, RGB-D, IMU	Campos et al.
LSD-SLAM	2014	Direct (semi-dense)	Semi-dense maps from a monocular camera	Engel et al.
DSO	2017	Direct (sparse)	Photometric bundle adjustment	Engel et al.
VINS-Mono	2018	Feature-based + IMU	Tightly coupled visual-inertial odometry	Qin et al.
OpenVSLAM	2019	Feature-based	Modular architecture	Sumikura et al.
DROID-SLAM	2021	Deep learning	Learned SLAM, high accuracy	Teed et al.
SplaTAM	2024	Gaussian Splatting	3DGS-based SLAM	Keetha et al.

Visual SLAM Pipeline

Camera Image
    │
    ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Feature    │────▶│   Feature    │────▶│   Motion     │
│  Extraction  │     │   Matching   │     │  Estimation  │
│ (ORB, SIFT,  │     │ (BFMatcher,  │     │ (PnP, ICP,   │
│  SuperPoint) │     │  LightGlue)  │     │  BA)         │
└──────────────┘     └──────────────┘     └──────────────┘
                                                 │
                                                 ▼
                                          ┌──────────────┐
                                          │   Map        │
                                          │   Update     │
                                          │  (keyframes, │
                                          │   landmarks) │
                                          └──────────────┘
                                                 │
                                                 ▼
                                          ┌──────────────┐
                                          │   Loop       │
                                          │   Closure    │
                                          │  (detect     │
                                          │   revisits)  │
                                          └──────────────┘
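The feature-matching stage can be illustrated without OpenCV. The sketch below is a NumPy stand-in for what a brute-force matcher with cross-checking does on binary descriptors such as ORB's: compute the Hamming distance between every pair and keep only mutual nearest neighbours. `hamming_match` is a hypothetical helper for illustration, not the OpenCV `BFMatcher` API.

```python
import numpy as np

def hamming_match(desc_a, desc_b, cross_check=True):
    """Brute-force matching of binary descriptors (e.g. 256-bit ORB = 32 bytes).

    desc_a: (Na, B) uint8 array, desc_b: (Nb, B) uint8 array.
    Returns a list of (index_in_a, index_in_b, hamming_distance) tuples.
    """
    # XOR every pair of descriptors, then count the differing bits.
    xor = np.bitwise_xor(desc_a[:, None, :], desc_b[None, :, :])
    dist = np.unpackbits(xor, axis=2).sum(axis=2)  # (Na, Nb) Hamming distances

    best_b = dist.argmin(axis=1)  # nearest neighbour in B for each descriptor in A
    best_a = dist.argmin(axis=0)  # nearest neighbour in A for each descriptor in B
    matches = []
    for i, j in enumerate(best_b):
        # Cross-check: keep the pair only if each is the other's best match.
        if cross_check and best_a[j] != i:
            continue
        matches.append((i, int(j), int(dist[i, j])))
    return matches
```

The pairwise distance tensor costs O(Na·Nb·B) memory, which is fine for a sketch; real front ends restrict candidates (e.g. by image region or predicted motion) before matching.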

Feature-Based vs. Direct Methods

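Aspect	Feature-based	Direct
How it works	Extracts keypoints and matches them across frames	Uses pixel intensities directly
Examples	ORB-SLAM3, VINS-Mono	LSD-SLAM, DSO
Robustness	High (descriptors are largely invariant to illumination changes)	Lower (sensitive to exposure changes)
Map density	Sparse (point cloud)	Dense / semi-dense (DSO is sparse)
Accuracy	Good	Often better in texture-rich scenes
Speed	Fast	Slower (per-pixel optimization)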
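The core idea of direct methods, minimizing a photometric residual over the motion parameters instead of matching keypoints, can be shown in one dimension. This is a simplified toy under our own assumptions (a single sub-pixel shift, hypothetical function `align_photometric_1d`), not how LSD-SLAM or DSO are implemented; those systems optimize SE(3) poses and per-pixel depths over image pyramids.

```python
import numpy as np

def align_photometric_1d(ref, cur, shift0=0.0, iters=50):
    """Estimate a shift s such that cur(x + s) ~= ref(x), using intensities only.

    Residual r(s) = cur(x + s) - ref(x); Jacobian dr/ds = cur'(x + s).
    Minimized with Gauss-Newton; no features or descriptors are involved.
    """
    x = np.arange(len(ref), dtype=float)
    grad_cur = np.gradient(cur.astype(float))      # image gradient dI/dx
    s = shift0
    for _ in range(iters):
        warped = np.interp(x + s, x, cur)          # cur sampled at x + s
        J = np.interp(x + s, x, grad_cur)          # gradient at the warped positions
        r = warped - ref                            # photometric residual
        valid = (x + s >= 0) & (x + s <= len(ref) - 1)  # drop out-of-image samples
        J, r = J[valid], r[valid]
        step = -(J @ r) / (J @ J)                  # Gauss-Newton update
        s += step
        if abs(step) < 1e-8:
            break
    return s
```

For example, if `cur` is `ref` shifted right by 3 pixels, the estimate converges to a shift of about 3. Like real direct methods, this only works when the initial guess is within the basin of the image gradients, which is why practical systems use coarse-to-fine pyramids.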

2. LiDAR SLAM

Uses laser range finders to build accurate 3D maps. Generally more accurate than visual SLAM, but the sensors are more expensive.

Method	Year	Type	Key Feature
LOAM	2014	Feature-based	LiDAR odometry + mapping
LeGO-LOAM	2018	Feature-based	Lightweight, ground-optimized
LIO-SAM	2020	Tightly coupled	LiDAR + IMU factor graph
FAST-LIO2	2021	Iterated EKF	Fast, lightweight
CT-ICP	2021	Point-to-point	Continuous-time ICP
KISS-ICP	2023	Simple ICP	"Keep It Small and Simple"
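Several of the LiDAR methods above build on ICP (Iterative Closest Point). Below is a minimal point-to-point ICP sketch: alternate nearest-neighbour data association with a closed-form rigid fit (Kabsch/SVD). The function names are hypothetical; real systems replace the brute-force search with k-d trees or voxel hashing and add robust kernels, downsampling, and motion compensation.

```python
import numpy as np

def best_fit_transform(src, dst):
    """Least-squares rigid transform (R, t) with R @ src_i + t ~= dst_i (Kabsch/SVD)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)               # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.eye(src.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t

def icp_point_to_point(src, dst, iters=50, tol=1e-10):
    """Align point cloud src (N, d) to dst (M, d); returns the accumulated (R, t)."""
    d = src.shape[1]
    R_total, t_total = np.eye(d), np.zeros(d)
    cur = src.astype(float).copy()
    prev_err = np.inf
    for _ in range(iters):
        # Data association: brute-force nearest neighbour in dst for every point.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        err = np.sqrt(d2[np.arange(len(cur)), nn]).mean()
        # Closed-form update from the current correspondences.
        R, t = best_fit_transform(cur, dst[nn])
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t  # compose with previous estimate
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total
```

ICP only converges to the right answer when the initial misalignment is small relative to the point spacing; in LiDAR odometry that initial guess usually comes from a constant-velocity model or the IMU.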

3. RGB-D SLAM

Uses depth cameras (e.g., RealSense, Kinect) for dense 3D reconstruction.
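Dense RGB-D pipelines start by back-projecting each depth image into 3D using the camera intrinsics. A minimal sketch, assuming an undistorted pinhole camera with focal lengths `fx`, `fy` and principal point `cx`, `cy`; `depth_to_points` is a hypothetical helper, and `depth_scale` converts raw values to meters (for example 1000 for millimeter depth maps, or 5000 for the PNG depth images of TUM RGB-D).

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy, depth_scale=1.0):
    """Back-project a depth image into camera-frame 3D points (pinhole model).

    depth: (H, W) array of raw depth values; zero means "no measurement".
    Returns an (N, 3) array of [x, y, z] in meters for valid pixels,
    in row-major pixel order.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u = column, v = row index
    z = depth.astype(float) / depth_scale
    valid = z > 0
    x = (u[valid] - cx) * z[valid] / fx
    y = (v[valid] - cy) * z[valid] / fy
    return np.stack([x, y, z[valid]], axis=1)
```

Systems such as ElasticFusion or BundleFusion then fuse these per-frame points into a global model (surfels or a volumetric grid) once the camera pose is known.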

Method	Year	Key Feature
RTAB-Map	2014	Multi-session, graph-based
ElasticFusion	2015	Real-time dense SLAM
BundleFusion	2017	Global bundle adjustment
NICE-SLAM	2021	Neural implicit SLAM
SplaTAM	2024	3D Gaussian Splatting SLAM

4. Learning-Based SLAM

A recent trend: replacing hand-crafted components with learned ones.

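Approach	Examples	Year	What Is Learned
Learned features	SuperPoint + SuperGlue	2018, 2020	Keypoint detection + matching
Learned SLAM	DROID-SLAM	2021	End-to-end visual odometry
Neural implicit	iMAP, NICE-SLAM	2021	Neural radiance field as the map
Gaussian splatting	SplaTAM, MonoGS	2024	3DGS as the map representation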

SLAM Datasets

Indoor Datasets

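Dataset	Year	Sensors	Environment	Key Feature
TUM RGB-D	2012	Kinect	Office rooms	39 sequences, ground truth
ICL-NUIM	2014	Synthetic	Living room / office	Perfect ground truth
EuRoC MAV	2016	Stereo + IMU	Machine hall, rooms	Micro aerial vehicle
TartanAir	2020	Stereo	Various (simulated)	Diverse, photo-realistic environments
Replica	2019	Synthetic	Indoor rooms	High-fidelity 3D reconstructions
ScanNet	2017	RGB-D	Indoor (1,513 scans)	Semantic labels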

Outdoor Datasets

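Dataset	Year	Sensors	Environment	Key Feature
KITTI	2012	Stereo + LiDAR	Urban driving	Standard benchmark
nuScenes	2019	LiDAR + camera	Urban (Boston, Singapore)	1000 scenes, 3D annotations
Waymo Open	2019	LiDAR + camera	Urban / suburban	1150 scenes
MulRan	2020	LiDAR + radar	Urban, multi-session	Long-term relocalization
Oxford RobotCar	2016	Multi-sensor	Oxford city streets	1000+ km, varied weather
Hilti SLAM Challenge	2022	Multi-sensor	Construction sites	Multi-floor SLAM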

Dataset Details

TUM RGB-D (Standard Indoor Benchmark)

TUM RGB-D Dataset:
├── 39 sequences across 5 scenarios
│   ├── fr1_xyz        — Slow, structured motion
│   ├── fr1_desk       — Desktop objects
│   ├── fr2_xyz        — Larger workspace
│   ├── fr3_office     — Full office
│   └── fr1_room       — Complete room traversal
├── Sensor: Microsoft Kinect v1
├── Resolution: 640×480 @ 30Hz
├── Ground truth: Motion capture system
└── Evaluation: ATE (Absolute Trajectory Error)
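TUM RGB-D trajectories (ground truth and estimates) are commonly stored as text lines of the form `timestamp tx ty tz qx qy qz qw`, with `#` comment lines. A minimal loader sketch that turns them into 4x4 SE(3) matrices; `parse_tum_trajectory` is a hypothetical helper, and it relies on SciPy's `Rotation.from_quat`, whose default quaternion order is scalar-last (x, y, z, w), matching this file layout.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def parse_tum_trajectory(lines):
    """Parse TUM-format lines 'timestamp tx ty tz qx qy qz qw'.

    Returns (timestamps, poses) where poses is a list of 4x4 SE(3) matrices.
    Blank lines and lines starting with '#' are skipped.
    """
    stamps, poses = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        vals = [float(v) for v in line.split()]
        stamp, translation, quat = vals[0], vals[1:4], vals[4:8]
        T = np.eye(4)
        T[:3, :3] = Rotation.from_quat(quat).as_matrix()  # quat in (x, y, z, w) order
        T[:3, 3] = translation
        stamps.append(stamp)
        poses.append(T)
    return np.array(stamps), poses
```

Ground-truth and estimated poses are sampled at different times, so they must be associated by nearest timestamp before ATE or RPE is computed; the benchmark's own evaluation scripts handle that step.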

KITTI (Standard Outdoor Benchmark)

KITTI Dataset:
├── Stereo + Velodyne LiDAR + GPS/IMU
├── Urban, suburban, highway scenarios
├── Sequences: 22 total (00-10 with ground truth for training, 11-21 for testing)
├── Ground truth: GPS/RTK (cm-level)
├── Evaluation:
│   ├── t_err — Translational error (%)
│   └── r_err — Rotational error (deg/100m)
└── Odometry benchmark leaderboard
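In the KITTI odometry format, each line of a pose file holds 12 numbers: a row-major 3x4 matrix [R | t] giving the camera pose relative to the first frame. As we understand the official development kit, t_err and r_err are averaged over all sub-sequences of 100 m, 200 m, ..., 800 m, so the evaluation first needs the cumulative distance traveled. A minimal sketch of both steps, with hypothetical helper names:

```python
import numpy as np

def parse_kitti_poses(lines):
    """Parse KITTI odometry poses: 12 numbers per line, a row-major 3x4 [R | t]."""
    poses = []
    for line in lines:
        vals = np.array(line.split(), dtype=float)
        if vals.size == 0:
            continue  # skip blank lines
        T = np.eye(4)
        T[:3, :4] = vals.reshape(3, 4)
        poses.append(T)
    return poses

def trajectory_distances(poses):
    """Cumulative path length (meters) at each pose, starting at 0."""
    positions = np.array([T[:3, 3] for T in poses])
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    return np.concatenate([[0.0], np.cumsum(steps)])
```

Given these distances, each evaluation segment starts at some frame i and ends at the first frame whose cumulative distance exceeds distance[i] plus the segment length.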

Evaluation Metrics

Absolute Trajectory Error (ATE)

Measures the **global consistency** of the estimated trajectory.

\[ \text{ATE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \| \hat{t}_i - t_i \|^2} \]

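where \(\hat{t}_i\) is the estimated position and \(t_i\) the ground-truth position at step \(i\). The estimated trajectory is first aligned to the ground truth (a rigid-body transform, or a similarity transform for monocular systems whose scale is unobservable); without this step, ATE would largely reflect the arbitrary choice of starting frame.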
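Because ATE depends on how the two trajectories are registered, evaluation tools usually align them first. The sketch below uses Umeyama's closed-form least-squares alignment on time-associated positions, then computes the RMSE. The function names are hypothetical, and the choice of `with_scale=True` for monocular runs is a common convention rather than a rule.

```python
import numpy as np

def umeyama_alignment(src, dst, with_scale=False):
    """Find s, R, t minimizing sum ||dst_i - (s R src_i + t)||^2 (Umeyama's method).

    src, dst: (N, 3) arrays of corresponding positions (estimate, ground truth).
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                  # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                          # avoid a reflection
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / len(src)          # variance of the source points
    s = np.trace(np.diag(S) @ D) / var_s if with_scale else 1.0
    t = mu_d - s * R @ mu_s
    return s, R, t

def ate_rmse(est_xyz, gt_xyz, with_scale=False):
    """ATE as the RMSE of position differences after alignment."""
    s, R, t = umeyama_alignment(est_xyz, gt_xyz, with_scale)
    aligned = s * est_xyz @ R.T + t
    return np.sqrt(np.mean(np.sum((aligned - gt_xyz) ** 2, axis=1)))
```

For stereo, RGB-D, or LiDAR systems, where metric scale is observable, `with_scale=False` (an SE(3) alignment) is the stricter choice; a Sim(3) alignment would hide scale errors.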

Relative Pose Error (RPE)

Measures **local accuracy** (drift) over a fixed interval of \(\Delta\) frames.

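\[ E_i = \left(T_i^{-1} T_{i+\Delta}\right)^{-1} \left(\hat{T}_i^{-1} \hat{T}_{i+\Delta}\right), \qquad \text{RPE}_{\text{trans}} = \sqrt{\frac{1}{N-\Delta} \sum_{i=1}^{N-\Delta} \| \operatorname{trans}(E_i) \|^2} \]

where \(T_i\) and \(\hat{T}_i\) are the ground-truth and estimated poses as 4×4 SE(3) matrices, and \(\operatorname{trans}(E_i)\) is the translational part of the error transform \(E_i\). The rotational RPE is computed analogously from the rotation angle of \(E_i\).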

Comparison

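Metric	Measures	Sensitive To	Use Case
ATE	Global consistency	Scale, rotation, translation	Loop closure quality
RPE	Local accuracy	Drift over short intervals	Odometry quality
Map quality	3D reconstruction	Completeness, accuracy	Mapping applications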

Evaluation Code

```python
import numpy as np
from scipy.spatial.transform import Rotation

def compute_ate(estimated_poses, ground_truth_poses):
    """
    Compute Absolute Trajectory Error (ATE).

    Assumes the two trajectories are already time-associated and expressed
    in the same frame (i.e. aligned); no alignment is performed here.

    Args:
        estimated_poses: List of 4x4 SE(3) matrices
        ground_truth_poses: List of 4x4 SE(3) matrices

    Returns:
        ate: Root-mean-square position error (meters)
    """
    errors = []
    for T_est, T_gt in zip(estimated_poses, ground_truth_poses):
        # Euclidean distance between the translation components
        t_est = T_est[:3, 3]
        t_gt = T_gt[:3, 3]
        errors.append(np.linalg.norm(t_est - t_gt))

    return np.sqrt(np.mean(np.square(errors)))

def compute_rpe(estimated_poses, ground_truth_poses, delta=1):
    """
    Compute Relative Pose Error (RPE).

    Args:
        estimated_poses: List of 4x4 SE(3) matrices
        ground_truth_poses: List of 4x4 SE(3) matrices
        delta: Frame interval for comparison

    Returns:
        trans_err: RMSE of the translational error (meters)
        rot_err: Mean rotational error (degrees)
    """
    trans_errors = []
    rot_errors = []

    for i in range(len(estimated_poses) - delta):
        # Relative motion over delta frames
        T_est_rel = np.linalg.inv(estimated_poses[i]) @ estimated_poses[i + delta]
        T_gt_rel = np.linalg.inv(ground_truth_poses[i]) @ ground_truth_poses[i + delta]

        # Error transform E_i = (T_gt_rel)^-1 @ T_est_rel
        T_err = np.linalg.inv(T_gt_rel) @ T_est_rel

        # Translation error
        trans_errors.append(np.linalg.norm(T_err[:3, 3]))

        # Rotation error: angle of the residual rotation, i.e. the norm of its
        # rotation vector (not the largest rotation-vector component)
        r = Rotation.from_matrix(T_err[:3, :3])
        rot_errors.append(np.degrees(np.linalg.norm(r.as_rotvec())))

    trans_rmse = np.sqrt(np.mean(np.square(trans_errors)))
    return trans_rmse, np.mean(rot_errors)
```

Modern Trends (2023–2025)

Neural Implicit SLAM

Replaces the traditional map with a **Neural Radiance Field (NeRF)** or **3D Gaussian Splatting (3DGS)**.

Traditional SLAM:     Map = sparse points + keyframes
Neural SLAM:          Map = neural network (implicit function)
Gaussian Splatting:   Map = 3D Gaussian primitives

Advantages:
- Dense, photorealistic maps
- Novel view synthesis
- Compact representation

Challenges:
- Computationally expensive
- Real-time performance is hard
- Loop closure in neural maps
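NeRF-style volume rendering and 3D Gaussian Splatting both produce a pixel color by front-to-back alpha compositing, C = Σ c_i α_i ∏_{j<i} (1 − α_j), over samples or Gaussians sorted by depth; tracking then minimizes the difference between rendered and observed images. The sketch below shows only that compositing step plus the opacity of one projected 2D Gaussian at a pixel. The helper names are hypothetical, and the sketch omits projection, sorting, tiling, and gradients.

```python
import numpy as np

def composite_along_ray(colors, alphas):
    """Front-to-back alpha compositing along one ray.

    colors: (K, 3) RGB of the samples or Gaussians, sorted near to far.
    alphas: (K,) opacities in [0, 1].
    Returns (rgb, accumulated_opacity).
    """
    alphas = np.asarray(alphas, dtype=float)
    # Transmittance reaching sample i: T_i = prod_{j<i} (1 - alpha_j)
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = trans * alphas
    rgb = (weights[:, None] * np.asarray(colors, dtype=float)).sum(axis=0)
    return rgb, weights.sum()

def gaussian_alpha(opacity, pixel, mean2d, cov2d):
    """Opacity contribution of one projected 2D Gaussian at a pixel."""
    d = np.asarray(pixel, dtype=float) - np.asarray(mean2d, dtype=float)
    cov = np.asarray(cov2d, dtype=float)
    return opacity * np.exp(-0.5 * d @ np.linalg.solve(cov, d))
```

In NeRF the per-sample opacity is instead α_i = 1 − exp(−σ_i δ_i) from a density σ_i and step length δ_i; the compositing sum is the same, which is one reason both representations fit the same render-and-compare SLAM loop.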

Foundation Models for SLAM

Approach	Examples	Role
Learned features	SuperPoint, LightGlue	Better matching under challenging conditions
Semantic SLAM	ConceptFusion	Builds semantic maps
Language-grounded SLAM	NLMap	"Find the objects near the red sofa"
Depth estimation	DPT, Depth Anything	Monocular depth for SLAM

References

Cadena et al. (2016). "Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age." IEEE T-RO
Campos et al. (2021). "ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM." IEEE T-RO
Qin et al. (2018). "VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator." IEEE T-RO
Shan et al. (2020). "LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping." IROS 2020
Teed & Deng (2021). "DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras." NeurIPS 2021
Keetha et al. (2024). "SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM." CVPR 2024
Sturm et al. (2012). "A Benchmark for the Evaluation of RGB-D SLAM Systems." IROS 2012
Geiger et al. (2012). "Are we ready for autonomous driving? The KITTI vision benchmark suite." CVPR 2012