深度估计¶

深度感知（Depth Perception）——即测量相机到场景中每个点的距离的能力——是机器人最基础的感知能力之一。没有深度信息，机器人看到的世界只是一张平面照片，无法推理环境的三维结构。有了深度信息，机器人可以绕过障碍物导航、规划操作轨迹、构建环境地图。

本教程涵盖四大类深度感知技术：立体视觉（Stereo Vision）、结构光（Structured Light）/ RGB-D 相机、飞行时间（Time-of-Flight, ToF） 和 单目深度估计（Monocular Depth Estimation）。我们提供可运行的 Python 代码、数学推导和对比分析，帮助你为机器人选择合适的传感器。

学习目标¶

完成本教程后，你将能够：

理解不同深度感知技术的工作原理
比较立体视觉、结构光、ToF 和单目方法在距离范围、精度、成本和工作条件方面的差异
使用 OpenCV 实现立体深度估计（块匹配和半全局匹配）
利用相机内参从深度图像生成点云（Point Cloud）
使用现代单目深度估计模型（MiDaS、Depth Anything）
将深度相机集成到 ROS 2 中

1. 为什么深度在机器人中很重要¶

标准 RGB 相机捕捉的是三维世界的 二维投影 —— 所有深度信息都丢失了。然而几乎每个机器人任务都需要某种形式的深度推理：

任务	为什么需要深度
导航（Navigation）	检测并避开不同距离的障碍物
操作（Manipulation）	计算物体的三维位置以进行抓取规划
建图（Mapping）	构建三维占用地图或点云地图
检测（Inspection）	测量尺寸、检测表面缺陷
人机交互（Human-Robot Interaction）	在三维空间中跟踪人体姿态以确保安全协作

2D 与 3D 感知¶

   2D 图像 (RGB)                  3D 感知 (RGB-D)
  +------------------+            +------------------+
  |  .  .  .  .  .  |            |  .  .  .  .  .  |
  |  .  BOX  .  .   |            |  . BOX(1.2m) .   |
  |  .  .  .  .  .  |     vs     |  .  .  .  .  .   |
  |  ROBOT  .  .    |            |  ROBOT  .  .     |
  |  .  .  .  .  .  |            |  .  .  .  .  .   |
  +------------------+            +------------------+
  "有一个箱子。"                   "箱子在 (0.3, 0, 1.2) 米处"

深度将感知转变为测量。

2. 立体视觉¶

立体视觉（Stereo Vision）使用 两个相机，通过已知的基线距离 $B$ 和三角测量（Triangulation）来估计深度。这种方法模仿了人类的双目视觉。

2.1 极几何¶

给定从不同视角拍摄的同一场景的两张图像，它们之间的几何关系由两个矩阵来描述。

基本矩阵（Fundamental Matrix） $F$：在像素坐标中关联对应点 $\mathbf{x}$ 和 $\mathbf{x'}$：

\[\mathbf{x'}^T F \mathbf{x} = 0\]

其中 $F$ 是一个秩为 2 的 $3 \times 3$ 矩阵，编码了极几何关系。

本质矩阵（Essential Matrix） $E$：在标定后的（归一化）坐标中的相同关系：

\[\mathbf{x'}_n^T E \mathbf{x}_n = 0\]

本质矩阵编码了两个相机之间的旋转 $R$ 和平移 $t$：

\[E = [t]_\times R\]

其中 $[t]_\times$ 是平移向量的反对称矩阵（Skew-symmetric Matrix）：

\[[t]_\times = \begin{pmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{pmatrix}\]

$F$ 和 $E$ 通过相机内参 $K$ 相关联：

\[E = K'^T F K\]

极线约束（Epipolar Constraint）：对于左图像中的任意点，其在右图像中的对应点必须位于一条特定的直线上——极线（Epipolar Line）。这将对应点的搜索从二维降低到了一维。

  左图像               右图像
  +--------+          +--------+
  |   x    |          |        |
  |        |  ----->  |极线    |
  |        |          |  x'    |
  +--------+          +--------+
  点 x 对应             x' 必须在
  某个点               极线上

2.2 立体校正¶

在计算视差之前，我们需要对两幅图像进行 校正（Rectification），使极线变成水平扫描线。这将对应点搜索简化为一维水平搜索。

给定两个相机的内参和相对位姿，cv2.stereoRectify 计算旋转矩阵 $R_1, R_2$ 和新的投影矩阵 $P_1, P_2$，将两幅图像对齐。

  校正前                      校正后
  +--------+  +--------+     +--------+  +--------+
  | /      |  |  \     |     | -------|  |--------|
  |   x    |  |     x' | --> |   x    |  |   x'   |
  |  /     |  |   \    |     | -------|  |--------|
  +--------+  +--------+     +--------+  +--------+
  极线在两幅图像中            极线变成
  都是倾斜的                 水平扫描线

2.3 块匹配与半全局匹配¶

块匹配（Block Matching, BM） 通过比较每个像素周围的小窗口（块）来寻找对应点。速度快但结果可能有噪声。

半全局匹配（Semi-Global Matching, SGM） 在 BM 的基础上沿多个扫描线方向施加平滑约束。计算量更大，但生成的视差图质量高得多。

import cv2
import numpy as np

# 加载校正后的立体图像对（灰度图）
img_left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
img_right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# --- 块匹配 (Block Matching) ---
stereo_bm = cv2.StereoBM_create(
    numDisparities=128,   # 必须能被 16 整除
    blockSize=15          # 奇数，通常 5-21
)
disparity_bm = stereo_bm.compute(img_left, img_right)

# --- 半全局匹配 (Semi-Global Matching) ---
stereo_sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,       # 必须能被 16 整除
    blockSize=5,
    P1=8 * 3 * 5 ** 2,       # 视差变化 1 的惩罚
    P2=32 * 3 * 5 ** 2,      # 视差变化 > 1 的惩罚
    disp12MaxDiff=1,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=32,
    preFilterCap=63,
    mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
)
disparity_sgbm = stereo_sgbm.compute(img_left, img_right)

# 归一化用于可视化（16位有符号 -> 8位无符号）
disp_vis = cv2.normalize(
    disparity_sgbm, None, alpha=0, beta=255,
    norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8U
)
cv2.imwrite("disparity.png", disp_vis)
print("视差图已保存。形状:", disparity_sgbm.shape)

2.4 从视差计算深度¶

得到视差图后，使用以下公式即可直接计算深度：

\[Z = \frac{f \cdot B}{d}\]

其中：

$Z$ 是深度（距离），单位为米
$f$ 是焦距，单位为像素
$B$ 是基线距离（两个相机之间的距离），单位为米
$d$ 是视差（水平像素偏移），单位为像素

import cv2
import numpy as np

def compute_depth_map(left_path, right_path, focal_length, baseline,
                      num_disparities=128, block_size=5):
    """
    从校正后的立体图像对计算深度图。

    参数
    ----
    left_path : str       左图像路径（校正后）
    right_path : str      右图像路径（校正后）
    focal_length : float  焦距，单位为像素
    baseline : float      立体基线距离，单位为米

    返回
    ----
    depth_map : ndarray   深度图，单位为米 (float32)
    """
    # 加载图像
    img_left = cv2.imread(left_path, cv2.IMREAD_GRAYSCALE)
    img_right = cv2.imread(right_path, cv2.IMREAD_GRAYSCALE)

    # 使用 SGM 计算视差
    stereo = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=num_disparities,
        blockSize=block_size,
        P1=8 * 3 * block_size ** 2,
        P2=32 * 3 * block_size ** 2,
        disp12MaxDiff=1,
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=32,
        mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
    )
    disparity = stereo.compute(img_left, img_right).astype(np.float32) / 16.0

    # 避免除零
    disparity[disparity <= 0] = 0.1

    # Z = f * B / d
    depth_map = (focal_length * baseline) / disparity

    # 限制最大深度为 10 米
    depth_map[depth_map > 10.0] = 10.0

    return depth_map


# 使用示例
if __name__ == "__main__":
    focal_length = 718.856   # 示例：KITTI 数据集
    baseline = 0.54          # 米

    depth = compute_depth_map("left.png", "right.png", focal_length, baseline)
    print(f"深度图形状: {depth.shape}")
    print(f"深度范围: {depth.min():.2f} - {depth.max():.2f} 米")

    # 可视化
    depth_vis = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX)
    cv2.imwrite("depth_map.png", depth_vis.astype(np.uint8))

3. 结构光（RGB-D 相机）¶

结构光相机向场景投射已知的图案（红外点阵或条纹），并观察图案在物体表面上如何变形。通过分析变形情况，相机计算每个像素的深度。

3.1 Kinect / RealSense / Azure Kinect¶

机器人领域最流行的结构光和 RGB-D 传感器：

传感器	技术	深度分辨率	距离范围	帧率	价格 (USD)	备注
Kinect v1	结构光 (IR)	640×480	0.5–4.5 m	30	已停产	经典设备，社区庞大
Kinect v2	ToF	512×424	0.5–4.5 m	30	已停产	精度更高
Intel RealSense D435i	主动红外立体	1280×720	0.2–10 m	90	~$250	内置 IMU，ROS 2 支持好
Intel RealSense D455	主动红外立体	1280×720	0.4–6 m	90	~$350	基线更长，精度更好
Azure Kinect DK	ToF (iTOF)	1024×1024	0.5–5.46 m	30	~$400	人体姿态追踪 SDK
Orbbec Femto Mega	ToF	1024×1024	0.2–5 m	30	~$350	兼容 OpenNI

ROS 2 项目推荐

对于基于 ROS 2 的机器人项目，Intel RealSense D435i 是最受欢迎的选择，因为它有原生的 realsense2_camera ROS 2 驱动、内置 IMU，且在距离范围和价格之间取得了良好的平衡。

3.2 深度图像处理¶

Intel 提供了 pyrealsense2 库，可以直接访问 RealSense 相机：

import pyrealsense2 as rs
import numpy as np
import cv2

def stream_realsense():
    """从 RealSense 相机流式获取并对齐彩色图像和深度图像。"""

    # 配置流
    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)

    # 开始流式传输
    profile = pipeline.start(config)

    # 获取深度比例（每单位对应多少米）
    depth_sensor = profile.get_device().first_depth_sensor()
    depth_scale = depth_sensor.get_depth_scale()
    print(f"深度比例: {depth_scale} 米/单位")

    # 将深度对齐到彩色图像
    align_to = rs.stream.color
    align = rs.align(align_to)

    try:
        while True:
            # 等待帧
            frames = pipeline.wait_for_frames()

            # 将深度帧对齐到彩色帧
            aligned_frames = align.process(frames)
            depth_frame = aligned_frames.get_depth_frame()
            color_frame = aligned_frames.get_color_frame()

            if not depth_frame or not color_frame:
                continue

            # 转换为 numpy 数组
            depth_image = np.asanyarray(depth_frame.get_data())
            color_image = np.asanyarray(color_frame.get_data())

            # 深度（米）
            depth_meters = depth_image * depth_scale

            # 为深度图应用伪彩色以便可视化
            depth_colormap = cv2.applyColorMap(
                cv2.convertScaleAbs(depth_image, alpha=0.03),
                cv2.COLORMAP_JET
            )

            # 水平拼接彩色图和深度图
            images = np.hstack((color_image, depth_colormap))

            cv2.imshow("彩色 + 深度", images)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

    finally:
        pipeline.stop()
        cv2.destroyAllWindows()


if __name__ == "__main__":
    stream_realsense()

4. 飞行时间（ToF）¶

飞行时间（Time-of-Flight）相机主动发射调制的红外光，并测量光的 往返时间（或相位偏移）来计算深度。

原理¶

ToF 有两种主要方法：

直接飞行时间（Direct ToF, dToF）：测量光脉冲到达表面并返回的实际时间。

\[d = \frac{c \cdot \Delta t}{2}\]

其中 $c$ 是光速，$\Delta t$ 是往返时间。

间接飞行时间（Indirect ToF, iToF）：发射连续调制光，测量返回信号的 相位偏移 $\phi$：

\[d = \frac{c \cdot \phi}{4 \pi f_{mod}}\]

其中 $f_{mod}$ 是调制频率。

优势与局限¶

方面	说明
✅ 优势	全帧率密集深度；无需基线；在低纹理场景中工作良好
✅ 紧凑	单传感器，体积小
❌ 多路径干扰	反射/半透明表面会导致误差
❌ 距离有限	消费级传感器通常 0.5–5 m
❌ 户外受限	强烈阳光会饱和红外传感器
❌ 分辨率低	通常低于立体视觉或结构光

5. 单目深度估计¶

近年来，深度学习方法可以从 单张 RGB 图像 估计深度。虽然过去这些方法只能产生相对（尺度不变的）深度，但现代模型已经能够实现度量深度估计。

5.1 MiDaS / Depth Anything¶

MiDaS（Ranftl 等，2020）和 Depth Anything（Yang 等，2024）是当前最先进的单目深度估计模型，可通过 torch.hub 轻松使用。

import torch
import cv2
import numpy as np
import matplotlib.pyplot as plt


def estimate_depth_midas(image_path):
    """
    使用 MiDaS v3.1 (DPT-Large) 从单张图像估计深度。

    参数
    ----
    image_path : str   输入 RGB 图像的路径

    返回
    ----
    depth : ndarray    相对深度图 (float32，值越大越近)
    """
    # 从 torch.hub 加载 MiDaS 模型
    model_type = "DPT_Large"   # 可选: DPT_Large, DPT_Hybrid, MiDaS_small
    midas = torch.hub.load("intel-isl/MiDaS", model_type)

    # 如果有 GPU 则使用 GPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    midas.to(device)
    midas.eval()

    # 加载预处理变换
    midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
    if model_type in ["DPT_Large", "DPT_Hybrid"]:
        transform = midas_transforms.dpt_transform
    else:
        transform = midas_transforms.small_transform

    # 加载并预处理图像
    img = cv2.imread(image_path)
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    input_batch = transform(img_rgb).to(device)

    # 推理
    with torch.no_grad():
        prediction = midas(input_batch)

        # 调整回原始分辨率
        prediction = torch.nn.functional.interpolate(
            prediction.unsqueeze(1),
            size=img_rgb.shape[:2],
            mode="bicubic",
            align_corners=False,
        ).squeeze()

    depth = prediction.cpu().numpy()
    return depth


def estimate_depth_anything(image_path):
    """
    使用 Depth Anything V2 (small 模型) 估计深度。

    参数
    ----
    image_path : str   输入 RGB 图像的路径

    返回
    ----
    depth : ndarray    相对深度图 (float32)
    """
    # 从 torch.hub 加载 Depth Anything
    model = torch.hub.load("depth-anything/Depth-Anything-V2", "DepthAnythingV2",
                           encoder="vits", device="cpu")
    model.eval()

    # 加载图像
    img = cv2.imread(image_path)
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # 推理（模型内部处理预处理）
    with torch.no_grad():
        depth = model.infer_image(img_rgb)

    return depth


if __name__ == "__main__":
    image_path = "test_image.jpg"

    # MiDaS
    depth_midas = estimate_depth_midas(image_path)
    print(f"MiDaS 深度图形状: {depth_midas.shape}")
    print(f"MiDaS 深度范围: {depth_midas.min():.2f} - {depth_midas.max():.2f}")

    # 可视化
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.imshow(cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB))
    plt.title("输入图像")
    plt.axis("off")

    plt.subplot(1, 2, 2)
    plt.imshow(depth_midas, cmap="plasma")
    plt.title("MiDaS 深度（相对值）")
    plt.colorbar(fraction=0.046)
    plt.axis("off")

    plt.tight_layout()
    plt.savefig("depth_estimation_result.png", dpi=150)
    plt.show()

5.2 度量深度估计¶

MiDaS 和 Depth Anything 产生的 相对深度 —— 数值不是以米为单位的。对于机器人应用，我们通常需要 度量深度（Metric Depth），即以米为单位的绝对深度。现代方法包括：

模型	度量？	距离范围	备注
MiDaS v3.1	相对	N/A	尺度不变
Depth Anything V2	相对	N/A	可配合度量头使用
ZoeDepth	度量（室内）	0–10 m	结合 MiDaS 与度量分箱
UniDepth	度量	0.1–100 m	与相机无关的度量深度
Depth Pro (Apple)	度量	0.1–250 m	边界清晰，零样本
Metric3D v2	度量	0–100 m	规范相机变换

如果你有少量真实深度值（Ground Truth），可以用简单的仿射模型将相对深度转换为度量深度：

import numpy as np

def relative_to_metric(rel_depth, gt_depth, mask=None):
    """
    使用仿射对齐将相对深度转换为度量深度。

    参数
    ----
    rel_depth : ndarray   相对深度图 (H, W)
    gt_depth  : ndarray   真实度量深度 (H, W)，0 表示无效
    mask      : ndarray   有效像素的布尔掩码

    返回
    ----
    metric_depth : ndarray   度量深度，单位为米 (H, W)
    """
    if mask is None:
        mask = gt_depth > 0

    # 展平有效像素
    rel_vals = rel_depth[mask].reshape(-1, 1)
    gt_vals = gt_depth[mask].reshape(-1, 1)

    # 求解: gt = a * rel + b（最小二乘）
    A = np.hstack([rel_vals, np.ones_like(rel_vals)])
    result = np.linalg.lstsq(A, gt_vals, rcond=None)
    a, b = result[0].flatten()

    # 应用仿射变换
    metric_depth = a * rel_depth + b
    metric_depth = np.clip(metric_depth, 0, None)

    print(f"仿射拟合: scale={a:.4f}, shift={b:.4f}")
    return metric_depth

6. 深度传感器对比¶

指标	立体视觉	结构光	ToF	单目 (深度学习)
距离范围	0.5 m – ∞	0.2 – 10 m	0.5 – 5 m	0.1 – 250 m*
精度	中等	高	中–高	低–中
分辨率	高（与相机一致）	640×480 常见	320×240 – 1024×1024	与输入图像一致
帧率	30–60 fps	30 fps	30–60 fps	1–30 fps（取决于 GPU）
成本	低（两个相机）	中等 ($200–400)	中–高 ($300–500)	免费（仅需软件）
户外	✅ 表现好	❌ 红外图案被干扰	❌ 阳光饱和红外	✅ 表现好
纹理依赖	❌ 需要纹理	✅ 主动投射图案	✅ 主动照明	✅ 从数据中学习
功耗	低	中等（红外投影器）	中等	高（GPU 推理）
延迟	~30 ms	~30 ms	~30 ms	50–500 ms
紧凑性	❌ 需要基线	✅ 体积小	✅ 体积小	✅ 单相机

*基于 Depth Pro 等度量深度模型。

7. 从深度图生成点云¶

深度图像加上相机内参（Intrinsic Matrix）即可生成 三维点云（3D Point Cloud）。每个像素 $(u, v)$ 及其深度 $Z$ 映射为一个三维点：

\[X = \frac{(u - c_x) \cdot Z}{f_x}, \quad Y = \frac{(v - c_y) \cdot Z}{f_y}\]

其中 $f_x, f_y$ 为焦距，$c_x, c_y$ 为主点坐标（Principal Point）。

import numpy as np
import cv2

def depth_to_point_cloud(depth_image, intrinsic_matrix, color_image=None,
                         depth_scale=1.0, stride=1):
    """
    将深度图像转换为三维点云。

    参数
    ----
    depth_image : ndarray      深度图像 (H, W)，uint16 或 float32
    intrinsic_matrix : ndarray  3x3 相机内参矩阵 K
    color_image : ndarray       可选的 RGB 图像 (H, W, 3)
    depth_scale : float         深度值转换为米的比例因子
    stride : int                降采样因子（1 = 每个像素）

    返回
    ----
    points : ndarray    (N, 3) XYZ 坐标，单位为米
    colors : ndarray    (N, 3) RGB 值 (0-255)，或 None
    """
    fx = intrinsic_matrix[0, 0]
    fy = intrinsic_matrix[1, 1]
    cx = intrinsic_matrix[0, 2]
    cy = intrinsic_matrix[1, 2]

    # 创建像素坐标网格
    h, w = depth_image.shape
    u, v = np.meshgrid(np.arange(0, w, stride),
                       np.arange(0, h, stride))

    # 获取网格点处的深度值
    depth = depth_image[::stride, ::stride].astype(np.float32) * depth_scale

    # 掩盖无效深度（0 或负值）
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy

    points = np.stack([x, y, z], axis=-1)

    # 如果提供了颜色信息则提取颜色
    colors = None
    if color_image is not None:
        colors = color_image[::stride, ::stride][valid]

    return points, colors


def save_point_cloud_ply(filename, points, colors=None):
    """
    将点云保存为 PLY 文件格式。

    参数
    ----
    filename : str      输出 PLY 文件路径
    points : ndarray    (N, 3) XYZ 坐标
    colors : ndarray    (N, 3) RGB 值（可选）
    """
    n_points = points.shape[0]
    has_color = colors is not None

    with open(filename, 'w') as f:
        # PLY 头
        f.write("ply\n")
        f.write("format ascii 1.0\n")
        f.write(f"element vertex {n_points}\n")
        f.write("property float x\n")
        f.write("property float y\n")
        f.write("property float z\n")
        if has_color:
            f.write("property uchar red\n")
            f.write("property uchar green\n")
            f.write("property uchar blue\n")
        f.write("end_header\n")

        # 写入点
        for i in range(n_points):
            line = f"{points[i, 0]:.4f} {points[i, 1]:.4f} {points[i, 2]:.4f}"
            if has_color:
                line += f" {int(colors[i, 0])} {int(colors[i, 1])} {int(colors[i, 2])}"
            f.write(line + "\n")

    print(f"已将 {n_points} 个点保存到 {filename}")


if __name__ == "__main__":
    # 示例：已知内参的合成深度图像
    H, W = 480, 640
    depth_img = np.random.randint(500, 5000, (H, W)).astype(np.uint16)

    K = np.array([
        [525.0,   0.0, 319.5],
        [  0.0, 525.0, 239.5],
        [  0.0,   0.0,   1.0]
    ])

    points, colors = depth_to_point_cloud(
        depth_img, K, depth_scale=0.001, stride=4
    )
    print(f"生成了 {points.shape[0]} 个三维点")
    print(f"X 范围: [{points[:, 0].min():.2f}, {points[:, 0].max():.2f}] 米")
    print(f"Y 范围: [{points[:, 1].min():.2f}, {points[:, 1].max():.2f}] 米")
    print(f"Z 范围: [{points[:, 2].min():.2f}, {points[:, 2].max():.2f}] 米")

    save_point_cloud_ply("output.ply", points)

8. ROS 2 集成¶

深度相机在 ROS 2 中通过标准化的主题（Topic）发布数据，使用标准消息类型。

常用主题¶

主题	消息类型	描述
`/camera/color/image_raw`	`sensor_msgs/Image`	RGB 彩色图像
`/camera/depth/image_rect_raw`	`sensor_msgs/Image`	深度图像（16位，毫米或原始值）
`/camera/depth/color/points`	`sensor_msgs/PointCloud2`	彩色点云
`/camera/aligned_depth_to_color/image_raw`	`sensor_msgs/Image`	对齐到彩色帧的深度图
`/camera/color/camera_info`	`sensor_msgs/CameraInfo`	相机内参（K 矩阵、畸变参数）
`/camera/imu`	`sensor_msgs/Imu`	IMU 数据（仅 D435i）

在 ROS 2 中启动 RealSense¶

# 安装 RealSense ROS 2 驱动
sudo apt install ros-${ROS_DISTRO}-realsense2-camera

# 使用默认设置启动
ros2 launch realsense2_camera rs_launch.py

# 使用指定参数启动
ros2 launch realsense2_camera rs_launch.py \
    depth_module.depth_profile:=640x480x30 \
    rgb_camera.color_profile:=640x480x30 \
    align_depth.enable:=true \
    pointcloud.enable:=true

在 Python 中订阅深度数据（ROS 2）¶

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image, CameraInfo
from cv_bridge import CvBridge
import numpy as np


class DepthSubscriber(Node):
    """订阅 RealSense 相机的深度图像和相机信息。"""

    def __init__(self):
        super().__init__("depth_subscriber")
        self.bridge = CvBridge()
        self.K = None  # 相机内参矩阵

        # 订阅相机信息（一次即可）
        self.create_subscription(
            CameraInfo,
            "/camera/aligned_depth_to_color/camera_info",
            self.camera_info_callback,
            10
        )

        # 订阅对齐后的深度图像
        self.create_subscription(
            Image,
            "/camera/aligned_depth_to_color/image_raw",
            self.depth_callback,
            10
        )

    def camera_info_callback(self, msg):
        """从 CameraInfo 中提取相机内参矩阵。"""
        self.K = np.array(msg.k).reshape(3, 3)
        self.get_logger().info(f"相机内参:\n{self.K}")

    def depth_callback(self, msg):
        """处理接收到的深度图像。"""
        # 将 ROS Image 转换为 OpenCV 格式
        depth_image = self.bridge.imgmsg_to_cv2(msg, desired_encoding="passthrough")

        # 深度值以毫米为单位 (uint16) —— 转换为米
        depth_meters = depth_image.astype(np.float32) / 1000.0

        # 打印统计信息
        valid = depth_meters > 0
        if valid.any():
            self.get_logger().info(
                f"深度 — 最小: {depth_meters[valid].min():.2f} m, "
                f"最大: {depth_meters[valid].max():.2f} m, "
                f"中位数: {np.median(depth_meters[valid]):.2f} m"
            )


def main():
    rclpy.init()
    node = DepthSubscriber()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()

常用 ROS 2 深度相机命令¶

# 列出所有活动主题
ros2 topic list | grep camera

# 在 RViz2 中查看深度图像
rviz2
# Add -> By topic -> /camera/depth/image_rect_raw

# 打印相机信息
ros2 topic echo /camera/color/camera_info

# 检查帧率
ros2 topic hz /camera/aligned_depth_to_color/image_raw

# 录制包含深度数据的 rosbag
ros2 bag record /camera/color/image_raw \
                /camera/aligned_depth_to_color/image_raw \
                /camera/color/camera_info

练习¶

练习 1：KITTI 立体深度估计¶

从 KITTI 数据集下载一对立体图像。使用 cv2.StereoSGBM 计算视差图并转换为深度图。以伪彩色图的形式可视化结果。

提示

KITTI 使用 focal_length = 718.856，baseline = 0.54 m。使用 cv2.imread 以灰度模式加载图像。

练习 2：RealSense 点云生成¶

连接 Intel RealSense D435i，使用 pyrealsense2 捕获深度帧。生成彩色点云并使用 Open3D 进行可视化。

import open3d as o3d

# 根据上述方法生成 points 和 colors 后：
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
pcd.colors = o3d.utility.Vector3dVector(colors / 255.0)
o3d.visualization.draw_geometries([pcd])

练习 3：单目深度估计对比¶

下载一张图像，分别运行 MiDaS 和 Depth Anything。对比两个深度图的视觉效果。是否存在其中一个模型明显优于另一个的区域？讨论原因。

练习 4：基于深度的障碍物检测¶

编写一个 ROS 2 节点，订阅深度相机主题，将距离小于 0.5 米的像素进行分割，当检测到障碍物时发布警告消息。使用 bag 文件进行测试。

参考资料¶

Multiple View Geometry in Computer Vision, Hartley & Zisserman — 极几何与立体视觉的权威教材
OpenCV Stereo Vision Documentation — StereoBM 和 StereoSGBM 教程
MiDaS: Robust Monocular Depth Estimation, Ranftl et al. 2020 — 单目深度估计模型
Depth Anything V2, Yang et al. 2024 — 最先进的单目深度估计
Intel RealSense SDK 2.0 Documentation — RealSense 官方编程指南
realsense-ros (ROS 2 wrapper) — RealSense 相机的 ROS 2 驱动
ZoeDepth: Zero-shot Metric Depth, Bhat et al. 2023 — 单张图像的度量深度估计
Depth Pro: Sharp Monocular Metric Depth, Apple 2024 — 高质量度量深度
KITTI Stereo Dataset — 包含真实深度的基准立体数据集
ROS 2 sensor_msgs — Image — 标准图像消息类型