机械臂抓取 (Robotic Arm Grasping)¶

项目类型: 机器人操作 | 难度: ★★☆☆☆ 到 ★★★★★ (取决于算法层级) | 预计时间: 2–4 个周末

1. 项目概述¶

抓取（Grasping） 是机器人操作的基础技能：机器人必须将末端执行器（夹爪）定位到特定的 6-DoF 姿态，闭合手指，并在不掉落物体或不与环境碰撞的情况下提起物体。

本项目涵盖三个算法层级：

层级	方法	输入	核心算法
层级 1 — 传统	解析 + 数值逆运动学	已知目标姿态	几何 IK、雅可比伪逆、DLS
层级 2 — 中级	GraspNet 风格深度学习	RGB-D / 点云	6-DoF 抓取候选采样 + CNN 质量评分
层级 3 — 现代	强化学习抓取	原始传感器	Isaac Gym 或 PyBullet 中的 PPO/SAC

  ┌──────────────────────────────────────────────────────────────┐
  │                    机械臂抓取流程                            │
  │                                                              │
  │   ┌──────────────┐   ┌──────────────┐   ┌───────────────┐    │
  │   │  感知         │──▶│  抓取         │──▶│  执行          │    │
  │   │  (相机 /       │   │  规划         │   │  (移动至       │    │
  │   │   深度相机)    │   │  (IK / DL /  │   │   姿态、合爪)   │    │
  │   │               │   │   RL)         │   │               │    │
  │   └──────────────┘   └──────────────┘   └───────────────┘    │
  │         │                  │                   │             │
  │         ▼                  ▼                   ▼             │
  │   点云 / RGB-D        抓取姿态           成功?                │
  │                        (x,y,z,r,p,y)       反馈 → 循环        │
  │                        + 夹爪开度                          │
  └──────────────────────────────────────────────────────────────┘

2. 硬件与软件需求¶

硬件¶

组件	规格	备注
机械臂	UR5e / Panda 7-DoF	最低 6-DoF；需要 URDF
夹爪	并行双指夹爪	如 Robotiq 2F-85，或自定义
相机	Intel RealSense D435 / Azure Kinect	层级 2 及以上需要 RGB-D
计算平台	Ubuntu 20.04，GPU ≥ 6 GB 显存	用于深度学习 / 强化学习训练
物体套装	YCB / GraspNet-1B 物体	各种形状和尺寸
（可选）	ATI 力/力矩传感器	用于抓取质量评估

软件¶

包	版本	用途
Python	≥ 3.8	核心语言
NumPy / SciPy	≥ 1.20	数值 IK
MoveIt	ROS Noetic	运动规划 + IK
PyBullet	≥ 3.0	物理仿真（层级 3）
PyTorch	≥ 1.13	深度学习（层级 2）
Stable-Baselines3	≥ 1.7	强化学习算法
Gymnasium	≥ 0.28	强化学习环境接口
open3d	≥ 0.16	点云处理

pip install numpy scipy pybullet torch stable-baselines3 gymnasium open3d opencv-python

3. 层级 1 — 传统：逆运动学（Inverse Kinematics）¶

3.1 正运动学回顾¶

给定关节角度 \(\mathbf{q} = [q_1, q_2, \ldots, q_n]^T\)，正运动学（FK） 计算末端执行器姿态：

\[ \mathbf{T}(\mathbf{q}) = \begin{bmatrix} \mathbf{R}(\mathbf{q}) & \mathbf{p}(\mathbf{q}) \\ \mathbf{0}_{1\times3} & 1 \end{bmatrix} \in SE(3) \]

其中 \(\mathbf{R} \in SO(3)\) 为旋转矩阵，\(\mathbf{p} \in \mathbb{R}^3\) 为位置向量。

Denavit-Hartenberg（DH）参数（6-DoF 机械臂）：

关节	\(a_i\)（mm）	\(\alpha_i\)（rad）	\(d_i\)（mm）	\(\theta_i\)
1	0	\(-\pi/2\)	\(d_1\)	\(q_1\)
2	\(a_2\)	0	0	\(q_2 - \pi/2\)
3	\(a_3\)	0	0	\(q_3\)
4	0	\(-\pi/2\)	\(d_4\)	\(q_4\)
5	0	\(\pi/2\)	0	\(q_5\)
6	0	0	\(d_6\)	\(q_6\)

3.2 解析 IK — 几何法¶

对于 6-DoF 关节型机械臂（如 UR5），将问题分解为：

腕部中心 \(\mathbf{p}_c = \mathbf{p}_e - d_6 \cdot \mathbf{n}\)（工具偏移）
肩部 — 求解 \(q_1\)（方位角）和 \(q_2, q_3\)（肘部构型）
腕部姿态 — 从剩余旋转中求解 \(q_4, q_5, q_6\)

2 连杆平面 IK（肩部 + 肘部）：

         肩部
          O  q1
         /
        /
       O───X  （腕部中心）
      / \
     /   \
    /     \
 q2 O       O q3 （肘部）      连杆1=L1, 连杆2=L2

给定腕部中心 \(\mathbf{p}_c = (x, y, z)\)：

\[ r = \sqrt{x^2 + z^2} \quad \text{（矢状面内2D距离）} \]

**余弦定理**求解肘部角度 \(\theta_2\)：

\[ \cos \theta_2 = \frac{r^2 - L_1^2 - L_2^2}{2 L_1 L_2} \quad \Rightarrow \quad \theta_2 = \pm \arccos(\cdots) \]

（− 用于"肘部向上"构型，+ 用于"肘部向下"构型）

肩部角度 \(\theta_1\)：

\[ \theta_1 = \text{atan2}(x, y) - \arctan_2\!\left(L_2 \sin\theta_2,\; L_1 + L_2 \cos\theta_2\right) \]

3.3 数值 IK — 雅可比方法¶

当解析解不可用时，数值 IK 迭代优化 \(\mathbf{q}\)：

雅可比伪逆：

\[ \dot{\mathbf{q}} = \mathbf{J}^{\dagger} \dot{\mathbf{x}}, \quad \mathbf{J}^{\dagger} = \mathbf{J}^T \left(\mathbf{J} \mathbf{J}^T\right)^{-1} \]

阻尼最小二乘法（DLS） — 避免奇异的标准方法：

\[ \dot{\mathbf{q}} = \mathbf{J}^T \left(\mathbf{J} \mathbf{J}^T + \lambda^2 \mathbf{I}\right)^{-1} \dot{\mathbf{x}} \]

其中 \(\lambda\) 为**阻尼因子**（典型值：\(\lambda \in [0.01, 0.5]\)）。

奇异值分解（SVD）形式：

\[ \dot{\mathbf{q}} = \mathbf{V} \mathbf{\Sigma}^{\dagger} \mathbf{U}^T \dot{\mathbf{x}} + \frac{\lambda^2}{\sigma_i^2 + \lambda^2} (\mathbf{I} - \mathbf{J}^{\dagger}\mathbf{J}) \mathbf{q}_{\text{null}} \]

3.4 完整 Python 代码 — 数值 IK¶

"""
层级 1: 数值 IK — 阻尼最小二乘法（DLS）
==================================================
适用于具有已知 DH 参数或 FK 函数的任意串联机械臂。
"""

import numpy as np


def skew(v: np.ndarray) -> np.ndarray:
    """从3向量构造反对称矩阵。"""
    return np.array([
        [0,    -v[2],  v[1]],
        [v[2],   0,   -v[0]],
        [-v[1], v[0],   0]
    ])


def rot_x(angle: float) -> np.ndarray:
    """绕 X 轴旋转 `angle` 弧度。"""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])


def rot_z(angle: float) -> np.ndarray:
    """绕 Z 轴旋转 `angle` 弧度。"""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])


def dh_transform(a, alpha, d, theta):
    """标准 DH 变换矩阵。"""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st*ca,  st*sa, a*ct],
        [st,  ct*ca, -ct*sa, a*st],
        [0,   sa,     ca,    d],
        [0,   0,      0,     1]
    ])


def joint_to_transform(q: np.ndarray) -> np.ndarray:
    """
    简化的 UR5 类 6R 平面机械臂 FK。
    返回末端执行器的 4x4 齐次变换。

    Parameters
    ----------
    q : shape (6,) 的数组
        关节角度 [q1..q6]，单位：弧度

    Returns
    -------
    T : (4, 4) 齐次变换矩阵
    """
    # 连杆长度（UR5 类，米为单位）
    L1, L2, L3, L4, L5, L6 = 0.089, 0.425, 0.392, 0.109, 0.093, 0.082
    d1 = L1
    a2, a3 = L2, L3

    # DH 变换（标准惯例）
    # 关节 i: 绕 Z 旋转 θ_i，平移 d_i 沿 Z，
    #         平移 a_i 沿 X，绕 X 旋转 α_i
    # 关节 1: 肩部偏航
    T01 = dh_transform(0,       -np.pi/2, d1,       q[0])
    # 关节 2: 肩部俯仰
    T12 = dh_transform(a2,       0,       0,        q[1] - np.pi/2)
    # 关节 3: 肘部俯仰
    T23 = dh_transform(a3,       0,       0,        q[2])
    # 关节 4: 腕部 1 翻滚
    T34 = dh_transform(0,      -np.pi/2, L4,       q[3])
    # 关节 5: 腕部 2 俯仰
    T45 = dh_transform(0,       np.pi/2, 0,        q[4])
    # 关节 6: 腕部 3 翻滚
    T56 = dh_transform(0,       0,       L5 + L6,  q[5])

    T = T01 @ T12 @ T23 @ T34 @ T45 @ T56
    return T


def geometric_jacobian(q: np.ndarray) -> np.ndarray:
    """
    计算 6×n 几何雅可比矩阵 J(q)。
    列: [Jv_1..Jv_n; Jw_1..Jw_n]
    """
    J = np.zeros((6, len(q)))
    T = np.eye(4)

    for i in range(len(q)):
        # 关节框架 i 的 Z 轴（在世界坐标系中）
        axis_world = T[:3, 2].copy()
        ee_pos = joint_to_transform(q)[:3, 3]
        J[:3, i] = np.cross(axis_world, ee_pos - T[:3, 3])
        J[3:, i] = axis_world

        # 更新 T（使用 DH 变换递进）
        if i == 0:
            T_i = dh_transform(0, -np.pi/2, 0.089, q[0])
        elif i == 1:
            T_i = dh_transform(0.425, 0, 0, q[1] - np.pi/2)
        elif i == 2:
            T_i = dh_transform(0.392, 0, 0, q[2])
        elif i == 3:
            T_i = dh_transform(0, -np.pi/2, 0.109, q[3])
        elif i == 4:
            T_i = dh_transform(0, np.pi/2, 0, q[4])
        elif i == 5:
            T_i = dh_transform(0, 0, 0.175, q[5])
        T = T @ T_i
    return J


def numerical_ik(target_pose: np.ndarray, q_init: np.ndarray,
                 max_iter: int = 200, tol: float = 1e-4,
                 lambda_damp: float = 0.1) -> np.ndarray:
    """
    6R 机械臂的阻尼最小二乘 IK。

    Parameters
    ----------
    target_pose : (4, 4) 齐次变换矩阵
    q_init      : 初始关节角度
    max_iter    : 最大迭代次数
    tol         : 收敛阈值（位置误差，米）
    lambda_damp : 阻尼因子 λ

    Returns
    -------
    q : (6,) 关节角度（弧度）
    """
    q = q_init.copy()
    target_pos = target_pose[:3, 3]
    target_rot = target_pose[:3, :3]

    for iteration in range(max_iter):
        T = joint_to_transform(q)
        pos_err = target_pos - T[:3, 3]

        # 姿态误差：通过 R_error = R_desired @ R_current.T 的轴角提取
        R_err = target_rot @ T[:3, :3].T
        trace_R = np.clip(np.trace(R_err), -1.0, 3.0)
        theta_err = np.arccos((trace_R - 1) / 2)
        if theta_err > 1e-6:
            axis = (1 / (2 * np.sin(theta_err))) * np.array([
                R_err[2, 1] - R_err[1, 2],
                R_err[0, 2] - R_err[2, 0],
                R_err[1, 0] - R_err[0, 1]
            ])
        else:
            axis = np.zeros(3)
        ori_err = theta_err * axis

        x_err = np.concatenate([pos_err, ori_err])

        if np.linalg.norm(x_err) < tol:
            print(f"在 {iteration} 次迭代后收敛")
            break

        # 几何雅可比矩阵
        J = geometric_jacobian(q)

        # DLS 更新
        lambda_sq = lambda_damp ** 2
        delta_q = J.T @ np.linalg.inv(J @ J.T + lambda_sq * np.eye(6)) @ x_err
        q = q + delta_q

        # 关节限位（UR5）
        q = np.clip(q, np.radians([-180, -128, -128, -266, -266, -266]),
                         np.radians([ 180,  128,  128,  266,  266,  266]))

    return q


def demo():
    """测试 IK：从初始位置移动到目标姿态。"""
    # 目标：位置 (0.4, 0.2, 0.3)，旋转 = 单位矩阵
    target = np.eye(4)
    target[:3, 3] = [0.4, 0.2, 0.3]

    q_init = np.zeros(6)   # 从初始位置开始
    q_solution = numerical_ik(target, q_init, lambda_damp=0.15)

    # 验证
    T_verify = joint_to_transform(q_solution)
    print(f"目标位置    : {target[:3, 3]}")
    print(f"到达位置    : {T_verify[:3, 3]}")
    print(f"关节角度解 : {np.degrees(q_solution)} 度")
    return q_solution


if __name__ == "__main__":
    demo()

4. 层级 2 — 中级：GraspNet 风格深度学习¶

4.1 概述¶

GraspNet（GraspNet-1B：6-DoF 抓取检测）从单张 RGB-D 图像 / 点云预测 6-DoF 抓取姿态（位置 + 方向 + 夹爪开度）。流程如下：

  RGB-D 图像 / 点云
         │
         ▼
  采样抓取候选（成千上万个）
  [x, y, z, roll, pitch, yaw, 夹爪开度]
         │
         ▼
  抓取质量 CNN 评分器
  [Q(成功) ∈ [0, 1]，每个候选对应一个分数]
         │
         ▼
  非极大值抑制 → Top-K 抓取排序
         │
         ▼
  执行（IK → 轨迹规划 → 合爪）

4.2 抓取表示 — 6-DoF 姿态¶

抓取由以下完整定义：

\[ \mathcal{G} = (\mathbf{p}, \mathbf{R}, w) \]

符号	含义	范围
\(\mathbf{p} \in \mathbb{R}^3\)	抓取位置（接触中点）	世界坐标
\(\mathbf{R} \in SO(3)\)	抓取方向（夹爪坐标系）	3-DoF 旋转
\(w\)	夹爪开度（手指张开宽度）	0 – 0.08 m

夹爪坐标系约定：\(x\) 轴 = 接近方向，\(z\) 轴 = 手指闭合方向。

4.3 抓取质量指标¶

在学习之前，我们可以解析地评估抓取：

力封闭（Force Closure）：当任何外部力旋量都能被接触力抵消时，抓取达到力封闭。2 指平面抓取的条件：

\[ \mathbf{G} \mathbf{f} = -\mathbf{w}_{ext} \quad \text{（力旋量平衡）} \]

其中 \(\mathbf{G}\) 为**抓取矩阵**，\(\mathbf{f}\) 为接触力。

Epsilon 质量 \(\epsilon \in [0, 1]\)：在保持单位接触力的前提下，任意方向上所能承受的最大力旋量：

\[ \epsilon = \min_{\mathbf{f}, \|\mathbf{f}\| \leq 1} \max_{\|\mathbf{w}\|=1} \mathbf{w}^T \mathbf{G} \mathbf{f} \]

\(\epsilon\) 越高 = 抓取越稳定。

力旋量空间体积：抓取力旋量空间（GWS）的体积近似为：

\[ V_{GWS} = \det\left(\mathbf{G} \mathbf{G}^T\right)^{1/2} \]

4.4 完整 Python 代码 — 抓取候选采样 + 质量 CNN¶

"""
层级 2: GraspNet 风格抓取姿态预测
============================================
1. 从点云中采样抓取候选
2. 用 CNN 质量估计器评分
3. 输出 Top-K 排序的抓取
"""

import numpy as np
import torch
import torch.nn as nn


# ─── 抓取表示 ────────────────────────────────────────────────

class Grasp:
    """
    6-DoF 抓取姿态。

    frame: SE(3) 变换，其中：
      - 原点   = 抓取接触中点
      - x 轴   = 接近方向（朝向物体）
      - z 轴   = 手指闭合方向
    width: 夹爪张开宽度（米）
    """

    def __init__(self, frame: np.ndarray, width: float):
        self.frame = frame
        self.width = width

    @property
    def position(self) -> np.ndarray:
        return self.frame[:3, 3]

    @property
    def approach(self) -> np.ndarray:
        """沿夹爪 x 轴的单位向量（接近方向）。"""
        return self.frame[:3, 0]

    def to_matrix(self) -> np.ndarray:
        """将抓取展平为 9-DoF: [x, y, z, r11..r33, width]。"""
        return np.concatenate([self.position, self.frame[:3, :3].flatten(), [self.width]])


# ─── 抓取候选采样 ────────────────────────────────────────────

def sample_antipodal_points(points: np.ndarray, normals: np.ndarray,
                            num_samples: int = 5000) -> list:
    """
    通过选择对向点对来采样抓取候选。

    对向点对的法向量大致指向彼此。
    抓取中心位于两点之间的中点。

    Parameters
    ----------
    points      : (N, 3) 点云
    normals     : (N, 3) 表面法向量
    num_samples : 最大候选数量

    Returns
    -------
    candidates : Grasp 对象列表
    """
    valid = np.where(np.linalg.norm(normals, axis=1) > 0.1)[0]
    points = points[valid]
    normals = normals[valid]

    candidates = []
    for _ in range(num_samples):
        i = np.random.randint(0, len(points))
        j = np.random.randint(0, len(points))
        if i == j:
            continue

        p1, n1 = points[i], normals[i]
        p2, n2 = points[j], normals[j]

        dist = np.linalg.norm(p1 - p2)
        if dist < 0.01 or dist > 0.10:
            continue

        # 法向量应大致相对（对向条件）
        dot_normals = np.dot(n1, n2)
        if dot_normals > -0.5:
            continue

        # 抓取坐标系
        center = (p1 + p2) / 2
        approach = -(n1 + n2)
        approach /= (np.linalg.norm(approach) + 1e-8)

        # 构建正交基
        down = np.array([0, 0, -1])
        binormal = np.cross(approach, down)
        if np.linalg.norm(binormal) < 1e-4:
            binormal = np.cross(approach, np.array([0, 1, 0]))
        binormal /= np.linalg.norm(binormal)
        closing = np.cross(approach, binormal)

        frame = np.eye(4)
        frame[:3, 0] = approach
        frame[:3, 1] = closing
        frame[:3, 2] = binormal
        frame[:3, 3] = center

        candidates.append(Grasp(frame, width=dist))

    return candidates


# ─── 抓取质量 CNN ───────────────────────────────────────────

class GraspQualityCNN(nn.Module):
    """
    从以下输入预测抓取成功概率：
    - 抓取周围的点云裁剪块 (N × 3)
    - 抓取姿态特征 (9-DoF)
    """

    def __init__(self, num_points: int = 256, pose_dim: int = 9,
                 hidden_dim: int = 256):
        super().__init__()
        self.num_points = num_points

        # 点-wise MLP（逐点共享权重）
        self.point_encoder = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 256, 1), nn.ReLU(),
        )

        # 姿态编码器
        self.pose_encoder = nn.Sequential(
            nn.Linear(pose_dim, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
        )

        # 融合 + 评分头
        self.fusion = nn.Sequential(
            nn.Linear(256 + 256, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),
        )

    def forward(self, points: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        """
        Parameters
        ----------
        points : (B, N, 3) 点云裁剪块
        pose   : (B, 9) 抓取姿态特征

        Returns
        -------
        score  : (B, 1) 成功概率
        """
        B = points.size(0)

        pc_feat = self.point_encoder(points.permute(0, 2, 1))  # (B, 256, N)
        pc_feat = pc_feat.max(dim=2)[0]                         # (B, 256)

        pose_feat = self.pose_encoder(pose)                     # (B, 256)

        combined = torch.cat([pc_feat, pose_feat], dim=1)
        score = self.fusion(combined)
        return score


def score_grasps(candidates: list, pc_crop: np.ndarray,
                 model: GraspQualityCNN, device: str = "cpu"
                 ) -> list[tuple]:
    """
    用 CNN 对所有抓取候选评分。

    Parameters
    ----------
    candidates : Grasp 对象列表
    pc_crop    : 场景点云 (M, 3)
    model      : 训练好的 GraspQualityCNN

    Returns
    -------
    scored_grasps : 按分数降序排列的 (Grasp, score) 列表
    """
    model.eval()
    batch_size = 64
    scored = []

    with torch.no_grad():
        for i in range(0, len(candidates), batch_size):
            batch = candidates[i:i + batch_size]

            pose_batch = np.array([g.to_matrix() for g in batch]).astype(np.float32)
            pose_tensor = torch.from_numpy(pose_batch).to(device)

            pc_resampled = resample_pc(pc_crop, num_points=model.num_points)
            pc_tensor = torch.from_numpy(pc_resampled).unsqueeze(0).expand(len(batch), -1, -1)
            pc_tensor = pc_tensor.to(device)

            scores = model(pc_tensor.float(), pose_tensor.float())
            for g, s in zip(batch, scores.cpu().numpy().flatten()):
                scored.append((g, float(s)))

    scored.sort(key=lambda x: x[1], reverse=True)
    return scored


def resample_pc(pc: np.ndarray, num_points: int) -> np.ndarray:
    """将点云随机重采样（或填充）到固定大小。"""
    N = len(pc)
    if N >= num_points:
        idx = np.random.choice(N, num_points, replace=False)
    else:
        idx = np.random.choice(N, num_points, replace=True)
    return pc[idx]


def demo_graspnet():
    """模拟演示 — 实际使用时加载 RealSense + 训练好的模型。"""
    np.random.seed(42)

    # 模拟点云：球形物体
    N = 1000
    angles = np.random.uniform(0, 2*np.pi, N)
    heights = np.random.uniform(-0.05, 0.05, N)
    radii = np.random.uniform(0.03, 0.05, N)
    points = np.stack([radii * np.cos(angles),
                        radii * np.sin(angles),
                        heights], axis=1)
    normals = points / np.linalg.norm(points, axis=1, keepdims=True)

    # 采样候选
    candidates = sample_antipodal_points(points, normals, num_samples=200)
    print(f"采样了 {len(candidates)} 个抓取候选")

    # 用随机（未训练）CNN 评分
    device = "cpu"
    model = GraspQualityCNN().to(device)
    scored = score_grasps(candidates, points, model, device)

    print("Top 5 抓取：")
    for g, score in scored[:5]:
        print(f"  score={score:.3f}  pos={g.position}  width={g.width:.4f}m")

    return scored


if __name__ == "__main__":
    demo_graspnet()

5. 层级 3 — 现代：强化学习抓取¶

5.1 概念¶

**RL 抓取**不单独规划抓取姿态，而是直接学习策略 \(\pi_\theta(\mathbf{a} \mid \mathbf{o})\)，将观测映射为夹爪动作。机器人通过试错学习哪些策略能带来成功的抓取。

  ┌──────────────────────────────────────────────────────────┐
  │                  RL 抓取循环                              │
  │                                                           │
  │   观测 o_t                                               │
  │   [rgb, depth, 本体感受状态]                              │
  │         │                                                 │
  │         ▼                                                 │
  │   策略 π_θ（神经网络）                                    │
  │         │                                                 │
  │         ▼                                                 │
  │   动作 a_t = [Δx, Δy, Δz, Δθ, 开/合夹爪]                  │
  │         │                                                 │
  │         ▼                                                 │
  │   PyBullet / Isaac Gym 物理仿真                          │
  │         │                                                 │
  │         ▼                                                 │
  │   奖励 r_t = +1（成功）/ -0.01（步惩罚）                  │
  │         │                                                 │
  │         ▼                                                 │
  │   策略更新（PPO / SAC）                                   │
  └──────────────────────────────────────────────────────────┘

5.2 RL 公式化¶

状态 \(\mathbf{s}_t\)： - 末端执行器姿态 \((x, y, z, \phi, \theta, \psi)\) - 夹爪开度 \(w\) - 物体姿态（来自仿真）

动作 \(\mathbf{a}_t\)： - \(\Delta x, \Delta y, \Delta z\) — 位置增量（米） - \(\Delta \phi\) — 腕部翻滚增量（弧度） - gripper_open — 二进制（0 = 闭合，1 = 张开）

奖励塑造：

事件	奖励
抓取成功（物体被提起）	\(r_{\text{success}} = +10\)
提起后掉落	\(r_{\text{drop}} = -5\)
与桌面碰撞	\(r_{\text{coll}} = -2\)
步惩罚	\(r_{\text{step}} = -0.01\)
接近奖励（靠近物体）	\(r_{\text{approach}} = +0.1 \cdot \Delta d\)

5.3 完整 Python 代码 — PyBullet 抓取环境 + PPO¶

"""
层级 3: PyBullet + PPO 强化学习抓取
============================================
自定义 Gymnasium 环境，模拟 6-DoF 机械臂抓取任务。
训练 PPO 智能体接近、抓取并提起一个方块。
"""

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO
import pybullet as p
import pybullet_data
import time


class GraspingEnv(gym.Env):
    """
    基于 PyBullet 的抓取环境。

    场景：
      - 平面（桌面表面）
      - 1 个立方体物体（边长 0.05 m）
      - 6-DoF 机械臂（简化运动学模型）

    观测: (12,)
      - EE 位置 (3) + EE 速度 (3)
      - 夹爪开度 (1) + 夹爪速度 (1)
      - 物体相对 EE 的位置 (3) + 物体速度 (1)

    动作: (5,)
      - dx, dy, dz（EE 坐标系中的位置增量）
      - dtheta（腕部旋转增量）
      - gripper: 0=闭合, 1=张开
    """

    metadata = {"render_modes": ["human"], "render_fps": 30}

    def __init__(self, render_mode=None, gui=True):
        super().__init__()
        self.render_mode = render_mode
        self.gui = gui

        # 观测: 12 维
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(12,), dtype=np.float32)
        # 动作: 5 维
        self.action_space = spaces.Box(
            low=-1.0, high=1.0, shape=(5,), dtype=np.float32)

        # 物理客户端
        self.client_id = p.connect(p.GUI if gui else p.DIRECT)
        p.setAdditionalSearchPath(pybullet_data.getDataPath())
        p.setGravity(0, 0, -9.81, physicsClientId=self.client_id)

        # 仿真参数
        self.dt = 0.02           # 50 Hz
        self.max_steps = 200     # 4 秒 episode
        self.reach_threshold = 0.04    # 抓取成功距离（米）
        self.lift_height = 0.10        # 物体必须升起的高度

        self.object_id = None
        self.table_id = None
        self.step_count = 0

    # ─── 场景设置 ────────────────────────────────────────────

    def _setup_scene(self):
        """创建桌面、物体和机械臂。"""
        # 桌面
        self.table_id = p.loadURDF("plane.urdf", physicsClientId=self.client_id)
        table_visual = p.createVisualShape(
            p.GEOM_BOX, halfExtents=[0.3, 0.3, 0.01],
            rgbaColor=[0.6, 0.5, 0.4, 1], physicsClientId=self.client_id)
        p.createMultiBody(
            baseMass=0, baseVisualShapeIndex=table_visual,
            basePosition=[0.4, 0, 0.005], physicsClientId=self.client_id)

        # 立方体物体
        obj_collision = p.createCollisionShape(
            p.GEOM_BOX, halfExtents=[0.025]*3)
        obj_visual = p.createVisualShape(
            p.GEOM_BOX, halfExtents=[0.025]*3,
            rgbaColor=[0.8, 0.2, 0.2, 1])
        self.object_id = p.createMultiBody(
            baseMass=0.01, baseCollisionShapeIndex=obj_collision,
            baseVisualShapeIndex=obj_visual,
            basePosition=[0.4, 0.0, 0.03],
            physicsClientId=self.client_id)

        # 简化机械臂：从 EE 位置开始（实际应从 URDF 加载）
        self.ee_pos = np.array([0.3, 0.0, 0.35])
        self.ee_quat = [0, 0, 0, 1]
        self.gripper_open = 1.0

    def _reset_robot(self):
        """随机化物体的位置，将机械臂复位到初始位置。"""
        obj_x = np.random.uniform(0.30, 0.50)
        obj_y = np.random.uniform(-0.15, 0.15)
        p.resetBasePositionAndOrientation(
            self.object_id, [obj_x, obj_y, 0.03], [0, 0, 0, 1],
            physicsClientId=self.client_id)
        self.ee_pos = np.array([0.3, 0.0, 0.35])
        self.gripper_open = 1.0

    # ─── Gymnasium 接口 ─────────────────────────────────────

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._setup_scene()
        self._reset_robot()
        self.step_count = 0
        return self._get_obs(), {}

    def step(self, action):
        self.step_count += 1

        # ── 解析动作 ──
        dx = float(action[0]) * 0.05     # 每步 ±5 cm
        dy = float(action[1]) * 0.05
        dz = float(action[2]) * 0.03     # 每步 ±3 cm
        dtheta = float(action[3]) * 0.2  # ±0.2 rad
        gripper_cmd = float(action[4])   # 0=闭合, 1=张开

        # 更新夹爪
        self.gripper_open = np.clip(self.gripper_open + (gripper_cmd - 0.5) * 0.3,
                                     0.0, 1.0)

        # ── 移动 EE（简化运动学模型） ──
        self.ee_pos += np.array([dx, dy, dz])
        self.ee_pos[0] = np.clip(self.ee_pos[0], 0.15, 0.65)
        self.ee_pos[1] = np.clip(self.ee_pos[1], -0.35, 0.35)
        self.ee_pos[2] = np.clip(self.ee_pos[2], 0.01, 0.60)

        # ── 物理仿真步进 ──
        p.stepSimulation(physicsClientId=self.client_id)
        if self.render_mode == "human":
            time.sleep(self.dt)

        # ── 计算奖励 ──
        obj_pos = np.array(p.getBasePositionAndOrientation(
            self.object_id, physicsClientId=self.client_id)[0])
        obj_vel = np.array(p.getBaseVelocity(
            self.object_id, physicsClientId=self.client_id)[0])

        dist_ee_obj = np.linalg.norm(self.ee_pos - obj_pos)
        obj_lifted = obj_pos[2] > self.lift_height

        # 抓取条件：EE 靠近物体 AND 夹爪闭合
        is_grasping = (dist_ee_obj < self.reach_threshold) and (self.gripper_open < 0.3)

        # 奖励分量
        reward = -0.01
        reward += 0.05 * max(0, 0.10 - dist_ee_obj)   # 接近奖励
        if is_grasping and obj_lifted:
            reward += 10.0                            # 成功！
        elif is_grasping:
            reward += 1.0                             # 稳固抓取
        if obj_pos[2] < 0.01:
            reward -= 2.0                             # 物体掉落

        # 终止条件
        terminated = bool(obj_lifted and is_grasping)
        truncated = self.step_count >= self.max_steps

        return self._get_obs(), reward, terminated, truncated, {}

    def _get_obs(self) -> np.ndarray:
        obj_pos = np.array(p.getBasePositionAndOrientation(
            self.object_id, physicsClientId=self.client_id)[0])
        obj_vel = np.array(p.getBaseVelocity(
            self.object_id, physicsClientId=self.client_id)[0])

        rel_pos = obj_pos - self.ee_pos
        obs = np.concatenate([
            self.ee_pos,
            np.zeros(3),              # EE 速度（未知 → 零）
            np.array([self.gripper_open, 0.0]),
            rel_pos,
            [obj_vel[2]],
        ], dtype=np.float32)
        return np.clip(obs, -10, 10)

    def render(self):
        pass   # 由 PyBullet GUI 处理


# ─── 训练 ─────────────────────────────────────────────────

def train_ppo(total_timesteps: int = 200_000, save_path: str = "ppo_grasping"):
    """在抓取环境上训练 PPO 智能体。"""
    env = GraspingEnv(gui=False)

    model = PPO(
        policy="MlpPolicy",
        env=env,
        learning_rate=3e-4,
        n_steps=2048,
        batch_size=64,
        n_epochs=10,
        gamma=0.99,
        gae_lambda=0.95,
        clip_range=0.2,
        ent_coef=0.01,               # 鼓励探索
        verbose=1,
        tensorboard_log="./tb_logs_grasping/",
    )

    print(f"[INFO] 训练 PPO {total_timesteps} 步...")
    model.learn(total_timesteps=total_timesteps, progress_bar=True)
    model.save(save_path)
    print(f"[INFO] 模型保存至 {save_path}")
    return model


def evaluate(model_path: str = "ppo_grasping", episodes: int = 20):
    """评估训练好的模型。"""
    env = GraspingEnv(gui=True)
    model = PPO.load(model_path)

    successes = 0
    for ep in range(episodes):
        obs, _ = env.reset()
        done = False
        steps = 0

        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            steps += 1

        success = terminated
        successes += int(success)
        print(f"  Episode {ep+1}: {'✓ 成功' if success else '✗ 失败'}，步数 {steps}")

    print(f"\n成功率: {successes}/{episodes} = {successes/episodes*100:.1f}%")


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1 and sys.argv[1] == "eval":
        evaluate()
    else:
        train_ppo()

6. 分步实现指南¶

阶段 1 — 机械臂配置（所有层级）¶

连接机械臂：刷写 UR5/Panda 固件，验证 roscontrol 连通性
加载 URDF：rosparam load robot_description，在 RViz 中验证
初始位置：为所有关节定义安全的初始配置
工作空间标定：确定可达工作空间边界

# 使用 MoveIt 设置助手加载 UR5 URDF
roslaunch ur5_moveit_config demo.launch

阶段 2 — 层级 1：数值 IK¶

定义 DH 参数或使用 kdl_kinematics_plugin 从 URDF 加载
使用 URDF 链实现 geometric_jacobian()
从初始位置 → 随机可达目标运行 numerical_ik()
通过比较 IK 解 → FK 正向姿态来验证
与 MoveIt 集成：

from moveit_commander import MoveGroupCommander
mgc = MoveGroupCommander("manipulator")
mgc.set_pose_target(pose_target)   # geometry_msgs/Pose
plan = mgc.plan()
mgc.execute(plan, wait=True)

阶段 3 — 层级 2：GraspNet 流程¶

点云捕获：订阅 /camera/depth/points（ROS）或 RealSense SDK
降采样：体素网格滤波器（叶尺寸 2–5 mm）
裁剪：仅保留工作空间内的点云
法向量估计：open3d.estimate_normals()
采样候选：运行 sample_antipodal_points()
评分：批量运行 CNN 推理
执行：取 Top-1 抓取 → 运行 IK → 移动机械臂 → 合爪
循环：若失败，使用次优候选重试

阶段 4 — 层级 3：强化学习训练¶

启动 PyBullet 仿真：验证物体生成和机械臂动力学
基线 RL：训练 PPO 200k 步（约 10 分钟，CPU）
超参数调优：调整 ent_coef、gamma、n_steps
域随机化：改变物体大小、质量、摩擦力、相机噪声
Sim-to-Real：将策略迁移到真实机械臂；用在线数据微调

7. 方法对比¶

指标	数值 IK（层级 1）	GraspNet 深度学习（层级 2）	强化学习抓取（层级 3）
抓取姿态	给定（用户指定）	从点云预测	端到端学习
物体感知	已知 / 预知场景	RGB-D / 点云	原始传感器
训练数据	无	10 亿合成抓取	试错
泛化能力	确定性	强（CNN 特征）	需要 sim-to-real
计算速度	~100 Hz	~10 Hz（GPU 批处理）	~50 Hz（推理）
未知物体	❌ 不支持	✅ 支持	✅ 训练后可支持
遮挡处理	❌ 不支持	⚠️ 有限	⚠️ 有限
配置复杂度	低	中（GPU + 数据集）	高（仿真 + 训练）
适用场景	定点取放	杂乱场景	新物体、复杂动力学
难度	⭐	⭐⭐⭐	⭐⭐⭐⭐

8. 扩展与变化¶

8.1 多物体抓取¶

扩展 GraspNet 输出抓取序列（按可达性排序）
使用**场景图**推理物体之间的关系
结合任务规划实现"抓取 X 并放置到 Y"

8.2 阻抗控制¶

合爪后切换到**阻抗模式**，适应接触力
跟踪desired力旋量：\(\tau = \mathbf{J}^T \mathbf{F}_d + \mathbf{K}_p \mathbf{e} + \mathbf{K}_d \dot{\mathbf{e}}\)

8.3 灵巧操作¶

用 Allegro / Shadow Hand 替换并行夹爪
使用**接触模型**（Morse 摩擦锥）分析抓取稳定性
扩展到手中操作：不重新抓取的情况下改变物体朝向

8.4 接触模型¶

库仑摩擦（平面接触）：

\[ F_t \leq \mu F_n \]

其中 \(F_t\) 为切向力，\(F_n\) 为法向力，\(\mu\) 为摩擦系数。

2 指抓取的抓取矩阵（\(k\) 个接触点）：

\[ \mathbf{G} = \begin{bmatrix} \mathbf{I}_3 & \mathbf{I}_3 \\ [\mathbf{r}_1]_\times & [\mathbf{r}_2]_\times \end{bmatrix} \in \mathbb{R}^{6 \times 2k} \]

8.5 Sim-to-Real 迁移¶

问题	仿真解决方案	真实调整
相机噪声	向深度图添加高斯噪声	用真实数据重新调参
动力学不匹配	匹配 URDF 惯量参数	系统辨识
夹爪偏移	用棋盘格标定	在线标定
物体属性	随机化摩擦力和质量	从力/力矩传感器估计

9. 参考资料¶

Siciliano et al., 2008 — Robotics: Modelling, Planning and Control — 涵盖 IK、动力学和控制的综合机器人学教材
Spong et al., 2006 — Robot Modeling and Control — DH 参数、雅可比矩阵和数值 IK
GraspNet-1B 数据集与论文 — 10 亿抓取姿态、6-DoF 抓取检测基准
Zhou et al., 2022 — GraspNet-1B: A Large-Scale Benchmark for General Object Grasping — 从点云预测 6-DoF 抓取姿态
Fang et al., 2020 — Closing the Loop for Robotic Grasping: A Large Scale Learning Based Approach — 端到端抓取评分 CNN
Kalakrishnan et al., 2011 — Learning Force Control Policies for Compliant Manipulation — 力控操作的强化学习
Schulman et al., 2017 — Proximal Policy Optimization Algorithms — PPO 算法
Coumans & Bai, 2016–2023 — PyBullet Physics Engine — 机器人仿真
Sucan & Chitta, 2012 — MoveIt! ROS Integration for Planning Groups — 运动规划框架
Choi & Christensen, 2020 — Real-time Grasp Planning Using Multi-View SilhouetteNet — 多视角抓取规划
Open3D Documentation — 点云处理库
YCB Object Set — 机器人操作研究基准物体