
Robotic Arm Grasping

Project Type: Manipulation | Difficulty: ★★☆☆☆ to ★★★★★ (tier-dependent) | Estimated Time: 2–4 weekends


1. Project Overview

Grasping is a fundamental manipulation skill: the robot must position its end-effector (gripper) at a specific 6-DoF pose, close the fingers, and lift the object without dropping or colliding with the environment.

This project covers three algorithm tiers:

| Tier | Approach | Input | Key Algorithm |
|---|---|---|---|
| Tier 1 — Traditional | Analytical + numerical IK | Known target pose | Geometric IK, Jacobian pseudo-inverse, DLS |
| Tier 2 — Intermediate | GraspNet-style deep learning | RGB-D / point cloud | 6-DoF grasp candidate sampling + CNN quality scoring |
| Tier 3 — Modern | RL-based grasping | Raw sensor observations | PPO/SAC in Isaac Gym or PyBullet |
  ┌──────────────────────────────────────────────────────────────┐
  │                 Robotic Arm Grasping Pipeline                │
  │                                                              │
  │   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐    │
  │   │  Perceive   │────▶│  Grasp      │────▶│  Execute    │    │
  │   │  (camera /  │     │  planning   │     │  (move to   │    │
  │   │   depth)    │     │  (IK / DL / │     │   pose,     │    │
  │   │             │     │   RL)       │     │   close     │    │
  │   │             │     │             │     │   gripper)  │    │
  │   └─────────────┘     └─────────────┘     └─────────────┘    │
  │         │                    │                   │           │
  │         ▼                    ▼                   ▼           │
  │   point cloud /        grasp pose          success?          │
  │   RGB-D                (x, y, z, r, p, y)  feedback →        │
  │                        + gripper width     restart loop      │
  └──────────────────────────────────────────────────────────────┘

2. Hardware & Software Requirements

Hardware

| Component | Specification | Notes |
|---|---|---|
| Robotic arm | UR5e / Panda (7-DoF) | 6-DoF minimum; URDF required |
| Gripper | Parallel-jaw (2-finger) | e.g., Robotiq 2F-85, or custom |
| Camera | Intel RealSense D435 / Azure Kinect | RGB-D required for Tier 2+ |
| Computer | Ubuntu 20.04, GPU ≥ 6 GB VRAM | For DL/RL training |
| Object set | YCB / GraspNet-1B objects | Various shapes and sizes |
| F/T sensor (optional) | ATI force/torque sensor | For grasp quality evaluation |

Software

| Package | Version | Purpose |
|---|---|---|
| Python | ≥ 3.8 | Core language |
| NumPy / SciPy | ≥ 1.20 | Numerical IK |
| MoveIt | ROS Noetic | Motion planning + IK |
| PyBullet | ≥ 3.0 | Physics simulation (Tier 3) |
| PyTorch | ≥ 1.13 | Deep learning (Tier 2) |
| Stable-Baselines3 | ≥ 1.7 | RL algorithms |
| Gymnasium | ≥ 0.28 | RL environment interface |
| Open3D | ≥ 0.16 | Point cloud processing |

pip install numpy scipy pybullet torch stable-baselines3 gymnasium open3d opencv-python

3. Tier 1 — Traditional: Inverse Kinematics

3.1 Forward Kinematics Review

Given joint angles \(\mathbf{q} = [q_1, q_2, \ldots, q_n]^T\), forward kinematics (FK) computes the end-effector pose:

\[ \mathbf{T}(\mathbf{q}) = \begin{bmatrix} \mathbf{R}(\mathbf{q}) & \mathbf{p}(\mathbf{q}) \\ \mathbf{0}_{1\times3} & 1 \end{bmatrix} \in SE(3) \]

where \(\mathbf{R} \in SO(3)\) is the rotation matrix and \(\mathbf{p} \in \mathbb{R}^3\) is the position.

Denavit-Hartenberg (DH) parameters for a 6-DoF manipulator:

| Joint | \(a_i\) (mm) | \(\alpha_i\) (rad) | \(d_i\) (mm) | \(\theta_i\) |
|---|---|---|---|---|
| 1 | 0 | \(-\pi/2\) | \(d_1\) | \(q_1\) |
| 2 | \(a_2\) | 0 | 0 | \(q_2 - \pi/2\) |
| 3 | \(a_3\) | 0 | 0 | \(q_3\) |
| 4 | 0 | \(-\pi/2\) | \(d_4\) | \(q_4\) |
| 5 | 0 | \(\pi/2\) | 0 | \(q_5\) |
| 6 | 0 | 0 | \(d_6\) | \(q_6\) |

3.2 Analytical IK — Geometric Approach

For a 6-DoF articulated arm (e.g., UR5), we decompose the problem:

  1. Wrist center \(\mathbf{p}_c = \mathbf{p}_e - d_6 \cdot \mathbf{n}\) (tool offset)
  2. Shoulder — solve for \(q_1\) (azimuth) and \(q_2, q_3\) (elbow configuration)
  3. Wrist orientation — solve for \(q_4, q_5, q_6\) from remaining rotation

2-Link planar IK (shoulder + elbow):

      y_p ▲
          │        O  elbow (θ2)
          │       / \
          │  L1  /   \  L2
          │     /     \
          │    /       X  wrist center (x_p, y_p)
          │   / θ1
          └──O─────────────────▶ x_p
          shoulder

Given the wrist center \(\mathbf{p}_c = (x, y, z)\), first set \(q_1 = \operatorname{atan2}(y, x)\) so the arm swings into the vertical plane containing the target. The remaining 2-link problem lives in that plane, with planar coordinates \((x_p, y_p) = \left(\sqrt{x^2 + y^2},\; z - d_1\right)\) and reach

\[ r = \sqrt{x_p^2 + y_p^2} \]

Law of cosines for elbow angle \(\theta_2\):

\[ \cos \theta_2 = \frac{r^2 - L_1^2 - L_2^2}{2 L_1 L_2} \quad \Rightarrow \quad \theta_2 = \pm \arccos(\cdots) \]

(Use − for the "elbow-up" configuration, + for "elbow-down".)

Shoulder angle \(\theta_1\) (measured in the planar frame):

\[ \theta_1 = \operatorname{atan2}(y_p, x_p) - \operatorname{atan2}\!\left(L_2 \sin\theta_2,\; L_1 + L_2 \cos\theta_2\right) \]
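A minimal sketch of these two equations in Python (the link lengths and target below are illustrative):

import numpy as np


def two_link_ik(x_p: float, y_p: float, L1: float, L2: float,
                elbow_up: bool = True):
    """Planar 2-link IK for a target (x_p, y_p) in the arm's plane.
    Returns (theta1, theta2) in radians, or None if unreachable."""
    c2 = (x_p**2 + y_p**2 - L1**2 - L2**2) / (2 * L1 * L2)
    if abs(c2) > 1.0:
        return None                      # outside the annular workspace
    theta2 = -np.arccos(c2) if elbow_up else np.arccos(c2)
    theta1 = np.arctan2(y_p, x_p) - np.arctan2(L2 * np.sin(theta2),
                                               L1 + L2 * np.cos(theta2))
    return theta1, theta2


# Example: reach (0.5, 0.2) with two 0.4 m links, elbow-up
print(two_link_ik(0.5, 0.2, 0.4, 0.4))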

3.3 Numerical IK — Jacobian Methods

When analytical solutions are unavailable, numerical IK iteratively refines \(\mathbf{q}\):

Jacobian pseudo-inverse:

\[ \dot{\mathbf{q}} = \mathbf{J}^{\dagger} \dot{\mathbf{x}}, \quad \mathbf{J}^{\dagger} = \mathbf{J}^T \left(\mathbf{J} \mathbf{J}^T\right)^{-1} \]

Damped Least Squares (DLS) — the standard remedy, which keeps the update bounded near singularities:

\[ \dot{\mathbf{q}} = \mathbf{J}^T \left(\mathbf{J} \mathbf{J}^T + \lambda^2 \mathbf{I}\right)^{-1} \dot{\mathbf{x}} \]

where \(\lambda\) is the damping factor (typical: \(\lambda \in [0.01, 0.5]\)).

Singular Value Decomposition (SVD) form — writing \(\mathbf{J} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T\), the DLS solution filters each singular value \(\sigma_i\):

\[ \dot{\mathbf{q}} = \sum_i \frac{\sigma_i}{\sigma_i^2 + \lambda^2}\, \mathbf{v}_i \mathbf{u}_i^T \dot{\mathbf{x}} \;+\; \left(\mathbf{I} - \mathbf{J}^{\dagger}\mathbf{J}\right) \dot{\mathbf{q}}_{\text{null}} \]

where the second, optional null-space term injects a secondary objective \(\dot{\mathbf{q}}_{\text{null}}\) without disturbing the end-effector motion.
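The filtered form is numerically equivalent to the matrix-inverse expression above; a short sketch:

import numpy as np


def dls_step(J: np.ndarray, x_err: np.ndarray, lam: float = 0.1) -> np.ndarray:
    """One DLS update via SVD: each singular value sigma is filtered to
    sigma / (sigma^2 + lambda^2), so near-singular directions are
    attenuated instead of amplified."""
    U, S, Vt = np.linalg.svd(J, full_matrices=False)
    return Vt.T @ ((S / (S**2 + lam**2)) * (U.T @ x_err))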

3.4 Complete Python Code — Numerical IK

"""
Tier 1: Numerical IK — Damped Least Squares (DLS)
=================================================
Applies to any serial manipulator with known DH parameters or FK function.
"""

import numpy as np


def skew(v: np.ndarray) -> np.ndarray:
    """Skew-symmetric matrix from 3-vector."""
    return np.array([
        [0,    -v[2],  v[1]],
        [v[2],   0,   -v[0]],
        [-v[1], v[0],   0]
    ])


def rot_x(angle: float) -> np.ndarray:
    """Rotation about X-axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])


def rot_z(angle: float) -> np.ndarray:
    """Rotation about Z-axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])


def dh_transform(a: float, alpha: float, d: float, theta: float) -> np.ndarray:
    """Standard DH transform: rotate theta about Z, translate d along Z,
    translate a along X, rotate alpha about X."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st*ca,  st*sa, a*ct],
        [st,  ct*ca, -ct*sa, a*st],
        [0,   sa,     ca,    d],
        [0,   0,      0,     1]
    ])


# Simplified UR5-style DH parameters: (a, alpha, d) per joint, in metres.
# Joint 2 carries a -pi/2 offset on theta; the last d lumps the two wrist
# link lengths (0.093 + 0.082).
DH_PARAMS = [
    (0,     -np.pi/2, 0.089),   # joint 1: shoulder yaw
    (0.425,  0,       0),       # joint 2: shoulder pitch
    (0.392,  0,       0),       # joint 3: elbow pitch
    (0,     -np.pi/2, 0.109),   # joint 4: wrist 1 roll
    (0,      np.pi/2, 0),       # joint 5: wrist 2 pitch
    (0,      0,       0.175),   # joint 6: wrist 3 roll
]
THETA_OFFSETS = [0, -np.pi/2, 0, 0, 0, 0]


def joint_to_transform(q: np.ndarray) -> np.ndarray:
    """
    FK for a 6R arm (simplified UR5-like geometry).

    Parameters
    ----------
    q : array of shape (6,)
        Joint angles in radians [q1..q6]

    Returns
    -------
    T : (4, 4) homogeneous transform of the end-effector
    """
    T = np.eye(4)
    for (a, alpha, d), offset, qi in zip(DH_PARAMS, THETA_OFFSETS, q):
        T = T @ dh_transform(a, alpha, d, qi + offset)
    return T


def geometric_jacobian(q: np.ndarray) -> np.ndarray:
    """
    Compute the 6×n geometric Jacobian J(q).
    Rows 0–2: linear velocity part; rows 3–5: angular velocity part.
    """
    n = len(q)
    J = np.zeros((6, n))
    ee_pos = joint_to_transform(q)[:3, 3]

    # Walk through the joints, accumulating transforms so that each joint's
    # rotation axis and origin are expressed in the world frame
    T = np.eye(4)
    for i in range(n):
        axis_world = T[:3, 2]                        # Z-axis of frame i-1
        J[:3, i] = np.cross(axis_world, ee_pos - T[:3, 3])
        J[3:, i] = axis_world

        a, alpha, d = DH_PARAMS[i]
        T = T @ dh_transform(a, alpha, d, q[i] + THETA_OFFSETS[i])
    return J


def numerical_ik(target_pose: np.ndarray, q_init: np.ndarray,
                  max_iter: int = 200, tol: float = 1e-4,
                  lambda_damp: float = 0.1) -> np.ndarray:
    """
    Damped Least Squares IK for 6R manipulator.

    Parameters
    ----------
    target_pose : (4, 4) homogeneous transform
    q_init      : initial joint angles
    max_iter    : max iterations
    tol         : convergence threshold (position error in metres)
    lambda_damp : damping factor λ

    Returns
    -------
    q : (6,) joint angles (radians)
    """
    q = q_init.copy()
    target_pos = target_pose[:3, 3]
    target_rot = target_pose[:3, :3]

    for iteration in range(max_iter):
        T = joint_to_transform(q)
        pos_err = target_pos - T[:3, 3]

        # Orientation error via axis-angle of R_error = R_desired @ R_current.T
        R_err = target_rot @ T[:3, :3].T
        trace_R = np.trace(R_err)
        trace_R = np.clip(trace_R, -1.0, 3.0)
        theta_err = np.arccos((trace_R - 1) / 2)
        if theta_err > 1e-6:
            axis = (1 / (2 * np.sin(theta_err))) * np.array([
                R_err[2, 1] - R_err[1, 2],
                R_err[0, 2] - R_err[2, 0],
                R_err[1, 0] - R_err[0, 1]
            ])
        else:
            axis = np.zeros(3)
        ori_err = theta_err * axis

        x_err = np.concatenate([pos_err, ori_err])

        if np.linalg.norm(x_err) < tol:
            print(f"Converged in {iteration} iterations")
            break

        # Geometric Jacobian
        J = geometric_jacobian(q)

        # DLS update
        lambda_sq = lambda_damp ** 2
        delta_q = J.T @ np.linalg.solve(J @ J.T + lambda_sq * np.eye(6), x_err)
        q = q + delta_q

        # Joint limits (UR5)
        q = np.clip(q, np.radians([-180, -128, -128, -266, -266, -266]),
                         np.radians([ 180,  128,  128,  266,  266,  266]))

    return q


def demo():
    """Test IK: move from home to a target pose."""
    # Target: position (0.4, 0.2, 0.3), rotation = identity
    target = np.eye(4)
    target[:3, 3] = [0.4, 0.2, 0.3]

    q_init = np.zeros(6)  # start from home position
    q_solution = numerical_ik(target, q_init, lambda_damp=0.15)

    # Verify
    T_verify = joint_to_transform(q_solution)
    print(f"Target pos : {target[:3, 3]}")
    print(f"Achieved pos: {T_verify[:3, 3]}")
    print(f"Solution q  : {np.degrees(q_solution)} deg")
    return q_solution


if __name__ == "__main__":
    demo()

4. Tier 2 — Intermediate: GraspNet-Style Deep Learning

4.1 Overview

GraspNet-style methods (e.g., GraspNet-1Billion) predict 6-DoF grasp poses (position + orientation + gripper width) from a single RGB-D image or point cloud. The pipeline:

  RGB-D image / point cloud
        ↓
  sample grasp candidates (tens of thousands)
  [x, y, z, roll, pitch, yaw, gripper_width]
        ↓
  grasp quality CNN scorer
  [Q(success) ∈ [0, 1] for each candidate]
        ↓
  non-maximum suppression → top-K grasps
        ↓
  execute (IK → trajectory → gripper close)
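The NMS step is not part of the code listing in Section 4.4; a minimal sketch, assuming the candidates arrive pre-sorted by descending score (the thresholds are illustrative):

import numpy as np


def grasp_nms(frames: list, trans_thresh: float = 0.02,
              angle_thresh: float = 0.5, top_k: int = 10) -> list:
    """Greedy NMS over 6-DoF grasps given as (4, 4) frames sorted by
    descending score. A candidate is suppressed if it is both close in
    position and similar in approach direction to an already-kept grasp."""
    kept = []
    for G in frames:
        pos, approach = G[:3, 3], G[:3, 0]
        duplicate = any(
            np.linalg.norm(pos - K[:3, 3]) < trans_thresh
            and np.arccos(np.clip(approach @ K[:3, 0], -1.0, 1.0)) < angle_thresh
            for K in kept)
        if not duplicate:
            kept.append(G)
        if len(kept) >= top_k:
            break
    return kept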

4.2 Grasp Representation — 6-DoF Pose

A grasp is fully specified by:

\[ \mathcal{G} = (\mathbf{p}, \mathbf{R}, w) \]

| Symbol | Meaning | Range |
|---|---|---|
| \(\mathbf{p} \in \mathbb{R}^3\) | Grasp position (contact midpoint) | World coordinates |
| \(\mathbf{R} \in SO(3)\) | Grasp orientation (gripper frame) | 3-DoF rotation |
| \(w\) | Gripper width (finger opening) | 0 – 0.08 m |

Gripper frame convention: \(x\)-axis = approach direction, \(z\)-axis = finger closing direction.

4.3 Grasp Quality Metrics

Before learning, we can evaluate grasps analytically:

Force Closure: A grasp achieves force closure if any external wrench can be counteracted by contact forces. Condition for 2-finger planar grasp:

\[ \mathbf{G} \mathbf{f} = -\mathbf{w}_{ext} \quad \text{(wrench balance)} \]

where \(\mathbf{G}\) is the Grasp Matrix and \(\mathbf{f}\) are contact forces.

Epsilon quality \(\epsilon \in [0, 1]\) (the Ferrari–Canny metric): the radius of the largest wrench-space ball, centered at the origin, that fits inside the grasp wrench space \(\mathcal{W}\) generated by unit contact forces:

\[ \epsilon = \min_{\|\mathbf{w}\| = 1} \; \max \left\{ k \geq 0 : k\,\mathbf{w} \in \mathcal{W} \right\}, \qquad \mathcal{W} = \left\{ \mathbf{G}\mathbf{f} : \|\mathbf{f}\| \leq 1 \right\} \]

Higher \(\epsilon\) = more robust grasp.

Wrench Space Volume: the volume of the grasp wrench space (GWS), approximated by the ellipsoid-volume proxy

\[ V_{GWS} \propto \sqrt{\det\left(\mathbf{G} \mathbf{G}^T\right)} \]
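A small numerical check of these quantities, using the 2-contact grasp matrix form from Section 8.4 (the contact locations are illustrative):

import numpy as np


def skew(v: np.ndarray) -> np.ndarray:
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])


def grasp_matrix(contacts: list) -> np.ndarray:
    """Grasp matrix for point contacts transmitting pure forces:
    each contact at position r contributes a 6x3 block [I; [r]x]."""
    return np.hstack([np.vstack([np.eye(3), skew(r)]) for r in contacts])


# Two antipodal contacts on a 5 cm cube, about the object centre
G = grasp_matrix([np.array([0.025, 0, 0]), np.array([-0.025, 0, 0])])
vol = np.sqrt(max(np.linalg.det(G @ G.T), 0.0))
print(G.shape, np.linalg.matrix_rank(G), f"{vol:.4f}")
# rank 5, volume 0: frictionless point contacts cannot resist torque
# about the line joining the contacts — friction cones fill that gap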

4.4 Complete Python Code — Grasp Candidate Sampling + Quality CNN

"""
Tier 2: GraspNet-Style Grasp Pose Prediction
============================================
1. Sample grasp candidates from point cloud
2. Score them with a CNN quality estimator
3. Output top-K ranked grasps
"""

import numpy as np
import torch
import torch.nn as nn


# ─── Grasp Representation ────────────────────────────────────────

class Grasp:
    """
    A 6-DoF grasp pose.

    frame: SE(3) transform where:
      - origin  = grasp contact midpoint
      - x-axis  = approach direction (toward object)
      - z-axis  = finger closing direction
    width: gripper opening (metres)
    """

    def __init__(self, frame: np.ndarray, width: float):
        # frame: (4, 4) homogeneous transform
        self.frame = frame
        self.width = width

    @property
    def position(self) -> np.ndarray:
        return self.frame[:3, 3]

    @property
    def approach(self) -> np.ndarray:
        """Unit vector along gripper x-axis (approach direction)."""
        return self.frame[:3, 0]

    def to_matrix(self) -> np.ndarray:
        """Flatten grasp to a 13-D feature: [x, y, z, r11..r33, width]."""
        return np.concatenate([self.position, self.frame[:3, :3].flatten(), [self.width]])


# ─── Grasp Candidate Sampling ────────────────────────────────────

def sample_antipodal_points(points: np.ndarray, normals: np.ndarray,
                             num_samples: int = 5000) -> list:
    """
    Sample grasp candidates by selecting antipodal point pairs.

    An antipodal pair has normals pointing roughly toward each other.
    The grasp center is midway between the two points.

    Parameters
    ----------
    points   : (N, 3) point cloud
    normals  : (N, 3) surface normals
    num_samples : max number of candidates to generate

    Returns
    -------
    candidates : list of Grasp objects
    """
    candidates = []

    # Keep only points with valid (non-degenerate) normals
    valid = np.where(np.linalg.norm(normals, axis=1) > 0.1)[0]
    points = points[valid]
    normals = normals[valid]

    for _ in range(num_samples):
        i = np.random.randint(0, len(points))
        j = np.random.randint(0, len(points))
        if i == j:
            continue

        p1, n1 = points[i], normals[i]
        p2, n2 = points[j], normals[j]

        # Distance between points
        dist = np.linalg.norm(p1 - p2)
        if dist < 0.01 or dist > 0.10:   # reject too close / too far
            continue

        # Antipodal check: normals should be roughly opposing
        dot_normals = np.dot(n1, n2)
        if dot_normals > -0.5:            # not antipodal enough
            continue

        # Grasp frame: centred at the midpoint; the fingers close along
        # the line joining the contact points
        center = (p1 + p2) / 2
        closing = (p2 - p1) / dist                # z-axis = finger closing

        # Approach direction: prefer top-down, projected orthogonal to the
        # closing axis. (n1 + n2 is near zero for antipodal pairs, so it
        # cannot define a stable approach direction.)
        down = np.array([0.0, 0.0, -1.0])
        approach = down - np.dot(down, closing) * closing
        if np.linalg.norm(approach) < 1e-4:       # closing axis ≈ vertical
            approach = np.array([1.0, 0.0, 0.0])
            approach -= np.dot(approach, closing) * closing
        approach /= np.linalg.norm(approach)
        binormal = np.cross(closing, approach)    # y = z × x (right-handed)

        frame = np.eye(4)
        frame[:3, 0] = approach    # x-axis: approach
        frame[:3, 1] = binormal    # y-axis: binormal
        frame[:3, 2] = closing     # z-axis: finger closing
        frame[:3, 3] = center

        candidates.append(Grasp(frame, width=dist))

    return candidates


# ─── Grasp Quality CNN ──────────────────────────────────────────

class GraspQualityCNN(nn.Module):
    """
    Predict grasp success probability from:
    - Point cloud crop around grasp (N × 3)
    - Grasp pose features (9-DoF)
    """

    def __init__(self, num_points: int = 256, pose_dim: int = 9,
                 hidden_dim: int = 256):
        super().__init__()
        self.num_points = num_points

        # Point-wise MLP (shared weights per point)
        self.point_encoder = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 256, 1), nn.ReLU(),
        )

        # Pose encoder
        self.pose_encoder = nn.Sequential(
            nn.Linear(pose_dim, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
        )

        # Fusion + score head
        self.fusion = nn.Sequential(
            nn.Linear(256 + 256, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),   # score ∈ [0, 1]
        )

    def forward(self, points: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        """
        Parameters
        ----------
        points : (B, N, 3) point cloud crop
        pose   : (B, 13) grasp pose features

        Returns
        -------
        score  : (B, 1) success probability
        """
        B = points.size(0)

        # Encode point cloud
        pc_feat = self.point_encoder(points.permute(0, 2, 1))  # (B, 256, N)
        pc_feat = pc_feat.max(dim=2)[0]                        # (B, 256) — max pooling

        # Encode pose
        pose_feat = self.pose_encoder(pose)                     # (B, 256)

        # Fusion
        combined = torch.cat([pc_feat, pose_feat], dim=1)       # (B, 512)
        score = self.fusion(combined)                          # (B, 1)
        return score


def score_grasps(candidates: list, pc_crop: np.ndarray,
                 model: GraspQualityCNN, device: str = "cpu") -> list:
    """
    Score all grasp candidates with the CNN.

    Parameters
    ----------
    candidates : list of Grasp objects
    pc_crop    : (M, 3) point cloud of the scene (crop around candidates)
    model      : trained GraspQualityCNN

    Returns
    -------
    scored_grasps : list of (Grasp, score) sorted descending by score
    """
    model.eval()
    batch_size = 64
    scored = []

    with torch.no_grad():
        for i in range(0, len(candidates), batch_size):
            batch = candidates[i:i + batch_size]

            # Build pose feature tensor
            pose_batch = np.array([g.to_matrix() for g in batch]).astype(np.float32)
            pose_tensor = torch.from_numpy(pose_batch).to(device)

            # Subsample point cloud to fixed size
            pc_resampled = resample_pc(pc_crop, num_points=model.num_points)
            pc_tensor = torch.from_numpy(pc_resampled).unsqueeze(0).expand(len(batch), -1, -1)
            pc_tensor = pc_tensor.to(device)

            # Score
            scores = model(pc_tensor.float(), pose_tensor.float())
            for g, s in zip(batch, scores.cpu().numpy().flatten()):
                scored.append((g, float(s)))

    scored.sort(key=lambda x: x[1], reverse=True)
    return scored


def resample_pc(pc: np.ndarray, num_points: int) -> np.ndarray:
    """Randomly resample (or pad) a point cloud to fixed size."""
    N = len(pc)
    if N >= num_points:
        idx = np.random.choice(N, num_points, replace=False)
    else:
        idx = np.random.choice(N, num_points, replace=True)
    return pc[idx]


def demo_graspnet():
    """Simulated demo — in practice, load RealSense + trained model."""
    np.random.seed(42)

    # Simulated point cloud: a rough cylindrical shell (toy object)
    N = 1000
    angles = np.random.uniform(0, 2*np.pi, N)
    heights = np.random.uniform(-0.05, 0.05, N)
    radii = np.random.uniform(0.03, 0.05, N)
    points = np.stack([radii * np.cos(angles),
                        radii * np.sin(angles),
                        heights], axis=1)
    normals = points / np.linalg.norm(points, axis=1, keepdims=True)

    # Sample candidates
    candidates = sample_antipodal_points(points, normals, num_samples=200)
    print(f"Sampled {len(candidates)} grasp candidates")

    # Score with a random (untrained) CNN
    device = "cpu"
    model = GraspQualityCNN().to(device)
    scored = score_grasps(candidates, points, model, device)

    print("Top 5 grasps:")
    for g, score in scored[:5]:
        print(f"  score={score:.3f}  pos={g.position}  width={g.width:.4f}m")

    return scored


if __name__ == "__main__":
    demo_graspnet()
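
The demo above scores with an untrained network. A minimal training sketch for GraspQualityCNN, assuming you can label candidate grasps as success/failure (e.g., from simulation rollouts) — random tensors stand in for real data here:

import torch
import torch.nn as nn


def train_quality_cnn(model, loader, epochs=10, lr=1e-3, device="cpu"):
    """Binary cross-entropy training loop.
    loader yields (points (B, N, 3), pose (B, 13), label (B, 1)) batches."""
    model.to(device).train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCELoss()                     # the model already ends in Sigmoid
    for epoch in range(epochs):
        total = 0.0
        for points, pose, label in loader:
            points, pose, label = points.to(device), pose.to(device), label.to(device)
            opt.zero_grad()
            loss = bce(model(points, pose), label)
            loss.backward()
            opt.step()
            total += loss.item()
        print(f"epoch {epoch}: mean loss {total / len(loader):.4f}")


# Placeholder dataset — replace with labeled grasps
data = torch.utils.data.TensorDataset(
    torch.randn(512, 256, 3), torch.randn(512, 13),
    torch.randint(0, 2, (512, 1)).float())
train_quality_cnn(GraspQualityCNN(),
                  torch.utils.data.DataLoader(data, batch_size=64, shuffle=True))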

5. Tier 3 — Modern: RL-Based Grasping

5.1 Concept

Instead of separately planning a grasp pose, RL-based grasping directly learns a policy \(\pi_\theta(\mathbf{a} \mid \mathbf{o})\) that maps observations to gripper actions. The robot learns through trial and error which strategies lead to successful grasps.

  ┌──────────────────────────────────────────────────────────┐
  │                    RL Grasping Loop                      │
  │                                                          │
  │   Observation o_t                                        │
  │   [rgb, depth, proprioceptive state]                     │
  │         │                                                │
  │         ▼                                                │
  │   Policy π_θ (neural network)                            │
  │         │                                                │
  │         ▼                                                │
  │   Action a_t = [Δx, Δy, Δz, Δθ, open/close gripper]      │
  │         │                                                │
  │         ▼                                                │
  │   PyBullet / Isaac Gym physics simulation                │
  │         │                                                │
  │         ▼                                                │
  │   Reward r_t = +1 (success) / −0.01 (step penalty)       │
  │         │                                                │
  │         ▼                                                │
  │   Policy update (PPO / SAC)                              │
  └──────────────────────────────────────────────────────────┘

5.2 RL Formulation

State \(\mathbf{s}_t\):

  • End-effector pose \((x, y, z, \phi, \theta, \psi)\)
  • Gripper width \(w\)
  • Object pose(s) from simulation

Action \(\mathbf{a}_t\):

  • \(\Delta x, \Delta y, \Delta z\) — position delta (metres)
  • \(\Delta \phi\) — wrist roll delta (radians)
  • gripper_open — binary (0 = close, 1 = open)

Reward shaping:

| Event | Reward |
|---|---|
| Grasp success (object lifted) | \(r_{\text{success}} = +10\) |
| Object dropped after lift | \(r_{\text{drop}} = -5\) |
| Collision with table | \(r_{\text{coll}} = -2\) |
| Step penalty | \(r_{\text{step}} = -0.01\) |
| Approach bonus (close to object) | \(r_{\text{approach}} = +0.1 \cdot \Delta d\) |

5.3 Complete Python Code — PyBullet Grasping Environment + PPO

"""
Tier 3: RL-Based Grasping with PyBullet + PPO
=============================================
Custom Gymnasium environment simulating a 6-DoF arm grasping task.
Trains a PPO agent to approach, grasp, and lift a cube.
"""

import time

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO
import pybullet as p
import pybullet_data


class GraspingEnv(gym.Env):
    """
    PyBullet-based grasping environment.

    Scene:
      - Plane (table surface)
      - 1 cube object (0.05 m side)
      - 6-DoF robot arm (simplified kinematic model)

    Observation: (12,)
      - EE position (3) + EE velocity (3)
      - Gripper width (1) + gripper velocity (1)
      - Object position relative to EE (3) + object velocity (1)

    Action: (5,)
      - dx, dy, dz (position delta in EE frame)
      - dtheta (wrist rotation)
      - gripper: 0=close, 1=open
    """

    metadata = {"render_modes": ["human"], "render_fps": 30}

    def __init__(self, render_mode=None, gui=True):
        super().__init__()
        self.render_mode = render_mode
        self.gui = gui

        # Observation: 12-dim
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(12,), dtype=np.float32)
        # Action: 5-dim
        self.action_space = spaces.Box(
            low=-1.0, high=1.0, shape=(5,), dtype=np.float32)

        # Physics client
        self.client_id = p.connect(p.GUI if gui else p.DIRECT)
        p.setAdditionalSearchPath(pybullet_data.getDataPath())
        p.setGravity(0, 0, -9.81, physicsClientId=self.client_id)

        # Simulation parameters
        self.dt = 0.02          # 50 Hz
        self.max_steps = 200    # 4-second episodes
        self.reach_threshold = 0.04   # grasp success distance (m)
        self.lift_height = 0.10       # object must rise this high

        self.robot_id = None
        self.object_id = None
        self.table_id = None
        self.step_count = 0

    # ─── Helpers ──────────────────────────────────────────────

    def _setup_scene(self):
        """Create table, object, and arm."""
        # Table
        self.table_id = p.loadURDF("plane.urdf", physicsClientId=self.client_id)
        table_visual = p.createVisualShape(
            p.GEOM_BOX, halfExtents=[0.3, 0.3, 0.01],
            rgbaColor=[0.6, 0.5, 0.4, 1], physicsClientId=self.client_id)
        p.createMultiBody(
            baseMass=0, baseVisualShapeIndex=table_visual,
            basePosition=[0.4, 0, 0.005], physicsClientId=self.client_id)

        # Cube object
        obj_collision = p.createCollisionShape(
            p.GEOM_BOX, halfExtents=[0.025]*3)
        obj_visual = p.createVisualShape(
            p.GEOM_BOX, halfExtents=[0.025]*3,
            rgbaColor=[0.8, 0.2, 0.2, 1])
        self.object_id = p.createMultiBody(
            baseMass=0.01, baseCollisionShapeIndex=obj_collision,
            baseVisualShapeIndex=obj_visual,
            basePosition=[0.4, 0.0, 0.03],
            physicsClientId=self.client_id)

        # Simplified arm: the end-effector is modelled kinematically
        # (no URDF); a full implementation would load a UR5/Panda URDF
        # here and command its joints instead
        self.ee_pos = np.array([0.3, 0.0, 0.35])
        self.ee_quat = [0, 0, 0, 1]  # xyzw (PyBullet convention): identity
        self.gripper_open = 1.0

    def _reset_robot(self):
        """Randomize object and reset arm to home."""
        # Randomize object position on table
        obj_x = np.random.uniform(0.30, 0.50)
        obj_y = np.random.uniform(-0.15, 0.15)
        p.resetBasePositionAndOrientation(
            self.object_id, [obj_x, obj_y, 0.03], [0, 0, 0, 1],
            physicsClientId=self.client_id)

        # Reset arm to home
        self.ee_pos = np.array([0.3, 0.0, 0.35])
        self.gripper_open = 1.0

    # ─── Gymnasium Interface ───────────────────────────────────

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        if self.object_id is None:   # build the scene once, then reuse it
            self._setup_scene()
        self._reset_robot()
        self.step_count = 0
        return self._get_obs(), {}

    def step(self, action):
        self.step_count += 1

        # ── Parse action ──
        dx = float(action[0]) * 0.05    # ±5 cm per step
        dy = float(action[1]) * 0.05
        dz = float(action[2]) * 0.03    # ±3 cm
        dtheta = float(action[3]) * 0.2  # ±0.2 rad
        gripper_cmd = float(action[4])   # 0=close, 1=open

        # Update gripper
        self.gripper_open = np.clip(self.gripper_open + (gripper_cmd - 0.5) * 0.3,
                                     0.0, 1.0)

        # ── Move EE (simplified kinematic model) ──
        # Approximation: the EE stays upright (quaternion ≈ [0, 0, 0, 1]),
        # so the EE-frame delta equals the world-frame delta
        self.ee_pos += np.array([dx, dy, dz])
        # Clamp to workspace
        self.ee_pos[0] = np.clip(self.ee_pos[0], 0.15, 0.65)
        self.ee_pos[1] = np.clip(self.ee_pos[1], -0.35, 0.35)
        self.ee_pos[2] = np.clip(self.ee_pos[2], 0.01, 0.60)

        # ── Simplified grasp model ──
        # If the gripper is closed near the object, carry the object
        # kinematically with the EE (a stand-in for simulated finger
        # contact; without this the object could never be lifted)
        obj_pos = np.array(p.getBasePositionAndOrientation(
            self.object_id, physicsClientId=self.client_id)[0])
        dist_ee_obj = np.linalg.norm(self.ee_pos - obj_pos)
        is_grasping = (dist_ee_obj < self.reach_threshold) and (self.gripper_open < 0.3)
        if is_grasping:
            p.resetBasePositionAndOrientation(
                self.object_id, self.ee_pos, [0, 0, 0, 1],
                physicsClientId=self.client_id)

        # ── Step physics ──
        p.stepSimulation(physicsClientId=self.client_id)
        if self.render_mode == "human":
            time.sleep(self.dt)

        # ── Compute reward ──
        obj_pos = np.array(p.getBasePositionAndOrientation(
            self.object_id, physicsClientId=self.client_id)[0])
        dist_ee_obj = np.linalg.norm(self.ee_pos - obj_pos)
        obj_lifted = obj_pos[2] > self.lift_height

        # Reward components
        reward = -0.01                                # step penalty
        reward += 0.05 * max(0, 0.10 - dist_ee_obj)   # approach bonus
        if is_grasping and obj_lifted:
            reward += 10.0                            # SUCCESS
        elif is_grasping:
            reward += 1.0                             # firm grasp
        if obj_pos[2] < 0.01:
            reward -= 2.0                             # object fell off table

        # Termination
        terminated = bool(obj_lifted and is_grasping)
        truncated = self.step_count >= self.max_steps

        return self._get_obs(), reward, terminated, truncated, {}

    def _get_obs(self) -> np.ndarray:
        obj_pos, _ = p.getBasePositionAndOrientation(
            self.object_id, physicsClientId=self.client_id)
        obj_pos = np.array(obj_pos)
        obj_vel = np.array(p.getBaseVelocity(
            self.object_id, physicsClientId=self.client_id)[0])

        rel_pos = obj_pos - self.ee_pos
        obs = np.concatenate([
            self.ee_pos,           # 3
            np.zeros(3),           # EE velocity (unknown → zero)
            np.array([self.gripper_open, 0.0]),  # gripper state
            rel_pos,               # 3
            [obj_vel[2]],          # object upward velocity
        ], dtype=np.float32)
        return np.clip(obs, -10, 10)

    def render(self):
        pass  # handled by PyBullet GUI


# ─── Training ─────────────────────────────────────────────────

def train_ppo(total_timesteps: int = 200_000, save_path: str = "ppo_grasping"):
    """Train a PPO agent on the grasping environment."""
    env = GraspingEnv(gui=False)

    model = PPO(
        policy="MlpPolicy",
        env=env,
        learning_rate=3e-4,
        n_steps=2048,
        batch_size=64,
        n_epochs=10,
        gamma=0.99,
        gae_lambda=0.95,
        clip_range=0.2,
        ent_coef=0.01,          # encourage exploration
        verbose=1,
        tensorboard_log="./tb_logs_grasping/",
    )

    print(f"[INFO] Training PPO for {total_timesteps} timesteps...")
    model.learn(total_timesteps=total_timesteps, progress_bar=True)
    model.save(save_path)
    print(f"[INFO] Model saved to {save_path}")
    return model


def evaluate(model_path: str = "ppo_grasping", episodes: int = 20):
    """Evaluate a trained model."""
    env = GraspingEnv(gui=True)
    model = PPO.load(model_path)

    successes = 0
    for ep in range(episodes):
        obs, _ = env.reset()
        done = False
        steps = 0

        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            steps += 1

        success = terminated   # episode ends on success
        successes += int(success)
        print(f"  Episode {ep+1}: {'✓ SUCCESS' if success else '✗ FAIL'} in {steps} steps")

    print(f"\nSuccess rate: {successes}/{episodes} = {successes/episodes*100:.1f}%")


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1 and sys.argv[1] == "eval":
        evaluate()
    else:
        train_ppo()

6. Step-by-Step Implementation Guide

Phase 1 — Robot Setup (All Tiers)

  1. Connect the arm: Flash firmware on the UR5/Panda, verify ros_control connectivity
  2. Load URDF: rosparam load robot_description → verify in RViz
  3. Home position: Define a safe home configuration for all joints
  4. Workspace calibration: Identify reachable workspace bounds
# Launch the UR5 MoveIt demo (RViz + move_group)
roslaunch ur5_moveit_config demo.launch

Phase 2 — Tier 1: Numerical IK

  1. Define DH parameters or load the URDF with kdl_kinematics_plugin
  2. Implement geometric_jacobian() using the URDF chain
  3. Run numerical_ik() from home position → random reachable targets
  4. Validate by comparing IK solution → FK forward pose
  5. Integrate with MoveIt: move_group.set_joint_value_target(q_solution)
# Use MoveIt's built-in IK (IKFast or KDL)
import sys
import moveit_commander
from moveit_commander import MoveGroupCommander

moveit_commander.roscpp_initialize(sys.argv)
mgc = MoveGroupCommander("manipulator")
mgc.set_pose_target(pose_target)   # geometry_msgs/Pose
# On Noetic, plan() returns (success, trajectory, planning_time, error_code)
success, plan, _, _ = mgc.plan()
if success:
    mgc.execute(plan, wait=True)

Phase 3 — Tier 2: GraspNet Pipeline

  1. Point cloud capture: Subscribe to /camera/depth/points (ROS) or use the RealSense SDK
  2. Downsample: VoxelGrid filter (leaf size 2–5 mm) — steps 2–4 are sketched below
  3. Pass-through filter: Crop to the workspace volume
  4. Estimate normals: Open3D's PointCloud.estimate_normals()
  5. Sample candidates: Run sample_antipodal_points()
  6. Score: Run CNN inference on batches of candidates
  7. Execute: Take the top-1 grasp → run IK → move the arm → close the gripper
  8. Loop: If the grasp fails, retry with the next-ranked candidate
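
A preprocessing sketch for steps 2–4 with Open3D (the file name and workspace bounds are placeholders):

import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("scene.pcd")            # or from RealSense

# 2. Voxel downsample (3 mm leaf)
pcd = pcd.voxel_down_sample(voxel_size=0.003)

# 3. Crop to the workspace volume in front of the robot
bounds = o3d.geometry.AxisAlignedBoundingBox(
    min_bound=[0.2, -0.3, 0.0], max_bound=[0.7, 0.3, 0.3])
pcd = pcd.crop(bounds)

# 4. Estimate normals (1 cm neighbourhood), oriented toward the camera
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.01, max_nn=30))
pcd.orient_normals_towards_camera_location(np.zeros(3))

points = np.asarray(pcd.points)
normals = np.asarray(pcd.normals)    # feed into sample_antipodal_points()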

Phase 4 — Tier 3: RL Training

  1. Start PyBullet simulation: Verify object spawning and arm dynamics
  2. Baseline RL: Train PPO for 200k timesteps (roughly 10–30 min on a desktop CPU)
  3. Hyperparameter sweep: Tune ent_coef, gamma, n_steps
  4. Domain randomization: Vary object size, mass, friction, and camera noise — see the sketch below
  5. Sim-to-real: Transfer policy to real robot; fine-tune with online data
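
A randomization sketch for GraspingEnv (the ranges are illustrative; call randomize_object() from _reset_robot()):

import numpy as np
import pybullet as p


def randomize_object(env):
    """Randomize the cube's dynamics between episodes. (Size changes
    would also require recreating the collision shape.)"""
    p.changeDynamics(env.object_id, -1,
                     mass=np.random.uniform(0.005, 0.05),        # 5–50 g
                     lateralFriction=np.random.uniform(0.3, 1.2),
                     spinningFriction=np.random.uniform(0.001, 0.01),
                     physicsClientId=env.client_id)


def noisy_obs(obs, sigma=0.005):
    """Gaussian observation noise, a crude stand-in for sensor error."""
    return (obs + np.random.normal(0.0, sigma, size=obs.shape)).astype(obs.dtype)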

7. Comparison of Approaches

| Criterion | Numerical IK (Tier 1) | GraspNet DL (Tier 2) | RL Grasping (Tier 3) |
|---|---|---|---|
| Grasp pose | Given (user specifies) | Predicted from point cloud | Learned end-to-end |
| Object perception | Known / pre-scanned scene | RGB-D / point cloud | Raw sensors |
| Training data | None | ~1B synthetic grasps | Trial and error |
| Generalization | Deterministic, no generalization | Strong (CNN features) | Needs sim-to-real transfer |
| Computation | ~100 Hz | ~10 Hz (GPU batch) | ~50 Hz (inference) |
| Unknown objects | ❌ No | ✅ Yes | ✅ Yes (after training) |
| Occlusion handling | ❌ No | ⚠️ Limited | ⚠️ Limited |
| Setup complexity | Low | Medium (GPU + dataset) | High (sim + training) |
| Best for | Pick-and-place of known objects | Cluttered scenes | Novel objects, complex dynamics |
| Difficulty | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |

8. Extensions and Variations

8.1 Multi-Object Grasping

  • Extend GraspNet to output grasp sequences (sorted by reachability)
  • Use a scene graph to reason about object relationships
  • Combine with task planning for "pick X and place in Y"

8.2 Impedance Control

  • After closing the gripper, switch to impedance mode to adapt to contact forces
  • Track a desired wrench with a PD correction: \(\tau = \mathbf{J}^T \mathbf{F}_d + \mathbf{K}_p \mathbf{e} + \mathbf{K}_d \dot{\mathbf{e}}\) — a sketch follows
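
A minimal sketch of that control law (the gains are illustrative; a real arm would add gravity compensation \(g(\mathbf{q})\)):

import numpy as np


def impedance_torque(J, F_d, e, e_dot, Kp, Kd):
    """Joint torques: feed-forward desired wrench plus PD correction."""
    return J.T @ F_d + Kp @ e + Kd @ e_dot


Kp = np.diag([50.0] * 6)   # stiffness gains
Kd = np.diag([5.0] * 6)    # damping gains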

8.3 Dexterous Manipulation

  • Replace parallel-jaw with Allegro / Shadow Hand
  • Use a contact model (Coulomb friction cones) for grasp stability analysis
  • Extend to in-hand manipulation: reorient object without re-grasping

8.4 Contact Model

Coulomb friction (point contact):

\[ F_t \leq \mu F_n \]

where \(F_t\) is tangential force, \(F_n\) is normal force, \(\mu\) is friction coefficient.

Grasp matrix for \(k\) point contacts at positions \(\mathbf{r}_i\) (shown for a 2-finger grasp, \(k = 2\)):

\[ \mathbf{G} = \begin{bmatrix} \mathbf{I}_3 & \mathbf{I}_3 \\ [\mathbf{r}_1]_\times & [\mathbf{r}_2]_\times \end{bmatrix} \in \mathbb{R}^{6 \times 3k} \]

8.5 Sim-to-Real Transfer

| Issue | Sim Solution | Real Adjustment |
|---|---|---|
| Camera noise | Add Gaussian noise to depth | Retune with real data |
| Dynamics mismatch | Match URDF inertials | System identification |
| Gripper offset | Calibrate with a checkerboard | Online calibration |
| Object properties | Randomize friction, mass | Estimate from F/T sensor |

9. References