
Robot Manipulation

Manipulation is the ability to interact with and modify the physical world — grasping objects, opening doors, assembling parts, and using tools. It is arguably the most challenging domain in robotics due to the complexity of contact physics, the high dimensionality of the action space, and the need for precise control.

1. Pick-and-Place

Task Definition

Grasp an object from one location and place it at another. It is the simplest form of manipulation, yet it remains an open research problem for arbitrary, previously unseen objects.

Formal Specification

Input:  Object pose (or detection from camera), target pose
Output: Grasp pose → approach trajectory → grasp → lift → place
Metric: Success Rate (% of successful placements), completion time
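The output sequence above can be sketched as a list of Cartesian waypoints, one per phase, plus the Success Rate metric. This is a minimal sketch under several assumptions: a top-down grasp (approach along the world z axis), positions in metres, and orientation omitted. The names `Pose`, `pick_and_place_waypoints` and `success_rate`, and the offset values, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pose:
    """Gripper position in metres (orientation omitted for brevity)."""
    x: float
    y: float
    z: float

    def offset_z(self, dz: float) -> "Pose":
        return Pose(self.x, self.y, self.z + dz)


def pick_and_place_waypoints(grasp: Pose, place: Pose,
                             approach_offset: float = 0.10,
                             lift_height: float = 0.15) -> List[Tuple[str, Pose]]:
    """Expand grasp/place poses into the phases of the formal specification.

    Assumes a top-down approach: the gripper hovers `approach_offset` metres
    above each target before descending. After closing, it lifts `lift_height`
    metres so the object clears the table.
    """
    return [
        ("pre_grasp", grasp.offset_z(approach_offset)),
        ("grasp", grasp),                            # close gripper here
        ("lift", grasp.offset_z(lift_height)),
        ("pre_place", place.offset_z(approach_offset)),
        ("place", place),                            # open gripper here
        ("retreat", place.offset_z(approach_offset)),
    ]


def success_rate(outcomes: List[bool]) -> float:
    """Success Rate metric: percentage of trials whose placement succeeded."""
    if not outcomes:
        return 0.0
    return 100.0 * sum(outcomes) / len(outcomes)


# Usage: move an object from (0.5, 0.0, 0.02) to (0.3, 0.2, 0.02)
plan = pick_and_place_waypoints(Pose(0.5, 0.0, 0.02), Pose(0.3, 0.2, 0.02))
for phase, pose in plan:
    print(phase, pose)
print(success_rate([True, True, False, True]))  # 75.0
```

In a real system, each waypoint would be handed to a motion planner rather than executed as a straight-line move.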

Why It's Hard

Challenges in pick-and-place:
├── Object diversity — Varying shapes, sizes, weights, textures
├── Grasp planning — Where to grasp? What orientation?
├── Slip detection — Is the object securely held?
├── Occlusion — Camera may not see the object clearly
├── Collision avoidance — Avoid hitting other objects
└── Precision — Must place within tolerance (cm-level)
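One common way to flag (impending) slip is the Coulomb friction-cone test. A contact holds as long as the tangential force stays within μ times the normal force. The sketch below makes these assumptions:

- A tactile or force/torque sensor reports each finger's normal force and 2-D tangential force in newtons.
- The friction coefficient `mu` and the safety `margin` are illustrative values.
- `is_slipping` is a hypothetical helper.

```python
import math
from typing import Tuple


def is_slipping(normal_force: float, tangential_xy: Tuple[float, float],
                mu: float = 0.5, margin: float = 0.9) -> bool:
    """Coulomb friction-cone check for a single contact.

    The contact sticks while |F_t| <= mu * F_n. We flag slip (or impending
    slip) once |F_t| exceeds margin * mu * F_n. The margin leaves headroom for
    the controller to squeeze harder before the object actually moves.
    """
    if normal_force <= 0.0:
        return True  # no normal force: the finger has lost contact
    tangential = math.hypot(*tangential_xy)
    return tangential > margin * mu * normal_force


# Usage: 10 N grip with mu = 0.5 gives a 5 N cone limit and a 4.5 N alarm threshold
print(is_slipping(10.0, (3.0, 0.0)))  # False: 3 N is inside the cone
print(is_slipping(10.0, (3.0, 4.0)))  # True: 5 N exceeds the 4.5 N threshold
```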

Key Benchmarks

Benchmark | Year | Tasks | Robot | Key Feature
RLBench | 2020 | 100 | Franka Panda | Diverse manipulation tasks
Meta-World | 2019 | 50 | Sawyer arm | Multi-task RL benchmark
LIBERO | 2023 | 130 | Franka Panda | 4 task suites (Spatial, Object, Goal, LIBERO-100)
RoboSuite | 2021 | 8+ | Multiple arms | Modular framework
CALVIN | 2022 | 34 | Franka Panda | Long-horizon, language-conditioned

Typical Pipeline

RGB-D Camera → Object Detection (YOLO/SAM) → 6-DoF Pose Estimation
→ Grasp Planning (GraspNet / analytic) → Motion Planning (RRT/PRM)
→ Execution (impedance control) → Verification (did it succeed?)
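The "analytic" option in the grasp-planning step can be illustrated with the classic antipodal test for a parallel-jaw gripper. The line joining the two contact points must lie inside the friction cone at each contact. In the planar case this is Nguyen's force-closure condition. In 3D it is the usual antipodality filter, although two point contacts alone cannot resist torque about the grasp axis.

This is a minimal sketch: contact points and inward-pointing surface normals are assumed to come from the pose-estimation step, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def _unit(v: Vec3) -> List[float]:
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return [c / n for c in v]


def _angle(a: Vec3, b: Vec3) -> float:
    """Angle in radians between two vectors."""
    ua, ub = _unit(a), _unit(b)
    dot = sum(x * y for x, y in zip(ua, ub))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp for float round-off


def is_antipodal(p1: Vec3, n1: Vec3, p2: Vec3, n2: Vec3, mu: float) -> bool:
    """Antipodal test for a parallel-jaw grasp.

    p1, p2: contact points on the object surface.
    n1, n2: inward-pointing surface normals (the direction each finger pushes).
    mu:     Coulomb friction coefficient; the cone half-angle is atan(mu).
    """
    half_angle = math.atan(mu)
    axis_12 = [b - a for a, b in zip(p1, p2)]  # from contact 1 toward contact 2
    axis_21 = [-c for c in axis_12]
    return _angle(axis_12, n1) <= half_angle and _angle(axis_21, n2) <= half_angle


# Usage: a 6 cm wide box grasped straight across its sides
print(is_antipodal((0, 0, 0), (1, 0, 0), (0.06, 0, 0), (-1, 0, 0), mu=0.3))     # True
# Second finger 3 cm higher: the grasp line leaves the 16.7 degree cones
print(is_antipodal((0, 0, 0), (1, 0, 0), (0.06, 0, 0.03), (-1, 0, 0), mu=0.3))  # False
```

Learned planners such as GraspNet replace this hand-written test with a network that scores candidate grasps. Even so, geometric checks like this one are still commonly used to filter candidates.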

Where It Applies

  • Warehouse logistics: Pack items into boxes
  • Manufacturing: Place components on assembly lines
  • Household: Clear the table, load the dishwasher



2. Assembly

Task Definition

Fit multiple parts together to form a larger structure. Requires precise positioning, insertion, and often force-controlled interactions.

Types of Assembly

Type | Example | Key Difficulty
Peg-in-hole | Insert a peg into a hole | Tight clearance (classic benchmark)
Gear assembly | Mesh gears together | Tight tolerances
Furniture assembly | Build IKEA furniture | Long horizon, language understanding
Electronics assembly | Place PCB components | Micro-precision

Key Benchmarks

  • Peg-in-hole: Classic robotics benchmark; hole clearances can be below 0.1 mm
  • Furniture Assembly: Nocker et al. (2023), requires language understanding and precise manipulation
  • RoboSet Assembly: Real-world assembly tasks with demonstrations

Force Control in Assembly

# Impedance control for assembly tasks
class ImpedanceController:
    """
    Impedance control: makes the end effector behave like a spring-damper
    system around a desired trajectory. Useful for assembly, where contact
    forces must be regulated rather than positions tracked rigidly.
    Works with floats (one axis) or NumPy arrays (per-axis gains).
    """

    def __init__(self, stiffness, damping):
        self.K = stiffness    # Spring constant (N/m)
        self.D = damping      # Damping coefficient (N·s/m)

    def compute_force(self, x_current, x_desired, v_current, v_desired):
        """
        F = K * (x_desired - x_current) + D * (v_desired - v_current)

        When the robot pushes against a wall:
        - x_desired lies beyond the wall, but x_current cannot move further
        - the position error stays fixed, so once motion stops the contact
          force settles at K * (distance of x_desired past the wall) instead
          of growing without bound; K and D set how stiff and damped contact feels
        """
        position_error = x_desired - x_current
        velocity_error = v_desired - v_current
        return self.K * position_error + self.D * velocity_error  # commanded force (N)
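The sketch below puts numbers on the wall example. It simulates a 1-D point mass driven by the same spring-damper law toward a setpoint placed 2 cm behind a rigid wall. It is self-contained and re-implements the force law rather than importing the class above. The mass, gains, time step and the perfectly rigid, inelastic wall are illustrative assumptions, and `simulate_wall_contact` is a hypothetical helper.

```python
from typing import Tuple


def simulate_wall_contact(K: float, D: float, x_desired: float, x_wall: float,
                          mass: float = 1.0, dt: float = 1e-3,
                          steps: int = 5000) -> Tuple[float, float]:
    """1-D impedance-controlled mass pushing toward x_desired behind a wall at x_wall.

    Uses F = K * (x_desired - x) + D * (v_desired - v) with v_desired = 0 and
    semi-implicit Euler integration. The wall is rigid and inelastic: any step
    that would cross it is clamped to the wall with zero velocity.
    Returns the final position and the force the controller commands there.
    """
    x, v = 0.0, 0.0
    for _ in range(steps):
        force = K * (x_desired - x) + D * (0.0 - v)
        v += (force / mass) * dt
        x += v * dt
        if x >= x_wall:  # blocked by the wall
            x, v = x_wall, 0.0
    return x, K * (x_desired - x) + D * (0.0 - v)


# Usage: setpoint 2 cm past the wall; compare a soft and a stiff setting
for K in (200.0, 1000.0):
    x_final, f_contact = simulate_wall_contact(K, D=60.0, x_desired=0.12, x_wall=0.10)
    print(f"K={K:.0f} N/m -> rests at {x_final:.3f} m, contact force {f_contact:.1f} N")
```

The steady contact force is K times the 2 cm offset, about 4 N for the soft setting and 20 N for the stiff one, independent of D. This linear dependence on K is one reason compliant, low-stiffness settings are attractive for contact-rich insertion.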

Where It Applies

  • Manufacturing: Automated assembly lines
  • Construction: Prefabricated building assembly
  • Space: In-orbit satellite assembly

3. Dexterous Manipulation

Task Definition

Manipulate objects using multi-fingered hands with dexterity comparable to human hands. This includes in-hand reorientation, precision grasping, and fine motor skills.

Why Dexterous?

Comparison of grippers:

Parallel Gripper (2 fingers):
├── Simple, reliable
├── Limited grasp types (pinch only)
└── Cannot reorient objects in hand

Dexterous Hand (3–5 fingers):
├── Can grasp a far wider range of shapes
├── Can reorient objects in-hand
├── Can use tools
└── Much harder to control (16–24 DoF)

Key Benchmarks

Benchmark | Year | Hand | Tasks | Key Feature
DexYCB | 2021 | Human hand | Grasping | 582K hand-object frames
Adroit | 2018 | Shadow Hand | Pen, door, hammer, relocate | RL benchmark
DexArt | 2023 | Allegro Hand | Articulated objects | Generalization to unseen articulated objects
TACTILE | 2024 | Various | Contact-rich | Tactile sensing

Datasets

  • DexYCB (CVPR 2021): 582K RGB-D frames of 10 subjects grasping 20 YCB objects
  • GRAB (Taheri et al., 2020): Full-body grasping with contact
  • OakInk (Yang et al., 2022): 1,800+ objects, hand-object interaction

State-of-the-Art Methods

Dexterous manipulation approaches:

1. RL + Simulation (most popular)
   Train in Isaac Gym / MuJoCo with domain randomization
   Example: DAPG (Rajeswaran et al., 2018)

2. Teleoperation → Imitation
   Human demonstrates with glove → robot learns from demos
   Example: DexYCB, T-AIR

3. Tactile-Guided Control
   Use tactile sensors (GelSight, DIGIT) for contact feedback
   Example: TACTO, FingerVision

4. Foundation Models for Manipulation
   Use pretrained vision-language models or generalist robot policies to guide manipulation
   Example: RT-2, Octo
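Approach 1 relies on domain randomization (Tobin et al., 2017). Each training episode samples fresh physical and visual parameters so the policy cannot overfit to a single simulator setting. Below is a minimal, simulator-agnostic sketch. The parameter names and ranges are illustrative assumptions, not values from Isaac Gym, MuJoCo or any cited paper. Feeding the sampled dictionary into a simulator at episode reset is left out.

```python
import random
from typing import Dict, Tuple

# Illustrative randomization ranges (assumed, not taken from any cited work)
RANDOMIZATION_RANGES: Dict[str, Tuple[float, float]] = {
    "object_mass_kg": (0.05, 0.5),
    "friction_coeff": (0.5, 1.5),
    "joint_damping": (0.01, 0.1),
    "actuator_gain_scale": (0.8, 1.2),
    "light_intensity": (0.3, 1.0),
}


def sample_episode_params(rng: random.Random,
                          ranges: Dict[str, Tuple[float, float]] = RANDOMIZATION_RANGES
                          ) -> Dict[str, float]:
    """Draw one set of simulator parameters uniformly from each range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


# Usage: a fresh parameter set per episode; a fixed seed makes runs reproducible
rng = random.Random(0)
for episode in range(3):
    params = sample_episode_params(rng)
    print(episode, {k: round(v, 3) for k, v in params.items()})
```

Uniform sampling is the simplest choice. Wider ranges usually make transfer more robust but training harder.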

Where It Applies

  • Prosthetics: Dexterous artificial hands
  • Manufacturing: Handling small, irregular parts
  • Service: Manipulating everyday objects (bottles, tools, food)

4. Deformable Object Manipulation

Task Definition

Manipulate objects that change shape during interaction — cloth, rope, food, cables, soft tissues. Unlike rigid objects, deformable objects have infinite-dimensional state spaces.

Types

Object | Example Task | Difficulty
Cloth | Fold a shirt | High (draping, folding)
Rope | Tie a knot | High (topological changes)
Cable | Route a cable | Medium (routing, avoiding tangles)
Food | Slice vegetables | Medium (cutting, portioning)
Soft tissue | Surgical manipulation | Very high (precision, safety)

Key Challenges

Deformable manipulation challenges:
├── State representation — How to represent cloth/rope state?
├── Simulation — Physics of deformable bodies is expensive
├── Perception — Tracking deformable objects is hard
├── Planning — Infinite-dimensional configuration space
└── Sim-to-real — Deformable sim doesn't transfer well
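A common workaround for the state-representation problem is to approximate the object with a finite set of points. For example, a rope can be described by N keypoints spaced evenly along its length. The sketch below resamples an arbitrary 2-D polyline, such as one traced by a perception module, into N equally spaced keypoints. The result is a fixed-size state vector. The function name and the choice of N are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_rope(polyline: List[Point], n: int) -> List[Point]:
    """Return n points spaced evenly by arc length along the polyline (n >= 2)."""
    if n < 2 or len(polyline) < 2:
        raise ValueError("need n >= 2 and at least two polyline points")
    # Cumulative arc length at each input vertex
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out: List[Point] = []
    seg = 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length
        while seg < len(polyline) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0.0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = polyline[seg], polyline[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


# Usage: an L-shaped rope (two 1 m segments) reduced to 5 keypoints
keypoints = resample_rope([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)], n=5)
state = [c for p in keypoints for c in p]  # flat, fixed-size state vector (2n values)
print(keypoints)  # [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 0.5), (1.0, 1.0)]
```

The same idea extends to cloth, using a grid of particles or mesh vertices. The loss of fidelity from discretization is part of why deformable sim-to-real remains hard.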

Key Benchmarks

  • SoftGym (Lin et al., 2020): Cloth/fluid manipulation in simulation
  • ROBOSURF (Qi et al., 2022): Cable routing benchmark
  • Gym-Fluid (various): Fluid pouring and manipulation

Where It Applies

  • Laundry: Folding clothes
  • Agriculture: Harvesting soft fruits
  • Surgery: Robotic-assisted minimally invasive surgery

5. Tool Use

Task Definition

Use external tools to extend the robot's capabilities — hammers, screwdrivers, spatulas, scissors. Requires understanding tool affordances and how to grasp and wield them.

Examples

Tool use tasks:
├── Hammer — Drive a nail
├── Screwdriver — Turn a screw
├── Spatula — Flip a pancake
├── Scissors — Cut paper
├── Wrench — Tighten a bolt
└── Brush — Paint a surface

Key Datasets

  • Something-Something V2 (Goyal et al., 2017): 220K videos of object interactions including tool use
  • EPIC-KITCHENS (Damen et al., 2018): Egocentric kitchen activities with tool manipulation
  • RoboSet (Bharadhwaj et al., 2023): 11 tasks including tool use (cut, wipe, sweep)

Where It Applies

  • Manufacturing: Using power tools autonomously
  • Kitchen: Cooking with utensils
  • Maintenance: Using diagnostic equipment

6. Mobile Manipulation

Task Definition

Combine navigation and manipulation: the robot must move to a location AND manipulate objects there. This is arguably the most realistic task category, and among the most challenging.

Why It Matters

Real-world tasks are almost always mobile manipulation:
  "Make coffee" = Navigate to kitchen + pick up mug + 
                  operate coffee machine + carry mug back

  "Clean the room" = Navigate to mess + pick up items + 
                     navigate to trash/shelf + place items
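The decompositions above follow a simple pattern. Before each manipulation step, check whether the target is within the arm's reach from the current base position; if it is not, insert a navigation step first. The sketch below assumes a 2-D floor plan, a fixed arm reach and a straight-line approach. The skill names (`navigate`, `pick`, `place`) and `plan_mobile_manipulation` are hypothetical. A real system would use a reachability map and a motion planner instead.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def plan_mobile_manipulation(base: Point, steps: List[Tuple[str, str, Point]],
                             reach: float = 0.8, standoff: float = 0.5
                             ) -> List[Tuple[str, str]]:
    """Insert navigation wherever a manipulation target is out of arm reach.

    steps: (skill, object_name, object_xy) tuples, e.g. ("pick", "mug", (3.0, 1.0)).
    The base drives along the straight line toward the target and stops
    `standoff` metres short of it (standoff should not exceed reach).
    """
    plan: List[Tuple[str, str]] = []
    bx, by = base
    for skill, obj, (ox, oy) in steps:
        dist = math.hypot(ox - bx, oy - by)
        if dist > reach:
            scale = (dist - standoff) / dist
            bx, by = bx + scale * (ox - bx), by + scale * (oy - by)
            plan.append(("navigate", f"({bx:.2f}, {by:.2f})"))
        plan.append((skill, obj))
    return plan


# Usage: "Clean the room": pick a cup off the floor, then drop it in the trash
task = [("pick", "cup", (4.0, 0.0)), ("place", "trash_bin", (4.0, 3.0))]
for step in plan_mobile_manipulation(base=(0.0, 0.0), steps=task):
    print(step)
```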

Key Benchmarks

Benchmark | Year | Setting | Tasks | Key Feature
Habitat 2.0 | 2021 | Simulation | Rearrangement | Physics-based
TEACh | 2022 | AI2-THOR | Dialog-based tasks | Language + manipulation
BEHAVIOR-1K | 2022 | OmniGibson | 1,000 activities | Full household tasks
RoboCasa | 2024 | Simulation | Kitchen tasks | Large-scale kitchen sim

Where It Applies

  • Home assistants: General-purpose household robots
  • Elder care: Help with daily activities
  • Logistics: Pick items from shelves and deliver them

Manipulation Task Comparison

Task | Key Challenge | DoF Required | Simulation Fidelity | Sim-to-Real Gap
Pick-and-Place | Grasp planning | 6–7 | Good | Small
Assembly | Precision, force control | 6–7 + force | Moderate | Medium
Dexterous | High-DoF control | 16–24 | Moderate | Large
Deformable | State representation | 6+ | Poor | Very large
Tool Use | Affordance understanding | 6–7 | Good | Medium
Mobile Manipulation | Navigation + manipulation | 6 + base | Moderate | Large

Object Datasets for Manipulation

Dataset | Objects | Modality | Key Feature
YCB | 77 (5 categories) | 3D models + physical objects | Standard benchmark object set
DexYCB | 20 YCB objects | RGB-D + hand tracking | Dexterous grasping
Google Scanned Objects | 1,031 | 3D scans | Simulation assets
ObjectNet | 313 classes, 50K images | RGB | Robustness testing
ACID | 1,000+ | 3D models | Articulated objects
OmniObject3D | 6,000+ | 3D scans + textures | Large-vocabulary real scans (190 categories)

References

  • Tobin et al. (2017). "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World." IROS 2017
  • Rajeswaran et al. (2018). "Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations." RSS 2018
  • James et al. (2020). "RLBench: The Robot Learning Benchmark & Learning Environment." IEEE RA-L
  • Mees et al. (2022). "CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks." IEEE RA-L 2022
  • Bharadhwaj et al. (2023). "RoboSet: A Multi-Task Dataset for Robot Learning." ICRA 2023
  • Nasiriany et al. (2024). "RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots." RSS 2024
  • Brohan et al. (2024). "Open X-Embodiment: Robotic Learning Datasets and RT-X Models." ICRA 2024