Datasets & Benchmarks Reference

This chapter provides a comprehensive reference of datasets, benchmarks, and simulators used across all robot task categories. Use this as a lookup when selecting evaluation data for your research.

3D Scene Datasets (Navigation)

| Dataset | Year | Scenes | Modality | Annotations | Used By |
|---|---|---|---|---|---|
| Matterport3D | 2017 | 90 buildings | RGB-D panoramic | 40 semantic categories | R2R VLN, Habitat |
| ScanNet | 2017 | 1,513 scenes | RGB-D | Semantic + instance seg. | 3D detection, reconstruction |
| HM3D | 2021 | 1,000 buildings | 3D mesh | Semantic (v0.2) | Habitat ObjectNav |
| HSSD-200 | 2024 | 211 scenes | 3D mesh | Interactive objects | Habitat rearrangement |
| Gibson | 2018 | 572 buildings | 3D reconstruction | — | Gibson simulator |
| 3DSSG | 2020 | 478 scenes | RGB-D | Scene graph labels | Scene understanding |
| ProcTHOR | 2022 | 10,000 rooms | Procedural | Object layouts | Scalable training |
| ARKitScenes | 2021 | 1,000+ rooms | iPhone LiDAR | 3D bounding boxes | Real-world SLAM |

Matterport3D Details

Matterport3D:
├── 10,800 panoramic RGB-D images
├── 90 buildings (houses, apartments)
├── 40 semantic categories (wall, floor, chair, table, ...)
├── 3D mesh reconstruction per building
├── Used for:
│   ├── R2R Vision-Language Navigation (21K instructions)
│   ├── Active Vision tasks
│   ├── Semantic SLAM
│   └── PointNav / ObjectNav benchmarks
└── License: Research use only

HM3D Details

HM3D (Habitat-Matterport 3D Dataset):
├── 1,000 building-scale 3D scenes
├── Largest dataset of building-scale 3D reconstructions at release
├── Semantic annotations (v0.2): furniture, fixtures, objects
├── Used by Habitat Challenge 2022–2024
├── Covers: ObjectNav, PointNav, Exploration
└── License: Research use (via Habitat)

Vision-Language Navigation Datasets

| Dataset | Year | Instructions | Scenes | Language | Task |
|---|---|---|---|---|---|
| R2R | 2018 | 21,567 | MP3D | English | VLN |
| RxR | 2020 | 126,000+ | MP3D | EN/HI/TE | Multilingual VLN |
| REVERIE | 2020 | 21,702 | MP3D | English | Remote referring |
| SOON | 2021 | 4,000+ | MP3D | English | Environmental variation |
| CVDN | 2020 | 7,441 dialogs | MP3D | English | Dialog-based |
| ALFRED | 2020 | 25,743 | AI2-THOR | English | VLN + manipulation |
| TEACh | 2022 | 3,500+ | AI2-THOR | English | Dialog + task execution |
| SCAND | 2022 | Real-world | Indoor | English | Social navigation |

SLAM Datasets

Indoor SLAM

| Dataset | Year | Sensor | Sequences | Ground Truth | Key Use |
|---|---|---|---|---|---|
| TUM RGB-D | 2012 | Kinect | 39 | Motion capture | RGB-D SLAM standard |
| EuRoC MAV | 2016 | Stereo + IMU | 11 | Motion capture | Visual-inertial SLAM |
| ICL-NUIM | 2014 | Synthetic | 8 | Perfect | Simulated benchmark |
| TartanAir | 2020 | Synthetic | 30+ | Perfect | Diverse environments |
| Replica | 2019 | Synthetic | 18 | Perfect | Neural SLAM |
| ScanNet | 2017 | RGB-D | 1,513 | ICP-aligned | Semantic SLAM |

Outdoor SLAM

| Dataset | Year | Sensor | Sequences | Ground Truth | Key Use |
|---|---|---|---|---|---|
| KITTI | 2012 | Stereo + LiDAR | 22 (11 with GT) | GPS/RTK | Visual/LiDAR odometry |
| nuScenes | 2019 | LiDAR + cameras | 1,000 | GPS/IMU | 3D detection + tracking |
| Waymo Open | 2019 | LiDAR + cameras | 1,150 | GPS/IMU | 3D detection |
| Oxford RobotCar | 2016 | Multi-sensor | 100+ | GPS | Long-term SLAM |
| MulRan | 2020 | LiDAR | 12 | GPS | Multi-session SLAM |
| Hilti | 2022 | Multi-sensor | 9 | Total station | Construction SLAM |
| NCLT | 2016 | Multi-sensor | 27 | GPS | Long-term localization |
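All of these benchmarks supply ground-truth trajectories, and SLAM accuracy is conventionally reported as absolute trajectory error (ATE) against them. Below is a minimal sketch of translation-only ATE RMSE, assuming the two trajectories are already time-associated and expressed in the same frame (real evaluation tools, such as TUM's, also perform a rigid-body alignment first):

```python
import math

def ate_rmse(estimated, ground_truth):
    """RMSE of position error over time-associated trajectory pairs.

    Each trajectory is a list of (x, y, z) positions. Assumes frames
    are already aligned; production tools fit a rigid transform first.
    """
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy trajectories: estimate drifts slightly from ground truth
est = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (2.0, 0.1, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(round(ate_rmse(est, gt), 3))  # 0.082
```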

TUM RGB-D Sequences

Most commonly used TUM RGB-D sequences:

fr1_xyz      — Simple translation (30s, small workspace)
fr1_desk     — Desktop with objects (23s)
fr1_floor    — Floor-level scan (28s)
fr1_room     — Full room traversal (66s)
fr2_xyz      — Larger workspace (122s)
fr2_desk     — Office desk (99s)
fr2_360      — 360° rotation (28s)
fr2_rpy      — Roll/pitch/yaw motion (29s)
fr3_office   — Full office (30s)
fr3_nstr     — Noisy texture (27s)

Evaluation: use the benchmark's associate.py script to align RGB, depth, and ground-truth timestamps before computing trajectory error
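The sensor streams and ground truth in TUM RGB-D are timestamped separately, so evaluation starts by pairing nearest timestamps within a tolerance. A minimal sketch of that matching logic (the function name and default tolerance here are illustrative, not the official script):

```python
def associate(ts_a, ts_b, max_dt=0.02):
    """Greedily pair timestamps from two streams.

    Matches each timestamp in ts_a with the closest unused timestamp
    in ts_b, keeping only pairs within max_dt seconds -- the same idea
    as TUM's associate.py.
    """
    # All candidate pairs within tolerance, closest first
    candidates = sorted(
        (abs(a - b), a, b) for a in ts_a for b in ts_b if abs(a - b) < max_dt
    )
    matches, used_a, used_b = [], set(), set()
    for _, a, b in candidates:
        if a not in used_a and b not in used_b:
            matches.append((a, b))
            used_a.add(a)
            used_b.add(b)
    return sorted(matches)

# Example: RGB timestamps vs. ground-truth timestamps;
# the third RGB frame has no ground truth within 20 ms.
rgb = [1.000, 1.033, 1.066]
gt = [0.999, 1.030, 1.100]
print(associate(rgb, gt))  # [(1.0, 0.999), (1.033, 1.03)]
```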

Manipulation Object Datasets

| Dataset | Year | Objects | Modality | Key Feature |
|---|---|---|---|---|
| YCB | 2015 | 77 (5 categories) | 3D models + physical set | Industry standard |
| DexYCB | 2021 | 20 YCB objects | RGB-D + hand tracking | Dexterous grasping |
| Google Scanned Objects | 2020 | 1,031 | 3D scans | Simulation assets |
| OmniObject3D | 2023 | 6,000+ | 3D scans + textures | Largest 3D object set |
| ObjectNet | 2019 | 313 classes | RGB | Robustness testing |
| ACID | 2022 | 1,000+ | 3D models | Articulated objects |
| OakInk | 2022 | 1,800+ | RGB-D + hand | Hand-object interaction |

YCB Object Categories

YCB Benchmark (77 objects in 5 categories):

1. Food Items (20 objects)
   ├── Canned goods (tomato soup, tuna, etc.)
   ├── Fresh produce (apple, banana, peach, pear, strawberry)
   └── Packaged items (cracker box, sugar box, etc.)

2. Kitchen Items (17 objects)
   ├── Utensils (spatula, spoon, fork, knife)
   ├── Containers (mug, bowl, plate, cup)
   └── Cookware (pitcher, skillet, etc.)

3. Tool Items (19 objects)
   ├── Hand tools (wrench, pliers, screwdriver, hammer)
   ├── Measuring (tape measure)
   └── Clamps and supports

4. Shape & Size Items (16 objects)
   ├── Blocks (various sizes)
   ├── Spheres (various sizes)
   ├── Cylinders (various sizes)
   └── Standardized shapes for calibration

5. Task Items (5 objects)
   └── Multi-part assemblies for manipulation tasks
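The per-category counts above can be sanity-checked against the 77-object total; this tiny snippet encodes them (the dictionary keys are informal labels of my own, not official YCB identifiers):

```python
# Object counts per YCB category, as listed above.
ycb_counts = {
    "food": 20,
    "kitchen": 17,
    "tool": 19,
    "shape_and_size": 16,
    "task": 5,
}

total = sum(ycb_counts.values())
print(total)  # 77
```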

Manipulation Benchmarks

| Benchmark | Year | Tasks | Robot | Setting | Key Feature |
|---|---|---|---|---|---|
| RLBench | 2020 | 100 | Franka Panda | Simulation | Diverse, language-conditioned |
| Meta-World | 2019 | 50 | Sawyer | Simulation | Multi-task RL |
| LIBERO | 2023 | 130 | Franka Panda | Simulation | 4 difficulty suites |
| RoboSuite | 2021 | 8+ | Multiple | Simulation | Modular |
| CALVIN | 2022 | 34 | Franka | Simulation | Long-horizon |
| BEHAVIOR-1K | 2023 | 1,000 | Various | OmniGibson | Full household |
| RoboCasa | 2024 | 100+ | Mobile manipulator | Simulation | Kitchen tasks |
| ManiSkill2 | 2023 | 20 | Various | SAPIEN | Articulated objects |
| SAPIEN | 2020 | — | Various | Simulation | Part-level articulation |

Cross-Embodiment Datasets

| Dataset | Year | Trajectories | Robots | Tasks | Key Feature |
|---|---|---|---|---|---|
| Open X-Embodiment | 2024 | 1M+ | 22 | 500+ | Cross-embodiment |
| Bridge V2 | 2023 | 60K | WidowX | 10+ | Real robot |
| RoboSet | 2023 | 100K+ | Franka | 11 | Multi-task |
| RoboTurk | 2018 | 2,152 | Sawyer | 6 | Crowdsourced |
| DROID | 2024 | 350K | Various | — | In-the-wild |

Open X-Embodiment

Open X-Embodiment (Google DeepMind, 2024):
├── 1M+ trajectories from 22 robot embodiments
├── 500+ distinct tasks
├── Standardized format (RLDS)
├── Trained RT-X models:
│   ├── RT-1-X: ~50% higher success rate than the original per-robot methods
│   └── RT-2-X: Better generalization
├── Contribution from 21 institutions
└── Paper: Brohan et al., ICRA 2024
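RLDS organizes each trajectory as episode-level metadata plus an ordered sequence of steps. The pure-Python sketch below mirrors that logical schema for intuition only; the field names follow the RLDS convention, actual Open X-Embodiment data is consumed through TensorFlow Datasets, and the robot name here is hypothetical:

```python
# Minimal stand-in for an RLDS-style episode: metadata plus an
# ordered list of step dicts. Real data is stored as TFDS records;
# this mirrors the logical schema only.
def make_step(observation, action, reward, is_first=False, is_last=False):
    return {
        "observation": observation,  # e.g. {"image": ..., "state": ...}
        "action": action,            # robot-specific action vector
        "reward": reward,
        "is_first": is_first,        # True on the first step of an episode
        "is_last": is_last,          # True on the terminal step
    }

def make_episode(steps, robot="hypothetical_widowx"):
    # Per-episode metadata; keys here are illustrative.
    return {"episode_metadata": {"robot": robot}, "steps": steps}

episode = make_episode([
    make_step({"state": [0.0, 0.0]}, [0.1, 0.0], 0.0, is_first=True),
    make_step({"state": [0.1, 0.0]}, [0.0, 0.0], 1.0, is_last=True),
])
print(len(episode["steps"]))  # 2
```

Standardizing on one step schema like this is what lets a single training pipeline mix data from 22 different embodiments.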

Simulators

| Simulator | Developer | Physics | Rendering | Primary Use |
|---|---|---|---|---|
| Habitat | Meta | Bullet | Custom | Navigation |
| AI2-THOR | Allen AI | Unity | Unity | Nav + manip |
| MuJoCo | DeepMind | MuJoCo | EGL | Manipulation |
| Isaac Sim | NVIDIA | PhysX 5 | RTX | All tasks |
| SAPIEN | UCSD | PhysX | Custom | Articulated objects |
| RoboSuite | Stanford | MuJoCo | EGL | Manipulation |
| Gazebo | OSRF | ODE/Bullet | OGRE | ROS integration |
| PyBullet | — | Bullet | OpenGL | Quick prototyping |
| CARLA | Intel | Unreal | Unreal | Driving/outdoor |
| OmniGibson | Stanford | PhysX | Omniverse | Full household |

Simulator Selection Guide

What simulator should I use?

Navigation tasks → Habitat (fastest, largest scenes)
  └── Need interaction? → AI2-THOR / iGibson

Manipulation tasks → MuJoCo (fast, accurate)
  └── Need articulated objects? → SAPIEN
  └── Need GPU parallelism? → Isaac Sim / Isaac Gym

ROS integration → Gazebo (native ROS support)

Quick prototyping → PyBullet (easy to install, fast)

Full household → OmniGibson / BEHAVIOR-1K

Outdoor driving → CARLA

Multi-task research → Isaac Sim (most versatile)

Selection Flowchart

Start: What task are you evaluating?
├── Navigation
│   ├── PointNav → HM3D + Habitat
│   ├── ObjectNav → HM3D + Habitat
│   ├── VLN → Matterport3D + R2R/RxR
│   ├── Exploration → HM3D or custom
│   └── Social → SCAND + Habitat 3.0
├── SLAM
│   ├── Visual SLAM (indoor) → TUM RGB-D, EuRoC
│   ├── Visual SLAM (outdoor) → KITTI
│   ├── LiDAR SLAM → KITTI, MulRan, Hilti
│   └── Neural SLAM → Replica, ScanNet, TartanAir
├── Manipulation
│   ├── Pick-and-place → RLBench, LIBERO
│   ├── Dexterous → DexYCB, Adroit
│   ├── Assembly → Peg-in-hole, RLBench
│   ├── Deformable → SoftGym
│   └── Mobile manipulation → BEHAVIOR-1K, RoboCasa
└── Cross-task
    ├── Foundation model → Open X-Embodiment
    └── Multi-task → BEHAVIOR-1K, RoboBench
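For scripting experiment configuration, the flowchart above can be collapsed into a lookup table. This is a sketch with informal key names of my own choosing, covering a few branches:

```python
# A few branches of the selection flowchart as a lookup table.
# Keys are informal (task, subtask) labels, not standard identifiers.
RECOMMENDATIONS = {
    ("navigation", "pointnav"): "HM3D + Habitat",
    ("navigation", "objectnav"): "HM3D + Habitat",
    ("navigation", "vln"): "Matterport3D + R2R/RxR",
    ("slam", "visual_indoor"): "TUM RGB-D, EuRoC",
    ("slam", "lidar"): "KITTI, MulRan, Hilti",
    ("manipulation", "dexterous"): "DexYCB, Adroit",
    ("cross_task", "foundation_model"): "Open X-Embodiment",
}

def recommend(task, subtask):
    """Return the benchmark recommendation for a (task, subtask) pair."""
    return RECOMMENDATIONS.get((task, subtask), "no standard recommendation")

print(recommend("slam", "lidar"))  # KITTI, MulRan, Hilti
```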

References

  • Chang et al. (2017). "Matterport3D: Learning from RGB-D Data in Indoor Environments." 3DV 2017
  • Dai et al. (2017). "ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes." CVPR 2017
  • Ramakrishnan et al. (2021). "Habitat-Matterport 3D Dataset (HM3D)." 3DV 2021
  • Anderson et al. (2018). "Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments." CVPR 2018
  • Calli et al. (2015). "The YCB Object and Model Set: Towards Common Benchmarks for Manipulation Research." ICAR 2015
  • Chao et al. (2021). "DexYCB: A Benchmark for Capturing Hand Grasping of Objects." CVPR 2021
  • Brohan et al. (2024). "Open X-Embodiment: Robotic Learning Datasets and RT-X Models." ICRA 2024
  • Sturm et al. (2012). "A Benchmark for the Evaluation of RGB-D SLAM Systems." IROS 2012
  • Geiger et al. (2012). "Are we ready for autonomous driving? The KITTI vision benchmark suite." CVPR 2012
  • James et al. (2020). "RLBench: The Robot Learning Benchmark & Learning Environment." IEEE RA-L 2020
  • Xiang et al. (2020). "SAPIEN: A SimulAted Part-based Interactive ENvironment." CVPR 2020