Datasets & Benchmarks Reference
This chapter provides a comprehensive reference of datasets, benchmarks, and simulators used across all robot task categories. Use this as a lookup when selecting evaluation data for your research.
3D Scene Datasets (Navigation)
| Dataset | Year | Scenes | Modality | Annotations | Used By |
|---|---|---|---|---|---|
| Matterport3D | 2017 | 90 buildings | RGB-D panoramic | 40 semantic categories | R2R VLN, Habitat |
| ScanNet | 2017 | 1,513 scenes | RGB-D | Semantic + instance seg | 3D detection, reconstruction |
| HM3D | 2021 | 1,000 buildings | 3D mesh | Semantic (v0.2) | Habitat ObjectNav |
| HSSD-200 | 2024 | 211 scenes | 3D mesh | Interactive objects | Habitat rearrangement |
| Gibson | 2018 | 572 buildings | 3D reconstruction | — | Gibson simulator |
| 3DSSG | 2020 | 478 scenes | RGB-D | Scene graph labels | Scene understanding |
| ProcTHOR | 2022 | 10,000 houses | Procedural | Object layouts | Scalable training |
| ARKitScenes | 2021 | 1,661 venues | iPhone LiDAR | 3D bounding boxes | Real-world SLAM |
Matterport3D Details
Matterport3D:
├── 10,800 panoramic RGB-D images
├── 90 buildings (houses, apartments)
├── 40 semantic categories (wall, floor, chair, table, ...)
├── 3D mesh reconstruction per building
├── Used for:
│ ├── R2R Vision-Language Navigation (21K instructions)
│ ├── Active Vision tasks
│ ├── Semantic SLAM
│ └── PointNav / ObjectNav benchmarks
└── License: Research use only
HM3D Details
HM3D (Habitat-Matterport 3D Dataset):
├── 1,000 building-scale 3D scenes
├── Largest dataset of 3D indoor spaces at its release
├── Semantic annotations (v0.2): furniture, fixtures, objects
├── Used by Habitat Challenge 2022–2024
├── Covers: ObjectNav, PointNav, Exploration
└── License: Research use (via Habitat)
Navigation Datasets (VLN)
| Dataset | Year | Instructions | Scenes | Language | Task |
|---|---|---|---|---|---|
| R2R | 2018 | 21,567 | MP3D | English | VLN |
| RxR | 2020 | 126,000+ | MP3D | EN/HI/TE | Multilingual VLN |
| REVERIE | 2020 | 21,702 | MP3D | English | Remote referring expressions |
| SOON | 2021 | 4,000+ | MP3D | English | Goal-oriented VLN |
| CVDN | 2019 | 7,441 dialogs | MP3D | English | Dialog-based navigation |
| ALFRED | 2020 | 25,743 | AI2-THOR | English | VLN + manipulation |
| TEACh | 2022 | 3,500+ | AI2-THOR | English | Dialog + task execution |
| SCAND | 2022 | Real-world demos | Indoor/outdoor | English | Social navigation |
SLAM Datasets
Indoor SLAM
| Dataset | Year | Sensor | Sequences | Ground Truth | Key Use |
|---|---|---|---|---|---|
| TUM RGB-D | 2012 | Kinect | 39 | Motion capture | RGB-D SLAM standard |
| EuRoC MAV | 2016 | Stereo + IMU | 11 | Motion capture | Visual-inertial SLAM |
| ICL-NUIM | 2014 | Synthetic | 8 | Perfect | Simulated benchmark |
| TartanAir | 2020 | Synthetic | 30+ | Perfect | Diverse environments |
| Replica | 2019 | Synthetic | 18 | Perfect | Neural SLAM |
| ScanNet | 2017 | RGB-D | 1,513 | ICP-aligned | Semantic SLAM |
Outdoor SLAM
| Dataset | Year | Sensor | Sequences | Ground Truth | Key Use |
|---|---|---|---|---|---|
| KITTI | 2012 | Stereo + LiDAR | 22 (11 with GT) | GPS/RTK | Visual/LiDAR odometry |
| nuScenes | 2019 | LiDAR + cameras | 1,000 | GPS/IMU | 3D detection + tracking |
| Waymo Open | 2019 | LiDAR + cameras | 1,150 | GPS/IMU | 3D detection |
| Oxford RobotCar | 2016 | Multi-sensor | 100+ | GPS | Long-term SLAM |
| MulRan | 2020 | LiDAR | 12 | GPS | Multi-session SLAM |
| Hilti | 2022 | Multi-sensor | 9 | Total station | Construction SLAM |
| NCLT | 2016 | Multi-sensor | 27 | GPS | Long-term localization |
TUM RGB-D Sequences
Most commonly used TUM RGB-D sequences:
fr1_xyz — Simple translation (30s, small workspace)
fr1_desk — Desktop with objects (23s)
fr1_floor — Floor-level scan (28s)
fr1_room — Full room traversal (66s)
fr2_xyz — Larger workspace (122s)
fr2_desk — Office desk (99s)
fr2_360 — 360° rotation (28s)
fr2_rpy — Roll/pitch/yaw motion (29s)
fr3_office — Full office (30s)
fr3_nstr — No structure, textured surface (27s)
Evaluation: align estimated and ground-truth timestamps with the TUM 'associate.py' script before computing ATE/RPE
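The TUM evaluation protocol can be sketched in a few lines of NumPy: a re-implementation of the greedy nearest-timestamp matching that `associate.py` performs, followed by ATE RMSE after a rigid Horn/Umeyama alignment (no scale). The function names and the `max_diff` default here are illustrative, not part of the TUM toolkit.

```python
import numpy as np

def associate(stamps_a, stamps_b, max_diff=0.02):
    """Greedily pair two timestamp lists, closest differences first,
    keeping only pairs within max_diff seconds (as associate.py does)."""
    candidates = sorted(
        (abs(a - b), i, j)
        for i, a in enumerate(stamps_a)
        for j, b in enumerate(stamps_b)
        if abs(a - b) < max_diff
    )
    used_a, used_b, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    return sorted(pairs)

def ate_rmse(gt, est):
    """ATE RMSE between (N, 3) position arrays after least-squares
    rigid alignment of est onto gt (Kabsch/Umeyama, no scale)."""
    mu_g, mu_e = gt.mean(0), est.mean(0)
    # Cross-covariance of the centered trajectories
    U, _, Vt = np.linalg.svd((gt - mu_g).T @ (est - mu_e))
    # Reflection guard: force a proper rotation (det = +1)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ S @ Vt
    aligned = (R @ (est - mu_e).T).T + mu_g
    return float(np.sqrt(((aligned - gt) ** 2).sum(1).mean()))
```

Because the alignment removes any global rigid offset, a trajectory that differs from ground truth only by a rotation and translation scores an ATE of zero, which is the intended behavior of the TUM metric.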
Manipulation Object Datasets
| Dataset | Year | Objects | Modality | Key Feature |
|---|---|---|---|---|
| YCB | 2015 | 77 (5 categories) | 3D models + physical | Industry standard |
| DexYCB | 2021 | 20 YCB objects | RGB-D + hand tracking | Dexterous grasping |
| Google Scanned Objects | 2020 | 1,031 | 3D scans | Simulation assets |
| OmniObject3D | 2023 | 6,000+ | 3D scans + textures | Largest 3D object set |
| ObjectNet | 2019 | 313 classes | RGB | Robustness testing |
| ACID | 2022 | 1,000+ | 3D models | Articulated objects |
| OakInk | 2022 | 1,800+ | RGB-D + hand | Hand-object interaction |
YCB Object Categories
YCB Benchmark (77 objects in 5 categories):
1. Food Items (20 objects)
├── Canned goods (tomato soup, tuna, etc.)
├── Fresh produce (apple, banana, peach, pear, strawberry)
└── Packaged items (cracker box, sugar box, etc.)
2. Kitchen Items (17 objects)
├── Utensils (spatula, spoon, fork, knife)
├── Containers (mug, bowl, plate, cup)
└── Cookware (pitcher, skillet, etc.)
3. Tool Items (19 objects)
├── Hand tools (wrench, pliers, screwdriver, hammer)
├── Measuring (tape measure)
└── Clamps and supports
4. Shape & Size Items (16 objects)
├── Blocks (various sizes)
├── Spheres (various sizes)
├── Cylinders (various sizes)
└── Standardized shapes for calibration
5. Task Items (5 objects)
└── Multi-part assemblies for manipulation tasks
Manipulation Benchmarks
| Benchmark | Year | Tasks | Robot | Setting | Key Feature |
|---|---|---|---|---|---|
| RLBench | 2020 | 100 | Franka Panda | Simulation | Diverse, language-conditioned |
| Meta-World | 2019 | 50 | Sawyer | Simulation | Multi-task RL |
| LIBERO | 2023 | 130 | Franka Panda | Simulation | 4 difficulty suites |
| RoboSuite | 2020 | 8+ | Multiple | Simulation | Modular |
| CALVIN | 2022 | 34 | Franka | Simulation | Long-horizon |
| BEHAVIOR-1K | 2023 | 1,000 | Various | OmniGibson | Full household activities |
| RoboCasa | 2024 | 100+ | Mobile manipulators | Simulation | Kitchen tasks |
| ManiSkill2 | 2023 | 20 | Various | SAPIEN | Articulated objects |
| SAPIEN | 2020 | Various | — | Simulation | Part-level articulation |
Cross-Embodiment Datasets
| Dataset | Year | Trajectories | Robots | Tasks | Key Feature |
|---|---|---|---|---|---|
| Open X-Embodiment | 2024 | 1M+ | 22 | 500+ | Cross-embodiment |
| Bridge V2 | 2023 | 60K | WidowX | 10+ | Real robot |
| RoboSet | 2023 | 100K+ | Franka | 11 | Multi-task |
| RoboTurk | 2018 | 2,152 | Sawyer | 6 | Crowdsourced |
| DROID | 2024 | 76K | Franka | — | In-the-wild |
Open X-Embodiment
Open X-Embodiment (Google DeepMind, 2024):
├── 1M+ trajectories from 22 robot embodiments
├── 500+ distinct tasks
├── Standardized format (RLDS)
├── Trained RT-X models:
│ ├── RT-1-X: ~50% average improvement over the original per-robot methods
│ └── RT-2-X: Better generalization
├── Contribution from 21 institutions
└── Paper: Open X-Embodiment Collaboration, ICRA 2024
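The RLDS format mentioned above stores each episode as a nested record of steps. Real pipelines read it with `tensorflow_datasets`, but the schema can be illustrated with plain dicts. `make_episode` and `flatten_episode` are hypothetical helpers written for this sketch; the convention of dropping the final step's action (often a placeholder in RLDS) is an assumption stated in the comments.

```python
# Sketch of the RLDS episode layout used by Open X-Embodiment datasets.
# Plain dicts stand in for the TFDS records; helper names are illustrative.

def make_episode(obs_seq, act_seq):
    """Build an RLDS-style episode: a dict holding a 'steps' sequence,
    where each step carries observation, action, and boundary flags."""
    n = len(obs_seq)
    return {
        "steps": [
            {
                "observation": obs_seq[i],
                "action": act_seq[i],
                "is_first": i == 0,
                "is_last": i == n - 1,
                "is_terminal": i == n - 1,
            }
            for i in range(n)
        ]
    }

def flatten_episode(episode):
    """Yield (observation, action) training pairs, dropping the last
    step, whose action is typically a placeholder in RLDS episodes."""
    for step in episode["steps"]:
        if not step["is_last"]:
            yield step["observation"], step["action"]
```

Keeping the episode boundary flags (`is_first`, `is_last`, `is_terminal`) explicit is what lets heterogeneous robot datasets be concatenated and re-batched without losing trajectory structure.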
Simulators
| Simulator | Developer | Physics | Rendering | Primary Use | GPU Accel |
|---|---|---|---|---|---|
| Habitat | Meta | Bullet | Custom | Navigation | ✅ |
| AI2-THOR | Allen AI | Unity | Unity | Nav + Manip | ✅ |
| MuJoCo | DeepMind | MuJoCo | EGL | Manipulation | ✅ |
| Isaac Sim | NVIDIA | PhysX 5 | RTX | All tasks | ✅ |
| SAPIEN | UCSD | PhysX | Custom | Articulated objects | ✅ |
| RoboSuite | Stanford | MuJoCo | EGL | Manipulation | ✅ |
| Gazebo | OSRF | ODE/Bullet | OGRE | ROS integration | ❌ |
| PyBullet | — | Bullet | OpenGL | Quick prototyping | ❌ |
| CARLA | Intel | Unreal | Unreal | Driving/Outdoor | ✅ |
| OmniGibson | Stanford | PhysX | Omniverse | Full household | ✅ |
Simulator Selection Guide
What simulator should I use?
Navigation tasks → Habitat (fastest, largest scenes)
└── Need interaction? → AI2-THOR / iGibson
Manipulation tasks → MuJoCo (fast, accurate)
└── Need articulated objects? → SAPIEN
└── Need GPU parallelism? → Isaac Sim / Isaac Gym
ROS integration → Gazebo (native ROS support)
Quick prototyping → PyBullet (easy to install, fast)
Full household → OmniGibson / BEHAVIOR-1K
Outdoor driving → CARLA
Multi-task research → Isaac Sim (most versatile)
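The guide above can be encoded as a small lookup table, which is handy when scripting experiment configurations. The key names and the Isaac Sim fallback below are illustrative choices that mirror this chapter's recommendations, not an established API.

```python
# Illustrative encoding of the simulator selection guide above.
# Keys are (task, extra requirement); None means "no special requirement".
SIMULATOR_GUIDE = {
    ("navigation", None): "Habitat",
    ("navigation", "interaction"): "AI2-THOR / iGibson",
    ("manipulation", None): "MuJoCo",
    ("manipulation", "articulated"): "SAPIEN",
    ("manipulation", "gpu_parallel"): "Isaac Sim / Isaac Gym",
    ("ros", None): "Gazebo",
    ("prototyping", None): "PyBullet",
    ("household", None): "OmniGibson / BEHAVIOR-1K",
    ("driving", None): "CARLA",
    ("multi_task", None): "Isaac Sim",
}

def recommend(task, requirement=None):
    """Look up a simulator; fall back to the task default, then to the
    chapter's 'most versatile' pick (Isaac Sim) for unknown tasks."""
    return SIMULATOR_GUIDE.get(
        (task, requirement),
        SIMULATOR_GUIDE.get((task, None), "Isaac Sim"),
    )
```

A flat dict keeps the decision logic declarative: adding a new branch to the guide is a one-line table entry rather than another `if` chain.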
Selection Flowchart
Start: What task are you evaluating?
│
├── Navigation
│ ├── PointNav → HM3D + Habitat
│ ├── ObjectNav → HM3D + Habitat
│ ├── VLN → Matterport3D + R2R/RxR
│ ├── Exploration → HM3D or custom
│ └── Social → SCAND + Habitat 3.0
│
├── SLAM
│ ├── Visual SLAM (indoor) → TUM RGB-D, EuRoC
│ ├── Visual SLAM (outdoor) → KITTI
│ ├── LiDAR SLAM → KITTI, MulRan, Hilti
│ └── Neural SLAM → Replica, ScanNet, TartanAir
│
├── Manipulation
│ ├── Pick-and-place → RLBench, LIBERO
│ ├── Dexterous → DexYCB, Adroit
│ ├── Assembly → Peg-in-hole, RLBench
│ ├── Deformable → SoftGym
│ └── Mobile manipulation → BEHAVIOR-1K, RoboCasa
│
└── Cross-task
├── Foundation model → Open X-Embodiment
└── Multi-task → BEHAVIOR-1K, RoboBench
References
Chang et al. (2017). "Matterport3D: Learning from RGB-D Data in Indoor Environments." 3DV 2017
Dai et al. (2017). "ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes." CVPR 2017
Ramakrishnan et al. (2021). "Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI." NeurIPS 2021 Datasets and Benchmarks Track
Anderson et al. (2018). "Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments." CVPR 2018
Calli et al. (2015). "The YCB Object and Model Set." ICRA 2015
Chao et al. (2021). "DexYCB: A Benchmark for Capturing Hand Grasping of Objects." CVPR 2021
Open X-Embodiment Collaboration (2024). "Open X-Embodiment: Robotic Learning Datasets and RT-X Models." ICRA 2024
Sturm et al. (2012). "A Benchmark for the Evaluation of RGB-D SLAM Systems." IROS 2012
Geiger et al. (2012). "Are we ready for autonomous driving? The KITTI vision benchmark suite." CVPR 2012
James et al. (2020). "RLBench: The Robot Learning Benchmark & Learning Environment." IEEE RA-L 2020
Xiang et al. (2020). "SAPIEN: A SimulAted Part-based Interactive ENvironment." CVPR 2020