Datasets & Benchmarks Reference

This chapter provides a comprehensive reference of datasets, benchmarks, and simulators used across all robot task categories. Use this as a lookup when selecting evaluation data for your research.

3D Scene Datasets (Navigation)

| Dataset | Year | Scenes | Modality | Annotations | Used By |
|---|---|---|---|---|---|
| Matterport3D | 2017 | 90 buildings | RGB-D panoramic | 40 semantic categories | R2R VLN, Habitat |
| ScanNet | 2017 | 1,513 scenes | RGB-D | Semantic + instance seg. | 3D detection, reconstruction |
| HM3D | 2021 | 1,000 buildings | 3D mesh | Semantic (v0.2) | Habitat ObjectNav |
| HSSD-200 | 2024 | 211 scenes | 3D mesh | Interactive objects | Habitat rearrangement |
| Gibson | 2018 | 572 buildings | 3D reconstruction | — | Gibson simulator |
| 3DSSG | 2020 | 478 scenes | RGB-D | Scene graph labels | Scene understanding |
| ProcTHOR | 2022 | 10,000 rooms | Procedural | Object layouts | Scalable training |
| ARKitScenes | 2021 | 1,000+ rooms | iPhone LiDAR | 3D bounding boxes | Real-world SLAM |

Matterport3D Details

Matterport3D:
├── 10,800 panoramic RGB-D images
├── 90 buildings (houses, apartments)
├── 40 semantic categories (wall, floor, chair, table, ...)
├── 3D mesh reconstruction per building
├── Used for:
│   ├── R2R Vision-Language Navigation (21K instructions)
│   ├── Active Vision tasks
│   ├── Semantic SLAM
│   └── PointNav / ObjectNav benchmarks
└── License: Research use only

HM3D Details

HM3D (Habitat-Matterport 3D Dataset):
├── 1,000 building-scale 3D scenes
├── Largest dataset of building-scale 3D reconstructions at release
├── Semantic annotations (v0.2): furniture, fixtures, objects
├── Used by Habitat Challenge 2022–2024
├── Covers: ObjectNav, PointNav, Exploration
└── License: Research use (via Habitat)

Vision-Language Navigation Datasets

| Dataset | Year | Instructions | Scenes | Language | Task |
|---|---|---|---|---|---|
| R2R | 2018 | 21,567 | MP3D | English | VLN |
| RxR | 2020 | 126,000+ | MP3D | EN/HI/TE | Multilingual VLN |
| REVERIE | 2020 | 21,702 | MP3D | English | Remote referring |
| SOON | 2021 | 4,000+ | MP3D | English | Environmental variation |
| CVDN | 2020 | 7,441 dialogs | MP3D | English | Dialog-based |
| ALFRED | 2020 | 25,743 | AI2-THOR | English | VLN + manipulation |
| TEACh | 2022 | 3,500+ | AI2-THOR | English | Dialog + task execution |
| SCAND | 2022 | Real-world | Indoor | English | Social navigation |

SLAM Datasets

Indoor SLAM

| Dataset | Year | Sensor | Sequences | Ground Truth | Key Use |
|---|---|---|---|---|---|
| TUM RGB-D | 2012 | Kinect | 39 | Motion capture | RGB-D SLAM standard |
| EuRoC MAV | 2016 | Stereo + IMU | 11 | Motion capture | Visual-inertial SLAM |
| ICL-NUIM | 2014 | Synthetic | 8 | Perfect | Simulated benchmark |
| TartanAir | 2020 | Synthetic | 30+ | Perfect | Diverse environments |
| Replica | 2019 | Synthetic | 18 | Perfect | Neural SLAM |
| ScanNet | 2017 | RGB-D | 1,513 | ICP-aligned | Semantic SLAM |

Outdoor SLAM

| Dataset | Year | Sensor | Sequences | Ground Truth | Key Use |
|---|---|---|---|---|---|
| KITTI | 2012 | Stereo + LiDAR | 22 (11 with GT) | GPS/RTK | Visual/LiDAR odometry |
| nuScenes | 2019 | LiDAR + cameras | 1,000 | GPS/IMU | 3D detection + tracking |
| Waymo Open | 2019 | LiDAR + cameras | 1,150 | GPS/IMU | 3D detection |
| Oxford RobotCar | 2016 | Multi-sensor | 100+ | GPS | Long-term SLAM |
| MulRan | 2020 | LiDAR | 12 | GPS | Multi-session SLAM |
| Hilti | 2022 | Multi-sensor | 9 | Total station | Construction SLAM |
| NCLT | 2016 | Multi-sensor | 27 | GPS | Long-term localization |
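All of these benchmarks supply ground-truth trajectories, and SLAM accuracy is conventionally reported as absolute trajectory error (ATE) against them. Below is a minimal sketch of translation-only ATE RMSE, assuming the two trajectories are already time-associated and expressed in the same frame (real evaluation tools, such as TUM's, also perform a rigid-body alignment first):

```python
import math

def ate_rmse(estimated, ground_truth):
    """RMSE of position error over time-associated trajectory pairs.

    Each trajectory is a list of (x, y, z) positions. Assumes frames
    are already aligned; production tools fit a rigid transform first.
    """
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy trajectories: estimate drifts slightly from ground truth
est = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (2.0, 0.1, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(round(ate_rmse(est, gt), 3))  # 0.082
```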

TUM RGB-D Sequences

Most commonly used TUM RGB-D sequences:

fr1_xyz      — Simple translation (30s, small workspace)
fr1_desk     — Desktop with objects (23s)
fr1_floor    — Floor-level scan (28s)
fr1_room     — Full room traversal (66s)
fr2_xyz      — Larger workspace (122s)
fr2_desk     — Office desk (99s)
fr2_360      — 360° rotation (28s)
fr2_rpy      — Roll/pitch/yaw motion (29s)
fr3_office   — Full office (30s)
fr3_nstr     — Noisy texture (27s)

Evaluation: use the benchmark's associate.py script to align RGB, depth, and ground-truth timestamps before computing trajectory error
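The sensor streams and ground truth in TUM RGB-D are timestamped separately, so evaluation starts by pairing nearest timestamps within a tolerance. A minimal sketch of that matching logic (the function name and default tolerance here are illustrative, not the official script):

```python
def associate(ts_a, ts_b, max_dt=0.02):
    """Greedily pair timestamps from two streams.

    Matches each timestamp in ts_a with the closest unused timestamp
    in ts_b, keeping only pairs within max_dt seconds -- the same idea
    as TUM's associate.py.
    """
    # All candidate pairs within tolerance, closest first
    candidates = sorted(
        (abs(a - b), a, b) for a in ts_a for b in ts_b if abs(a - b) < max_dt
    )
    matches, used_a, used_b = [], set(), set()
    for _, a, b in candidates:
        if a not in used_a and b not in used_b:
            matches.append((a, b))
            used_a.add(a)
            used_b.add(b)
    return sorted(matches)

# Example: RGB timestamps vs. ground-truth timestamps;
# the third RGB frame has no ground truth within 20 ms.
rgb = [1.000, 1.033, 1.066]
gt = [0.999, 1.030, 1.100]
print(associate(rgb, gt))  # [(1.0, 0.999), (1.033, 1.03)]
```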

Manipulation Object Datasets

| Dataset | Year | Objects | Modality | Key Feature |
|---|---|---|---|---|
| YCB | 2015 | 77 (5 categories) | 3D models + physical set | Industry standard |
| DexYCB | 2021 | 20 YCB objects | RGB-D + hand tracking | Dexterous grasping |
| Google Scanned Objects | 2020 | 1,031 | 3D scans | Simulation assets |
| OmniObject3D | 2023 | 6,000+ | 3D scans + textures | Largest 3D object set |
| ObjectNet | 2019 | 313 classes | RGB | Robustness testing |
| ACID | 2022 | 1,000+ | 3D models | Articulated objects |
| OakInk | 2022 | 1,800+ | RGB-D + hand | Hand-object interaction |

YCB Object Categories

YCB Benchmark (77 objects in 5 categories):

1. Food Items (20 objects)
   ├── Canned goods (tomato soup, tuna, etc.)
   ├── Fresh produce (apple, banana, peach, pear, strawberry)
   └── Packaged items (cracker box, sugar box, etc.)

2. Kitchen Items (17 objects)
   ├── Utensils (spatula, spoon, fork, knife)
   ├── Containers (mug, bowl, plate, cup)
   └── Cookware (pitcher, skillet, etc.)

3. Tool Items (19 objects)
   ├── Hand tools (wrench, pliers, screwdriver, hammer)
   ├── Measuring (tape measure)
   └── Clamps and supports

4. Shape & Size Items (16 objects)
   ├── Blocks (various sizes)
   ├── Spheres (various sizes)
   ├── Cylinders (various sizes)
   └── Standardized shapes for calibration

5. Task Items (5 objects)
   └── Multi-part assemblies for manipulation tasks
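The per-category counts above can be sanity-checked against the 77-object total; this tiny snippet encodes them (the dictionary keys are informal labels of my own, not official YCB identifiers):

```python
# Object counts per YCB category, as listed above.
ycb_counts = {
    "food": 20,
    "kitchen": 17,
    "tool": 19,
    "shape_and_size": 16,
    "task": 5,
}

total = sum(ycb_counts.values())
print(total)  # 77
```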

Manipulation Benchmarks

| Benchmark | Year | Tasks | Robot | Setting | Key Feature |
|---|---|---|---|---|---|
| RLBench | 2020 | 100 | Franka Panda | Simulation | Diverse, language-conditioned |
| Meta-World | 2019 | 50 | Sawyer | Simulation | Multi-task RL |
| LIBERO | 2023 | 130 | Franka Panda | Simulation | 4 difficulty suites |
| RoboSuite | 2021 | 8+ | Multiple | Simulation | Modular |
| CALVIN | 2022 | 34 | Franka | Simulation | Long-horizon |
| BEHAVIOR-1K | 2023 | 1,000 | Various | OmniGibson | Full household |
| RoboCasa | 2024 | 100+ | Mobile manipulator | Simulation | Kitchen tasks |
| ManiSkill2 | 2023 | 20 | Various | SAPIEN | Articulated objects |
| SAPIEN | 2020 | — | Various | Simulation | Part-level articulation |

Cross-Embodiment Datasets

| Dataset | Year | Trajectories | Robots | Tasks | Key Feature |
|---|---|---|---|---|---|
| Open X-Embodiment | 2024 | 1M+ | 22 | 500+ | Cross-embodiment |
| Bridge V2 | 2023 | 60K | WidowX | 10+ | Real robot |
| RoboSet | 2023 | 100K+ | Franka | 11 | Multi-task |
| RoboTurk | 2018 | 2,152 | Sawyer | 6 | Crowdsourced |
| DROID | 2024 | 350K | Various | — | In-the-wild |

Open X-Embodiment

Open X-Embodiment (Google DeepMind, 2024):
├── 1M+ trajectories from 22 robot embodiments
├── 500+ distinct tasks
├── Standardized format (RLDS)
├── Trained RT-X models:
│   ├── RT-1-X: ~50% higher success rate than the original per-robot methods
│   └── RT-2-X: Better generalization
├── Contribution from 21 institutions
└── Paper: Brohan et al., ICRA 2024
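RLDS organizes each trajectory as episode-level metadata plus an ordered sequence of steps. The pure-Python sketch below mirrors that logical schema for intuition only; the field names follow the RLDS convention, actual Open X-Embodiment data is consumed through TensorFlow Datasets, and the robot name here is hypothetical:

```python
# Minimal stand-in for an RLDS-style episode: metadata plus an
# ordered list of step dicts. Real data is stored as TFDS records;
# this mirrors the logical schema only.
def make_step(observation, action, reward, is_first=False, is_last=False):
    return {
        "observation": observation,  # e.g. {"image": ..., "state": ...}
        "action": action,            # robot-specific action vector
        "reward": reward,
        "is_first": is_first,        # True on the first step of an episode
        "is_last": is_last,          # True on the terminal step
    }

def make_episode(steps, robot="hypothetical_widowx"):
    # Per-episode metadata; keys here are illustrative.
    return {"episode_metadata": {"robot": robot}, "steps": steps}

episode = make_episode([
    make_step({"state": [0.0, 0.0]}, [0.1, 0.0], 0.0, is_first=True),
    make_step({"state": [0.1, 0.0]}, [0.0, 0.0], 1.0, is_last=True),
])
print(len(episode["steps"]))  # 2
```

Standardizing on one step schema like this is what lets a single training pipeline mix data from 22 different embodiments.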

Simulators

| Simulator | Developer | Physics | Rendering | Primary Use |
|---|---|---|---|---|
| Habitat | Meta | Bullet | Custom | Navigation |
| AI2-THOR | Allen AI | Unity | Unity | Nav + manip |
| MuJoCo | DeepMind | MuJoCo | EGL | Manipulation |
| Isaac Sim | NVIDIA | PhysX 5 | RTX | All tasks |
| SAPIEN | UCSD | PhysX | Custom | Articulated objects |
| RoboSuite | Stanford | MuJoCo | EGL | Manipulation |
| Gazebo | OSRF | ODE/Bullet | OGRE | ROS integration |
| PyBullet | — | Bullet | OpenGL | Quick prototyping |
| CARLA | Intel | Unreal | Unreal | Driving/outdoor |
| OmniGibson | Stanford | PhysX | Omniverse | Full household |

Simulator Selection Guide

What simulator should I use?

Navigation tasks → Habitat (fastest, largest scenes)
  └── Need interaction? → AI2-THOR / iGibson

Manipulation tasks → MuJoCo (fast, accurate)
  └── Need articulated objects? → SAPIEN
  └── Need GPU parallelism? → Isaac Sim / Isaac Gym

ROS integration → Gazebo (native ROS support)

Quick prototyping → PyBullet (easy to install, fast)

Full household → OmniGibson / BEHAVIOR-1K

Outdoor driving → CARLA

Multi-task research → Isaac Sim (most versatile)

Selection Flowchart

Start: What task are you evaluating?
├── Navigation
│   ├── PointNav → HM3D + Habitat
│   ├── ObjectNav → HM3D + Habitat
│   ├── VLN → Matterport3D + R2R/RxR
│   ├── Exploration → HM3D or custom
│   └── Social → SCAND + Habitat 3.0
├── SLAM
│   ├── Visual SLAM (indoor) → TUM RGB-D, EuRoC
│   ├── Visual SLAM (outdoor) → KITTI
│   ├── LiDAR SLAM → KITTI, MulRan, Hilti
│   └── Neural SLAM → Replica, ScanNet, TartanAir
├── Manipulation
│   ├── Pick-and-place → RLBench, LIBERO
│   ├── Dexterous → DexYCB, Adroit
│   ├── Assembly → Peg-in-hole, RLBench
│   ├── Deformable → SoftGym
│   └── Mobile manipulation → BEHAVIOR-1K, RoboCasa
└── Cross-task
    ├── Foundation model → Open X-Embodiment
    └── Multi-task → BEHAVIOR-1K, RoboBench
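For scripting experiment configuration, the flowchart above can be collapsed into a lookup table. This is a sketch with informal key names of my own choosing, covering a few branches:

```python
# A few branches of the selection flowchart as a lookup table.
# Keys are informal (task, subtask) labels, not standard identifiers.
RECOMMENDATIONS = {
    ("navigation", "pointnav"): "HM3D + Habitat",
    ("navigation", "objectnav"): "HM3D + Habitat",
    ("navigation", "vln"): "Matterport3D + R2R/RxR",
    ("slam", "visual_indoor"): "TUM RGB-D, EuRoC",
    ("slam", "lidar"): "KITTI, MulRan, Hilti",
    ("manipulation", "dexterous"): "DexYCB, Adroit",
    ("cross_task", "foundation_model"): "Open X-Embodiment",
}

def recommend(task, subtask):
    """Return the benchmark recommendation for a (task, subtask) pair."""
    return RECOMMENDATIONS.get((task, subtask), "no standard recommendation")

print(recommend("slam", "lidar"))  # KITTI, MulRan, Hilti
```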

References

  • Chang et al. (2017). "Matterport3D: Learning from RGB-D Data in Indoor Environments." 3DV 2017
  • Dai et al. (2017). "ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes." CVPR 2017
  • Ramakrishnan et al. (2021). "Habitat-Matterport 3D Dataset (HM3D)." 3DV 2021
  • Anderson et al. (2018). "Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments." CVPR 2018
  • Calli et al. (2015). "The YCB Object and Model Set: Towards Common Benchmarks for Manipulation Research." ICAR 2015
  • Chao et al. (2021). "DexYCB: A Benchmark for Capturing Hand Grasping of Objects." CVPR 2021
  • Brohan et al. (2024). "Open X-Embodiment: Robotic Learning Datasets and RT-X Models." ICRA 2024
  • Sturm et al. (2012). "A Benchmark for the Evaluation of RGB-D SLAM Systems." IROS 2012
  • Geiger et al. (2012). "Are we ready for autonomous driving? The KITTI vision benchmark suite." CVPR 2012
  • James et al. (2020). "RLBench: The Robot Learning Benchmark & Learning Environment." IEEE RA-L 2020
  • Xiang et al. (2020). "SAPIEN: A SimulAted Part-based Interactive ENvironment." CVPR 2020