Offline Reinforcement Learning

Learning from fixed datasets without environment interaction: Conservative Q-Learning, Implicit Q-Learning, Decision Transformers, and offline RL for robotics.

Learning Objectives

1. Why Offline RL?

1.1 Online vs. Offline

1.2 The Distribution Shift Problem

2. Challenges of Offline RL

2.1 OOD Actions

2.2 Overestimation

3. Conservative Q-Learning (CQL)

3.1 Penalizing OOD Actions

3.2 Algorithm

4. Implicit Q-Learning (IQL)

4.1 Expectile Regression

5. Decision Transformers

5.1 RL as Sequence Modeling

5.2 Architecture

6. Offline RL for Robotics

6.1 Robot Air Hockey

6.2 D4RL Benchmark

7. Comparison: Offline vs. Online vs. IL

Exercises

References