Offline Reinforcement Learning

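Learning from a fixed dataset without environment interaction: Conservative Q-Learning, Implicit Q-Learning, Decision Transformers, and offline RL for robotics.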

Learning Objectives

1. Why Offline RL?

1.1 Online vs. Offline

1.2 The Distribution Shift Problem

2. Challenges of Offline RL

2.1 OOD Actions

2.2 Overestimation

3. Conservative Q-Learning (CQL)

3.1 Penalizing OOD Actions

3.2 Algorithm

4. Implicit Q-Learning (IQL)

4.1 Expectile Regression

5. Decision Transformers

5.1 RL as Sequence Modeling

5.2 Architecture

6. Offline RL for Robotics

6.1 Robot Air Hockey

6.2 D4RL Benchmark

7. Comparison: Offline vs. Online vs. IL

Exercises

References