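Offline Reinforcement Learning
Learning from a fixed dataset without environment interaction: Conservative Q-Learning, Implicit Q-Learning, Decision Transformer, and offline RL for robotics.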
Learning Objectives
1. Why Offline RL?
1.1 Online vs. Offline
1.2 The Distribution Shift Problem
2. Challenges of Offline RL
2.1 OOD Actions
2.2 Overestimation
3. Conservative Q-Learning (CQL)
3.1 Penalizing OOD Actions
3.2 Algorithm
4. Implicit Q-Learning (IQL)
4.1 Expectile Regression
5. Decision Transformer
5.1 RL as Sequence Modeling
5.2 Architecture
6. Offline RL for Robotics
6.1 Robot Air Hockey
6.2 D4RL Benchmark
7. Comparison: Offline vs. Online vs. IL
Exercises
References