Temporal Difference Learning
TD learning methods: TD(0), SARSA, Q-learning, Expected SARSA, n-step TD, and TD(λ) with eligibility traces, along with their convergence properties.
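Before the detailed sections, here is a minimal sketch of the TD(0) update that the rest of the chapter builds on. The toy environment (a 5-state random walk with a +1 reward on the right exit) and the parameter values are illustrative assumptions, not taken from this outline.

```python
import random

random.seed(0)
N_STATES = 5          # non-terminal states 0..4; terminals lie off each end
alpha, gamma = 0.1, 1.0
V = [0.0] * N_STATES  # state-value estimates

for episode in range(5000):
    s = 2  # start in the middle state
    while True:
        s_next = s + random.choice([-1, 1])  # random-walk policy
        if s_next < 0:            # left terminal: reward 0
            r, v_next, done = 0.0, 0.0, True
        elif s_next >= N_STATES:  # right terminal: reward +1
            r, v_next, done = 1.0, 0.0, True
        else:
            r, v_next, done = 0.0, V[s_next], False
        # TD(0) update: V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]
        V[s] += alpha * (r + gamma * v_next - V[s])
        if done:
            break
        s = s_next

print([round(v, 2) for v in V])
```

For this walk the true values are 1/6, 2/6, ..., 5/6 from left to right, and with a small constant step size the estimates settle near those values while continuing to fluctuate, which previews the step-size conditions discussed under convergence in Section 8.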
Learning Objectives
1. From DP to Model-Free Learning
2. TD(0) Prediction
2.1 The TD Update Rule
2.2 TD vs. Monte Carlo
3. SARSA (On-Policy Control)
3.1 Algorithm
3.2 Python Implementation: Cliff Walking
4. Q-Learning (Off-Policy Control)
4.1 Algorithm
4.2 Python Implementation
4.3 SARSA vs. Q-Learning Comparison
5. Expected SARSA
6. Multi-Step TD Methods
6.1 n-Step Returns
6.2 n-Step SARSA
7. TD(λ) and Eligibility Traces
7.1 Forward View
7.2 Backward View with Eligibility Traces
8. Convergence Properties
Exercises
References