
RL Playground

Train RL agents on grid worlds. Compare Q-Learning, SARSA, Expected SARSA, and Monte Carlo. Visualize Q-value heatmaps and policy arrows.
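The methods mainly differ in the target that Q(s, a) is moved toward. Below is a minimal sketch of the SARSA, Expected SARSA, and Monte Carlo targets. It assumes a tabular setting with discount `gamma` and an ε-greedy behavior policy. All function names are hypothetical; this is not the playground's own code.

```typescript
// Hypothetical helpers: each returns the value Q(s, a) is nudged toward.
// qNext holds Q(s', ·) for every action in the next state s'.

// SARSA (on-policy TD): bootstrap from the action a' actually chosen in s'.
function sarsaTarget(r: number, gamma: number, qNext: number[], aNext: number, done: boolean): number {
  return done ? r : r + gamma * qNext[aNext];
}

// Expected SARSA: bootstrap from the expectation of Q(s', ·) under ε-greedy.
function expectedSarsaTarget(r: number, gamma: number, qNext: number[], epsilon: number, done: boolean): number {
  if (done) return r;
  const nA = qNext.length;
  const best = qNext.indexOf(Math.max(...qNext)); // first greedy action wins ties
  let expected = 0;
  for (let a = 0; a < nA; a++) {
    // Every action gets ε / nA; the greedy action also gets the remaining 1 - ε.
    const p = epsilon / nA + (a === best ? 1 - epsilon : 0);
    expected += p * qNext[a];
  }
  return r + gamma * expected;
}

// Monte Carlo: no bootstrapping. The target is the full discounted return G_t,
// computed backwards over a finished episode's rewards.
function monteCarloReturns(rewards: number[], gamma: number): number[] {
  const returns = new Array<number>(rewards.length);
  let g = 0;
  for (let t = rewards.length - 1; t >= 0; t--) {
    g = rewards[t] + gamma * g;
    returns[t] = g;
  }
  return returns;
}
```

Expected SARSA averages over the policy's next-action distribution, whereas SARSA samples from it. This averaging typically lowers the variance of its updates.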

Q-Learning (off-policy TD): updates Q(s, a) toward r + γ · max_a′ Q(s′, a′), the value of the best next action, regardless of which action the agent actually takes next.
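Below is a minimal sketch of one tabular Q-learning update. It assumes the Q-table is a flat `Float64Array` with Q(s, a) stored at index `s * nActions + a`. The function name and argument order are hypothetical.

```typescript
// One tabular Q-learning update (a sketch, not the playground's code).
// Q is a flat Float64Array of size nStates * nActions.
function qLearningUpdate(
  Q: Float64Array, nActions: number,
  s: number, a: number, r: number, sNext: number,
  done: boolean, alpha: number, gamma: number,
): void {
  // Off-policy: bootstrap from the greedy (max) value in s',
  // whatever action the behavior policy ends up taking there.
  let maxNext = -Infinity;
  for (let b = 0; b < nActions; b++) {
    maxNext = Math.max(maxNext, Q[sNext * nActions + b]);
  }
  const target = done ? r : r + gamma * maxNext; // no bootstrap past a terminal state
  const i = s * nActions + a;
  Q[i] += alpha * (target - Q[i]); // move Q(s, a) a fraction alpha toward the target
}
```

For example, with α = 0.1, γ = 0.99, r = 0, and max Q(s′, ·) = 3, a zero-initialized Q(s, a) becomes 0.1 × 2.97 = 0.297.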


Parameters

Learning rate (α): 0.1
Discount factor (γ): 0.99
Initial epsilon (ε): 1
Epsilon decay: 0.995
200
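This sketch assumes the parameters are the learning rate α = 0.1, discount γ = 0.99, initial ε = 1, and a multiplicative ε decay of 0.995. It shows ε-greedy action selection with that decay applied once per episode. The `epsilonMin` floor and both function names are hypothetical additions.

```typescript
// ε-greedy action selection over one state's row of Q-values.
// rng is injectable so choices are reproducible in tests; defaults to Math.random.
function epsilonGreedy(qRow: ArrayLike<number>, epsilon: number, rng: () => number = Math.random): number {
  const nA = qRow.length;
  if (rng() < epsilon) {
    return Math.floor(rng() * nA); // explore: uniform random action
  }
  let best = 0;
  for (let a = 1; a < nA; a++) {
    if (qRow[a] > qRow[best]) best = a; // exploit: first argmax
  }
  return best;
}

// Multiplicative decay applied once per episode, clamped at a hypothetical floor.
function decayEpsilon(epsilon: number, decay: number, epsilonMin = 0.01): number {
  return Math.max(epsilonMin, epsilon * decay);
}

// Example: start at ε = 1 and decay by 0.995 for 200 episodes.
let eps = 1;
for (let ep = 0; ep < 200; ep++) eps = decayEpsilon(eps, 0.995);
```

Under these assumptions, ε is still about 0.367 (0.995²⁰⁰) after 200 episodes. The agent is therefore exploring on roughly a third of its steps at that point.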

Statistics

Episodes: 0
Total Steps: 0
Avg Reward (last 100 episodes): 0
Success Rate: 0%
Current Epsilon: 1.0000
Convergence: Learning...
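The rolling statistics can be tracked with a small sliding window. This is a sketch under two assumptions. First, "success" means the episode ended at the goal. Second, the success rate uses the same 100-episode window as the average reward. The class and method names are hypothetical.

```typescript
// Rolling statistics over the most recent `windowSize` episodes (default 100).
class EpisodeStats {
  episodes = 0;
  totalSteps = 0;
  private readonly windowSize: number;
  private rewards: number[] = [];
  private successes: boolean[] = [];

  constructor(windowSize = 100) {
    this.windowSize = windowSize;
  }

  record(reward: number, steps: number, reachedGoal: boolean): void {
    this.episodes += 1;
    this.totalSteps += steps;
    this.rewards.push(reward);
    this.successes.push(reachedGoal);
    if (this.rewards.length > this.windowSize) {
      // Drop the oldest episode so only the last `windowSize` remain.
      this.rewards.shift();
      this.successes.shift();
    }
  }

  avgReward(): number {
    if (this.rewards.length === 0) return 0;
    return this.rewards.reduce((sum, r) => sum + r, 0) / this.rewards.length;
  }

  successRate(): number {
    if (this.successes.length === 0) return 0;
    return this.successes.filter(Boolean).length / this.successes.length;
  }
}
```

Calling `shift()` on a plain array costs O(window) per episode, which is negligible at 100. A ring buffer would avoid this cost for much larger windows.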

Legend

Start (S)
Goal (G)
Wall
Hole (H)
Slippery (~)
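The legend's cell types map naturally onto a grid-world step function, sketched here. The transition and reward rules are assumptions, not taken from the playground:

- Walls and borders block movement.
- Reaching G ends the episode with reward +1.
- Falling into H ends it with reward 0.
- On a slippery cell, the agent sometimes slides perpendicular to its chosen direction.

The `Cell` encoding, the action order, and `slipProb` are hypothetical.

```typescript
// Hypothetical cell encoding matching the legend symbols ("." = plain floor).
type Cell = "S" | "G" | "#" | "H" | "~" | ".";

interface StepResult { next: number; reward: number; done: boolean; }

// Actions 0..3 = up, right, down, left as (dRow, dCol) offsets.
const MOVES: ReadonlyArray<readonly [number, number]> = [[-1, 0], [0, 1], [1, 0], [0, -1]];

// One environment step on a row-major grid; states are indexed row * cols + col.
// On a slippery cell, with probability slipProb the agent slides to one of the two
// perpendicular directions (chosen evenly); slipProb = 2/3 gives a 1/3 split among
// the intended and both perpendicular moves, as in classic slippery Frozen Lake.
function step(grid: Cell[][], state: number, action: number,
              slipProb = 2 / 3, rng: () => number = Math.random): StepResult {
  const cols = grid[0].length;
  const row = Math.floor(state / cols);
  const col = state % cols;
  let a = action;
  if (grid[row][col] === "~" && rng() < slipProb) {
    a = (action + (rng() < 0.5 ? 1 : 3)) % 4; // rotate 90° clockwise or counter-clockwise
  }
  const [dr, dc] = MOVES[a];
  const nr = row + dr;
  const nc = col + dc;
  const blocked = nr < 0 || nr >= grid.length || nc < 0 || nc >= cols || grid[nr][nc] === "#";
  const next = blocked ? state : nr * cols + nc; // blocked moves leave the agent in place
  const cell = grid[Math.floor(next / cols)][next % cols];
  if (cell === "G") return { next, reward: 1, done: true };
  if (cell === "H") return { next, reward: 0, done: true };
  return { next, reward: 0, done: false };
}
```

Cliff Walking-style worlds usually differ in their reward rules rather than their movement rules. For example, they give a per-step penalty and a large negative reward for the cliff. Those rules would change only the three `return` lines.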

All algorithms are implemented from scratch, with Q-tables stored as Float64Arrays. Seven preset grid worlds are included, among them Frozen Lake and Cliff Walking.
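A Q-table stored as a single `Float64Array` makes both visualizations cheap to compute. The greedy action per state gives the policy arrow, and the max Q per state is the kind of value a heatmap would typically color. This sketch uses hypothetical helper names and assumes actions 0..3 are up, right, down, left.

```typescript
// Arrow glyphs indexed by action (assumed order: up, right, down, left).
const ARROWS = ["↑", "→", "↓", "←"];

// Row s of the flat table holds Q(s, 0..nActions-1).
function makeQTable(nStates: number, nActions: number): Float64Array {
  return new Float64Array(nStates * nActions); // zero-initialized by the constructor
}

// Greedy policy arrow for every state (first argmax wins ties).
function policyArrows(Q: Float64Array, nActions: number): string[] {
  const nStates = Q.length / nActions;
  const arrows: string[] = [];
  for (let s = 0; s < nStates; s++) {
    const row = Q.subarray(s * nActions, (s + 1) * nActions); // a view, not a copy
    let best = 0;
    for (let a = 1; a < nActions; a++) {
      if (row[a] > row[best]) best = a;
    }
    arrows.push(ARROWS[best]);
  }
  return arrows;
}

// State value V(s) = max_a Q(s, a), e.g. for coloring a heatmap cell.
function stateValues(Q: Float64Array, nActions: number): Float64Array {
  const nStates = Q.length / nActions;
  const v = new Float64Array(nStates);
  for (let s = 0; s < nStates; s++) {
    let m = -Infinity;
    for (let a = 0; a < nActions; a++) m = Math.max(m, Q[s * nActions + a]);
    v[s] = m;
  }
  return v;
}
```

A flat typed array avoids per-state object allocation, and `subarray` returns a view into the same buffer. Reading a state's row therefore copies nothing.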