RL Playground
Train RL agents on grid worlds. Compare Q-Learning, SARSA, Expected SARSA, and Monte Carlo. Visualize Q-value heatmaps and policy arrows.
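Q-Learning (off-policy TD): updates Q(s, a) toward the reward plus the discounted maximum Q-value of the next state, regardless of which action the agent actually takes next.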
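The off-policy update described above can be sketched roughly as follows. This is a minimal illustration, not the playground's actual source: the function names (maxQ, qLearningUpdate, sarsaUpdate), the flat state * numActions + action layout, and the argument order are all assumptions made for the example. A SARSA update is included only for contrast, since it bootstraps from the action actually chosen next rather than from the maximum.

```javascript
// Hypothetical layout (an assumption): Q-values for state s occupy
// indices [s * numActions, s * numActions + numActions) of a Float64Array.
function maxQ(q, state, numActions) {
  let best = -Infinity;
  for (let a = 0; a < numActions; a++) {
    best = Math.max(best, q[state * numActions + a]);
  }
  return best;
}

// Q-Learning (off-policy):
//   Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
// The target ignores which action the agent takes in s'.
function qLearningUpdate(q, numActions, s, a, reward, sNext, done, alpha, gamma) {
  const idx = s * numActions + a;
  const future = done ? 0 : maxQ(q, sNext, numActions); // no bootstrap at terminal states
  q[idx] += alpha * (reward + gamma * future - q[idx]);
  return q[idx];
}

// SARSA (on-policy), for contrast:
//   Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))
// where a' is the action the behaviour policy actually selected in s'.
function sarsaUpdate(q, numActions, s, a, reward, sNext, aNext, done, alpha, gamma) {
  const idx = s * numActions + a;
  const future = done ? 0 : q[sNext * numActions + aNext];
  q[idx] += alpha * (reward + gamma * future - q[idx]);
  return q[idx];
}

// Usage: two states, four actions, with the page's default-looking
// values 0.1 and 0.99 assumed to be the learning rate and discount.
const demoQ = new Float64Array(2 * 4);
demoQ[1 * 4 + 2] = 1.0; // pretend action 2 in state 1 is already valued at 1
console.log(qLearningUpdate(demoQ, 4, 0, 1, 0, 1, false, 0.1, 0.99));
```

When the next action happens to be the greedy one, both updates produce the same target. They differ only on exploratory steps, which is why the two methods can learn different paths on worlds such as Cliff Walking.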
Controls
Parameters
0.1
0.99
1
0.995
200
Statistics
Episodes: 0
Total Steps: 0
Avg Reward (100): 0
Success Rate: 0%
Current Epsilon: 1.0000
Convergence: Learning...
Legend
Start (S)
Goal (G)
Wall
Hole (H)
Slippery (~)
All algorithms are implemented from scratch. Q-tables are stored as Float64Arrays. There are 7 preset grid worlds, including Frozen Lake and Cliff Walking.
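One plausible way to store a grid-world Q-table in a Float64Array is a single flat buffer indexed by cell and action. The sketch below is an illustration under stated assumptions, not the playground's code: the QTable class, its method names, the row-major cell ordering, and the up/right/down/left action order in ARROWS are all hypothetical.

```javascript
// Assumed action ordering, used only to draw policy arrows in this sketch.
const ARROWS = ["↑", "→", "↓", "←"];

class QTable {
  constructor(rows, cols, numActions) {
    this.rows = rows;
    this.cols = cols;
    this.numActions = numActions;
    // Float64Array is zero-initialised, so every Q-value starts at 0.
    this.values = new Float64Array(rows * cols * numActions);
  }

  // Row-major cell index, then the action offset within that cell.
  index(row, col, action) {
    return (row * this.cols + col) * this.numActions + action;
  }

  get(row, col, action) {
    return this.values[this.index(row, col, action)];
  }

  set(row, col, action, value) {
    this.values[this.index(row, col, action)] = value;
  }

  // Greedy action for a cell; ties resolve to the lowest action index.
  greedyAction(row, col) {
    let best = 0;
    for (let a = 1; a < this.numActions; a++) {
      if (this.get(row, col, a) > this.get(row, col, best)) best = a;
    }
    return best;
  }
}

// Usage: a 4x4 grid with four actions.
const table = new QTable(4, 4, 4);
table.set(2, 3, 1, 0.5);
console.log(ARROWS[table.greedyAction(2, 3)]);
```

A flat typed array keeps all values in one contiguous block of 64-bit floats. The per-cell maximum can feed a heatmap and the per-cell argmax can feed the policy arrows.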