domain reinforcement seed

Reinforcement Learning

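Learning by interacting with an environment: an agent takes actions, observes rewards, and improves its policy to maximize cumulative reward. Spans tabular methods such as Q-learning, game-playing systems such as AlphaGo, and RLHF (reinforcement learning from human feedback).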
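To make the interaction loop in the description above concrete, here is a minimal sketch of tabular Q-learning. The five-state chain environment, the `step` function, and every hyperparameter value are assumptions chosen for illustration. None of them come from this note, and this is one simple way to write the update rather than a reference implementation.

```python
import random

# Hypothetical toy environment (an assumption for illustration):
# states 0..4 on a line, the agent starts at 0, and state 4 is the goal.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward only on reaching the goal
    return next_state, reward, done


def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1,
                     max_steps=200, seed=0):
    """Run tabular Q-learning and return the table q[state][action]."""
    rng = random.Random(seed)  # local generator so runs are repeatable
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _step in range(max_steps):
            # Epsilon-greedy: explore with probability epsilon,
            # otherwise pick a best-valued action, breaking ties at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice(
                    [a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus the discounted value of the
            # best next action (no bootstrap term on terminal steps).
            if done:
                target = reward
            else:
                target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best-valued action for each non-goal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train_q_learning())` on this toy chain should favor stepping right in every non-goal state, since that is the only route to the reward. The table-based form only works for small, discrete state spaces. Systems like AlphaGo and RLHF pipelines replace the table with learned function approximators.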

#reinforcement-learning #reward #policy #agent