concept reinforcement ★★ seed

Policy Gradient Methods

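RL algorithms that directly optimize a parameterized policy by gradient ascent on expected return, rather than deriving the policy from a learned value function. REINFORCE (Williams, 1992), which weights log-probability gradients by Monte Carlo returns, and actor-critic methods, which replace those returns with a learned value estimate (the critic) to reduce variance, are foundational variants.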
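
To make the idea concrete, here is a minimal sketch of a REINFORCE-style update on a two-armed Bernoulli bandit, where each episode is a single step and the return is just the reward. It is an illustration under stated assumptions, not a reference implementation: the softmax-over-preferences policy, the running-average baseline, the function name `reinforce_bandit`, and all parameter values (`reward_probs`, `steps`, `lr`, `seed`) are hypothetical choices made for this example.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(reward_probs=(0.2, 0.8), steps=2000, lr=0.1, seed=0):
    """Train a softmax policy with a REINFORCE-style update.

    Assumed setup (illustrative only): each arm pays reward 1 with the
    given probability, otherwise 0. Returns the final action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(reward_probs)  # policy parameters (action preferences)
    baseline = 0.0                     # running average of observed rewards

    for t in range(1, steps + 1):
        probs = softmax(theta)
        # Sample an action from the current policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Subtracting a baseline lowers variance without biasing the gradient.
        advantage = reward - baseline

        # For a softmax policy: d log pi(action) / d theta_k = 1[k == action] - pi_k.
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log_pi  # gradient ascent step

        # Incremental mean of all rewards seen so far.
        baseline += (reward - baseline) / t

    return softmax(theta)


if __name__ == "__main__":
    print(reinforce_bandit())
```

With these settings the policy should shift most of its probability mass toward the arm with the higher payout rate. An actor-critic variant would replace the running-average baseline with a learned, state-dependent value estimate.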

#policy-gradient #reinforce #actor-critic