Policies and Value Functions

A reinforcement learning policy is a mapping that selects an action to take based
on observations from the environment. During training, the agent tunes the
parameters of its policy representation to maximize the expected cumulative
long-term reward.
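
To make the idea of a policy as a parameterized mapping concrete, the following minimal Python sketch maps an observation vector to a probability for each action using a softmax over linear scores. It is a conceptual illustration only and does not use or mirror the toolbox API. The names `theta`, `action_probabilities`, and `select_action`, and the one-weight-vector-per-action layout, are assumptions made for this example.

```python
import math
import random


def softmax(scores):
    """Convert raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def action_probabilities(theta, observation):
    """Policy: map an observation to one probability per action.

    theta is a list with one weight vector per action
    (a hypothetical parameter layout chosen for this sketch).
    """
    scores = [
        sum(w * x for w, x in zip(weights, observation))
        for weights in theta
    ]
    return softmax(scores)


def select_action(theta, observation, rng=random):
    """Sample an action index from the policy's distribution."""
    probs = action_probabilities(theta, observation)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In this sketch, training would amount to adjusting the entries of `theta` so that actions leading to higher long-term reward become more probable. How those adjustments are computed depends on the agent and is not shown here.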

Reinforcement Learning Toolbox™ software provides objects for actor and critic representations. The
actor represents the policy that selects the best action to take. The critic
represents the value function that estimates the expected long-term reward
obtained by following the current policy.
Depending on your application and selected agent, you can define policy and value
functions using deep neural networks, linear basis functions, or lookup tables. For
more information, see Create Policy and Value Function Representations.
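
As a rough illustration of two of these representation choices, the following Python sketch stores a critic once as a lookup table and once as a weighted sum of basis features, and derives a simple actor that picks the action the critic rates highest. This is a conceptual sketch under stated assumptions, not the toolbox's objects or API. The class names `TableCritic` and `LinearCritic`, the function `greedy_actor`, and the `features` callback are all hypothetical names introduced for this example.

```python
from collections import defaultdict


class TableCritic:
    """Value function stored as a lookup table: one entry per (state, action)."""

    def __init__(self):
        self.q = defaultdict(float)  # unseen pairs default to 0.0

    def value(self, state, action):
        return self.q[(state, action)]

    def update(self, state, action, target, step_size=0.1):
        # Move the stored estimate a fraction of the way toward the target.
        key = (state, action)
        self.q[key] += step_size * (target - self.q[key])


class LinearCritic:
    """Value function as a weighted sum of basis features."""

    def __init__(self, features, n_weights):
        self.features = features      # callable: (state, action) -> list of floats
        self.w = [0.0] * n_weights    # tunable parameters

    def value(self, state, action):
        return sum(w * f for w, f in zip(self.w, self.features(state, action)))


def greedy_actor(critic, state, actions):
    """Actor that selects the action with the highest estimated value."""
    return max(actions, key=lambda a: critic.value(state, a))
```

A lookup table suits small, discrete state and action spaces, while basis functions or neural networks generalize across large or continuous spaces. A greedy actor derived from the critic is only one possible arrangement; other agents keep a separate, independently parameterized actor.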