Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. It allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize their performance. Only simple reward feedback is required for the agent to learn its behavior; this feedback is known as the reinforcement signal.

Reinforcement learning is defined by a specific type of problem, and all its solutions are classed as reinforcement learning algorithms. In the problem, an agent must decide the best action to select based on its current state. The environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context use dynamic programming techniques. The main difference between the classical dynamic programming techniques and reinforcement learning algorithms is that the latter do not need exact knowledge of the MDP, and they target large MDPs where exact methods become impractical.
The problem has also been studied in the theory of optimal control, although most studies there are concerned with the existence of optimal solutions and their characterization, not with the learning or approximation aspects. In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality.
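
To make the point about not needing knowledge of the MDP concrete, here is a minimal sketch of tabular Q-learning, one well-known model-free algorithm. The five-state corridor, the function names, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all assumptions chosen for illustration, not something the text above prescribes. The learner never reads the transition rules inside `step`; it only sees sampled outcomes.

```python
import random

# Hypothetical 5-state corridor: states 0..4, state 4 is terminal and pays +1.
N_STATES = 5
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Environment dynamics. The learner only ever sees the sampled result."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values from sampled transitions only."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, occasionally explore.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[state])
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrap from the best estimated value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = q_learning()
# Greedy action for every non-terminal state.
greedy_policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(greedy_policy)
```

On a problem this small, dynamic programming with a known model would work just as well; the sketch only illustrates that the update rule itself needs nothing but observed transitions.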

The simplest context in which to think about reinforcement learning is in games with a clear objective and a point system.

For example, consider a game in which a mouse is looking for the cheese at the end of a maze (+500 points) or the lesser reward of water along the way (+10 points). Meanwhile, the mouse tries to avoid electric shocks (-100 points).
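
The point system above can be written down as a small lookup table. The sketch below uses the point values from the example; the event names, the `total_score` helper, and the sample run are hypothetical and only serve to show how rewards accumulate over an episode.

```python
# Point values from the maze example; the event names are illustrative.
REWARDS = {"cheese": 500, "water": 10, "shock": -100}


def total_score(events):
    """Sum the points for a sequence of events the mouse encounters."""
    return sum(REWARDS.get(event, 0) for event in events)


# A hypothetical run: water, one shock, more water, then the cheese.
episode = ["water", "shock", "water", "cheese"]
print(total_score(episode))  # 10 - 100 + 10 + 500 = 420
```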

The reward is not always immediate. Here, the robot-mouse may have to travel a long stretch of the maze, walking through its paths and facing several decision points before reaching the cheese.
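
One common way to handle delayed rewards is to credit each step with the discounted sum of the rewards that follow it. The sketch below assumes a discount factor `gamma` of 0.9 and a four-step walk; neither comes from the text above, and the `discounted_returns` helper is a name chosen for this example.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical walk: three unrewarded steps, then the cheese (+500).
rewards = [0, 0, 0, 500]
print(discounted_returns(rewards))
```

With these assumed numbers, the first step is credited with roughly 364.5 points even though its immediate reward is zero, which is how early decisions can be linked to a reward that only arrives later.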

The agent observes the environment, takes an action to interact with it, and receives a positive or negative reward.
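
The observe, act, receive-reward cycle can be sketched as a short loop. Everything named here is an assumption for illustration: the `CorridorEnv` class, its `reset`/`step` methods, the step cost of -1, and the two toy policies. The +500 for reaching the end borrows the cheese value from the maze example.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to the cheese."""

    def __init__(self, length=4):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the observation is simply the position

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position.
        self.position = min(max(self.position + action, 0), self.length)
        done = self.position == self.length
        reward = 500 if done else -1  # the small step cost is an assumption
        return self.position, reward, done


def run_episode(env, policy, max_steps=1000):
    """Run one episode and return the total reward collected."""
    observation = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = policy(observation)                  # the agent acts
        observation, reward, done = env.step(action)  # the environment responds
        total_reward += reward                        # the agent receives reward
        if done:
            break
    return total_reward


rng = random.Random(0)


def random_policy(observation):
    return rng.choice((-1, 1))


def always_right(observation):
    return 1


print(run_episode(CorridorEnv(), always_right))   # 3 steps at -1, then +500 = 497
print(run_episode(CorridorEnv(), random_policy))  # never more than 497
```

A learning agent would replace the fixed policies with one that is updated from the rewards it receives; the loop itself stays the same.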

With the advance of neural networks, deep reinforcement learning, an approach that uses neural networks to estimate quantities such as the values of states and actions (e.g. Q-values), has become more popular. It allows researchers and engineers to create agents that do well in more complex environments.
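
As a rough illustration of a network that evaluates states, the sketch below runs the forward pass of a tiny fully connected network that maps state features to one estimated Q-value per action, then picks the greedy action. The layer sizes, the random untrained weights, and all function names are assumptions; it is written in plain Python to stay self-contained, whereas real systems would normally use a deep learning library.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Small random weights plus zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer, activation=None):
    """Apply one dense layer: weighted sums plus bias, then optional activation."""
    weights, biases = layer
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    return [activation(o) for o in outputs] if activation else outputs


def q_values(state, network):
    """Forward pass: state features in, one estimated Q-value per action out."""
    hidden = dense(state, network[0], activation=math.tanh)
    return dense(hidden, network[1])


def greedy_action(state, network):
    """Index of the action with the highest estimated Q-value."""
    q = q_values(state, network)
    return q.index(max(q))


rng = random.Random(0)
# Hypothetical sizes: 2 state features, 8 hidden units, 4 actions.
network = [init_layer(2, 8, rng), init_layer(8, 4, rng)]
state = [0.25, -0.5]
print(q_values(state, network))
print(greedy_action(state, network))
```

The weights here are untrained, so the printed values carry no meaning yet. In a deep reinforcement learning agent they would be adjusted from experience, typically by reducing the gap between the predicted Q-value and a target built from the observed reward.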

Due to its generality, reinforcement learning is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In the operations research and control literature, reinforcement learning is called approximate dynamic programming or neuro-dynamic programming.