The fourth experiment should earn a significantly higher reward than the first. In the first experiment, the agent keeps exploring 20% of the time for the entire run, so it still spends a fifth of its steps on exploration even after it has had time to learn which actions pay off. In the fourth experiment, the agent begins by exploring but gradually increases the share of time it spends exploiting. It achieves a higher reward by making a better trade-off between exploration and exploitation.
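
To make the difference concrete, here is a minimal sketch of the two exploration strategies on a toy multi-armed bandit. The experiments' actual environment and decay schedule are not given here, so everything in this sketch is an assumption for illustration: the bandit setup, the arm means, the exponential decay with a floor, and the names `constant_epsilon`, `decaying_epsilon` and `run_bandit`.

```python
import random


def constant_epsilon(step, epsilon=0.2):
    """Explore with the same probability at every step (as in the first experiment)."""
    return epsilon


def decaying_epsilon(step, start=1.0, decay=0.99, floor=0.01):
    """Hypothetical schedule: start fully exploratory, then decay towards a small floor."""
    return max(floor, start * decay ** step)


def run_bandit(epsilon_fn, arm_means, steps=5000, seed=0):
    """Run epsilon-greedy on a Gaussian bandit and return the average reward per step."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    estimates = [0.0] * len(arm_means)
    total = 0.0
    for step in range(steps):
        if rng.random() < epsilon_fn(step):
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(len(arm_means))
        else:
            # Exploit: pick the arm with the highest estimated value so far.
            arm = max(range(len(arm_means)), key=lambda a: estimates[a])
        reward = rng.gauss(arm_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps


if __name__ == "__main__":
    arm_means = [0.1, 0.5, 0.9]  # assumed true mean reward of each arm
    print("constant epsilon:", run_bandit(constant_epsilon, arm_means))
    print("decaying epsilon:", run_bandit(decaying_epsilon, arm_means))
```

With a constant rate of 0.2, the agent keeps paying the cost of random actions for the whole run, whereas the decaying schedule pays most of that cost early on. The exact numbers depend on the seed, the arm means and the schedule chosen, so treat any single run as indicative rather than conclusive.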