The first experiment should earn a significantly higher total reward than the second. Exploring near s2 can incur a large penalty. Because SARSA is on-policy, it learns the value of the policy it actually follows, exploratory moves included, so it adopts a policy that stays away from the dangerous area. Since the agent is forced to explore 20% of the time, SARSA's strategy of avoiding the dangerous area results in a better total reward.
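
The effect of forced exploration on a state next to the dangerous area can be illustrated with a small sketch. It is not the experiment's actual code: the action values (`10.0` and `-100.0`), the two-action state, and the function names are hypothetical, and exploration is assumed to be epsilon-greedy with a uniform random choice over all actions. The sketch compares the value of such a state under a purely greedy view with its expected value when 20% of moves are exploratory, which is the quantity an on-policy method such as SARSA estimates on average.

```python
import numpy as np

def greedy_value(q_row):
    """Value of a state if the best action were always taken (no exploration)."""
    return float(np.max(q_row))

def epsilon_greedy_value(q_row, epsilon):
    """Expected value of a state when actions are chosen epsilon-greedily.

    Assumption: with probability epsilon the agent picks uniformly among
    all actions (including the greedy one); otherwise it acts greedily.
    """
    n_actions = len(q_row)
    probs = np.full(n_actions, epsilon / n_actions)
    probs[int(np.argmax(q_row))] += 1.0 - epsilon
    return float(np.dot(probs, q_row))

# Hypothetical action values for a state adjacent to the dangerous area:
# action 0 moves on safely, action 1 steps into the penalty.
q_near_danger = np.array([10.0, -100.0])
epsilon = 0.2  # the agent explores 20% of the time

v_greedy = greedy_value(q_near_danger)
v_on_policy = epsilon_greedy_value(q_near_danger, epsilon)

print("greedy view of the state:   ", v_greedy)
print("value under 20% exploration:", v_on_policy)
```

With these made-up numbers, the state looks attractive under the greedy view but has a negative expected value once exploratory slips are counted, which is why a policy learned on-policy routes around it.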