This figure is supposed to motivate the use of L2 regularization. The text points out that "the model with L2 regularization (dots) has become much more resistant to overfitting than the reference model (crosses), even though both models have the same number of parameters."

That's true, but the original model also has the best score of either model, by quite a bit, at epoch 3. That confuses the point and raises the question, "Explain to me again why L2 is better?" It does not seem like a good example.
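One way to make the comparison concrete is to report, for each curve, the best epoch alongside the final epoch. The sketch below assumes the plotted metric is validation loss (lower is better). The function name `summarize_val_loss` is hypothetical, and the numbers are made up for illustration; they are not read from the figure.

```python
def summarize_val_loss(val_loss):
    """Return (best_epoch, best_loss, final_loss) for a per-epoch loss list.

    Epochs are numbered from 1, matching the usual plot axis.
    """
    if not val_loss:
        raise ValueError("val_loss must contain at least one epoch")
    # Index of the smallest loss; ties resolve to the earliest epoch.
    best_index = min(range(len(val_loss)), key=val_loss.__getitem__)
    return best_index + 1, val_loss[best_index], val_loss[-1]


# Hypothetical values for illustration only; NOT taken from the figure.
reference = [0.40, 0.30, 0.28, 0.35, 0.45, 0.60]
l2_model = [0.45, 0.38, 0.36, 0.37, 0.39, 0.41]

for name, curve in (("reference", reference), ("L2", l2_model)):
    best_epoch, best_loss, final_loss = summarize_val_loss(curve)
    print(f"{name}: best epoch {best_epoch}, "
          f"best loss {best_loss:.2f}, final loss {final_loss:.2f}")
```

With curves shaped like these, the reference model wins on best loss while the L2 model wins on final loss. That is the ambiguity described above: the figure supports "L2 degrades more slowly" but not "L2 reaches a better model".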