No no no. He may have a nifty algorithm, but there are only a small, finite number of Waldo pictures. Consciously or not, he’s memorized them. It’s like when you’ve played too much Trivial Pursuit – somebody says, “In 1066…” and you shout out, “William the Conqueror!”

What happens when you create a new picture and stick Waldo in the top left corner?

There are only a limited number of Where’s Waldo puzzles. It’s very easy to generate an algorithm that “solves” the puzzle but that, in essence, just memorizes where Waldo is on those several dozen images.

The fact that Waldo never appears in the bottom right is simply random noise – and that’s precisely the sort of noise an overfitted model will present as a “pattern” rather than random chance.

Reducing the variables in the model to general qualities rather than specific ones, and checking how many of those hold across the data, helps avoid overfitting. For example, “not around the edges” is a much more general quality, and may well represent a real pattern in the data.
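To make the memorization-vs-generalization point concrete, here’s a minimal sketch. The puzzle IDs and coordinates are entirely made up for illustration, and `memorizer` / `general_rule` are hypothetical names – the point is just that a lookup table scores perfectly on the puzzles it has seen and fails on a brand-new one, while a general quality like “not around the edges” at least transfers:

```python
# Hypothetical illustration of overfitting-by-memorization.
# The puzzle IDs and coordinates below are made up, not real Waldo data.

training = {
    "puzzle_01": (3.2, 5.1),
    "puzzle_02": (9.7, 1.4),
    "puzzle_03": (6.0, 7.8),
}

def memorizer(puzzle_id):
    # "Solves" every training puzzle perfectly, by lookup alone.
    return training.get(puzzle_id)  # None for any unseen puzzle

def general_rule(x, y, width=12.8, height=8.0, margin=0.5):
    # A general quality ("not around the edges") that can at least
    # be checked on a brand-new picture.
    return margin < x < width - margin and margin < y < height - margin

# The memorizer nails every puzzle it has already seen...
assert all(memorizer(pid) == loc for pid, loc in training.items())
# ...but stick Waldo in a new picture and it has nothing to say.
assert memorizer("puzzle_99") is None
```

That’s exactly the “new picture, Waldo in the top left corner” test: the memorizer returns nothing, while the general rule still gives an answer (right or wrong) because it describes the data rather than reciting it.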

This “path,” by the way, is even sillier now that I look at the method. It’s simply every possible Waldo location, plus an approximately shortest route through all of those points. “Start here” doesn’t actually mark the optimal place to start – the path is identical in every way if you follow it backwards.
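The reversibility claim is easy to check. This sketch uses made-up coordinates (the real analysis ran over all the known Waldo locations, which I don’t have here): an open tour through a set of points has exactly the same total length walked in either direction, so the labeled starting point carries no special weight.

```python
import math

# Made-up stand-ins for Waldo locations, for illustration only.
points = [(1.0, 2.0), (4.0, 6.0), (7.0, 1.5), (2.5, 8.0), (9.0, 5.0)]

def path_length(path):
    # Sum of straight-line distances between consecutive points.
    return sum(math.dist(path[i], path[i + 1]) for i in range(len(path) - 1))

forward = path_length(points)
backward = path_length(points[::-1])  # same tour, walked from the other end

# "Start here" vs. "end here" makes no difference to the length.
assert math.isclose(forward, backward)
```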