“When building a statistical model, you ideally want to find yourself surprised by the data some of the time — just not too often. If you never come up with a result that surprises you, it generally means that you didn’t spend a lot of time actually looking at the data; instead, you just imparted your assumptions onto your analysis and engaged in a fancy form of confirmation bias. If you’re constantly surprised, on the other hand, more often than not that means your model is buggy or you don’t know the field well enough; a lot of the “surprises” are really just mistakes.”

This means that both the model and the data must be tested and validated to ensure good decision-making.
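As a concrete illustration, the idea of checking both sides can be sketched in a few lines of pure Python: first sanity-check the data, then check the model against held-out data it was not fitted on. The dataset, the range bounds, and the simple slope-through-the-origin model here are all hypothetical, chosen only to keep the sketch self-contained.

```python
def validate_data(rows):
    """Basic data checks: no missing values, inputs in a plausible range.

    The [0, 100] bound is a hypothetical domain assumption for this sketch;
    in practice it would come from knowledge of the field.
    """
    for x, y in rows:
        assert x is not None and y is not None, "missing value"
        assert 0.0 <= x <= 100.0, f"x out of expected range: {x}"
    return rows

def fit_slope(rows):
    """Least-squares slope through the origin: y ~ b * x."""
    num = sum(x * y for x, y in rows)
    den = sum(x * x for x, _ in rows)
    return num / den

def mean_abs_error(rows, b):
    """Average absolute prediction error of the fitted slope."""
    return sum(abs(y - b * x) for x, y in rows) / len(rows)

# Hypothetical dataset: y is roughly 2x plus a little noise.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0), (5, 9.8), (6, 12.1)]
train, holdout = data[:4], data[4:]

# Test the data before fitting anything.
validate_data(data)

# Test the model: error on held-out points should be in the same
# ballpark as the training error. A large gap is a "surprise" worth
# investigating -- it may be a genuine finding, or just a bug.
b = fit_slope(train)
train_err = mean_abs_error(train, b)
holdout_err = mean_abs_error(holdout, b)
print(f"slope={b:.2f} train_err={train_err:.2f} holdout_err={holdout_err:.2f}")
```

The point is not the particular model but the discipline: a result is only trusted after the data passes its checks and the model survives contact with data it has not seen.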

The biggest risk in machine learning is that the mathematics boffins who build the models may not grasp this if they lack basic data-management skills.