What if the knowledge and data we have are not sufficient to completely determine the correct classifier? Then we run the risk of just hallucinating a classifier (…)

(…) strong false assumptions can be better than weak true ones, because a learner with the latter needs more data to avoid overfitting.
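
This is the bias/variance trade-off in miniature: a high-bias learner needs far less data to settle on a reasonable hypothesis. As a minimal sketch of that point (my own illustration, not from the paper, assuming scikit-learn is available; the dataset and the two learners are arbitrary choices), one can compare Naïve Bayes, whose independence assumption is false here because the features are correlated, against an unpruned decision tree on a tiny training set:

```python
# Sketch: a strong false assumption (Naive Bayes independence) vs. weak
# assumptions (unpruned decision tree) when training data is scarce.
# Results vary by seed; with so few examples the biased learner usually wins.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Redundant features make them correlated, so the independence assumption is false.
X, y = make_classification(n_samples=5000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=0)

# Train on only 50 examples, evaluate on the rest.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=50,
                                                    random_state=0)

for name, model in [("Naive Bayes (strong, false assumption)", GaussianNB()),
                    ("Unpruned tree (weak assumptions)",
                     DecisionTreeClassifier(random_state=0))]:
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```

With more training data the ranking tends to flip, which is exactly the point of the quote: the weakly-biased learner is not wrong, it just needs more data to avoid overfitting.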

Even with a moderate dimension of 100 and a huge training set of a trillion examples, the latter covers only a fraction of about 10^-18 of the input space. This is what makes machine learning both necessary and hard.
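
The back-of-the-envelope arithmetic behind that fraction, assuming boolean features so the input space has 2^100 points and every example is distinct:

```python
# Fraction of a 100-dimensional boolean input space covered by a trillion examples.
input_space_size = 2 ** 100      # ~1.27e30 possible boolean inputs
training_examples = 10 ** 12     # a trillion examples
fraction_covered = training_examples / input_space_size
print(f"{fraction_covered:.1e}") # ~7.9e-19, i.e. on the order of 10^-18
```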

(…) the most useful learners are those that facilitate incorporating knowledge.

Domingos has previously done a lot of interesting work on, for instance, why Naïve Bayes often works well even though its independence assumptions are not satisfied in practice, and why bagging works well. Those are just the ones I remember; I'm sure there is a lot more.