Given enough data state-of-the-art supervised learning algorithms can approximate behaviour very well. However, the ability to replicate behaviour is intelligence. Supervised learning algorithms currently lack the ability to understand. Inherent in understanding is questioning data and negotiating interpretations of ambiguous observations.

Modern machine learning techniques, such as deep learning, require massive amounts of data. Natural language processing is particularly challenging as language is spares. This means there are lots of rare occurrences that are bound to appear in the text you are processing. This post describes what you can do to handle the sparcity of NLP.