Some thoughts of a Machine Learning Practitioner on Software Development, Management, Team Building, Startups, Python, Agile Development, Data visualization... that will distract you from your end goals by making you less efficient but are critical to manage in order to succeed.
Don't forget that long-term adaptation to inefficient approaches can become your enemy. Let's try to empower others by sharing knowledge & personal experiences.

Wednesday, September 2, 2009

Machine Learning and Data-Mining are closely related, but the connection isn't clear to most people. I'll try to clarify the link in this short post.

Let's start with definitions:

Data-Mining (DM) is the process of extracting patterns from data. The main goal is to understand relationships, validate models or identify unexpected relationships.

Machine Learning (ML) algorithms allow computers to learn from data. The learning process consists of extracting patterns, but the end goal is to use that knowledge to make predictions on new data.

In both ML and DM, we start by extracting patterns. In DM, the process ends there, with a look at the patterns. In ML, we reuse the learned patterns to make predictions.
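Here is a minimal sketch of that split, in plain Python. The data (hours studied vs. exam passed) is made up for illustration, and `learn_threshold` and `predict` are hypothetical helper names, not part of any library. The "pattern" is a single threshold: the data-miner reads it, the machine learner reuses it.

```python
# Toy, made-up data: (hours_studied, passed_exam)
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]


def learn_threshold(samples):
    """Extract a pattern: the threshold on x that best separates the labels."""
    xs = sorted(x for x, _ in samples)
    # Candidate thresholds are the midpoints between consecutive values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        # Count samples that the rule "label is 1 when x > t" gets wrong.
        return sum((x > t) != bool(y) for x, y in samples)

    return min(candidates, key=errors)


threshold = learn_threshold(data)

# Data-mining use: stop here and look at the pattern itself.
print(f"pattern: passed when hours_studied > {threshold}")


# Machine-learning use: reuse the learned pattern to predict on new data.
def predict(x):
    return int(x > threshold)


print([predict(x) for x in (2.5, 4.5)])
```

The extraction step is identical in both cases; only the last few lines differ.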

One important difference in pattern extraction is that machine learning algorithms don't need the patterns to be in a form that humans can understand, but data-miners do. For example, it is hard to understand exactly what a neural network has learned, whereas decision trees are easy to understand and compare. On the other hand, comprehensible patterns allow machine learning practitioners to identify data problems and, by fixing them, improve the prediction accuracy of their models.

So basically, the patterns mined by any machine learning algorithm are used to make predictions on new data.

Some people might simply say that they are the same and that the only difference is how you use the learned patterns: to understand or to predict.

note:

Unsupervised learning can be considered as data-mining because it doesn't involve prediction. To understand how the discovered clusters differ, we can simply apply supervised learning to the dataset tagged with the discovered cluster labels.
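As a rough sketch of this idea in plain Python: first cluster the data, then tag each value with its cluster and learn a readable rule that separates the tags. The values are made up, and `two_means` and `learn_threshold` are hypothetical helpers written for this example (a bare-bones 1-D k-means with k=2 and a single-threshold learner), not a recommendation of any particular algorithm.

```python
# Toy, made-up 1-D data with two obvious groups.
values = [1.0, 2.0, 1.5, 2.5, 8.0, 9.0, 8.5, 9.5]


def two_means(xs, iterations=10):
    """Minimal 1-D k-means with k=2: returns a cluster label (0 or 1) per value."""
    centers = [min(xs), max(xs)]  # simple deterministic initialization
    labels = [0] * len(xs)
    for _ in range(iterations):
        # Assign each value to its nearest center.
        labels = [
            0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in xs
        ]
        # Move each center to the mean of its members.
        for k in (0, 1):
            members = [x for x, label in zip(xs, labels) if label == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


def learn_threshold(xs, labels):
    """Supervised step: find the threshold that best separates the tagged values."""
    ordered = sorted(xs)
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

    def errors(t):
        # Count values that the rule "cluster 1 when x > t" gets wrong.
        return sum((x > t) != bool(label) for x, label in zip(xs, labels))

    return min(candidates, key=errors)


# Unsupervised step: discover the clusters.
cluster_labels = two_means(values)

# Supervised step on the tagged dataset: describe what separates the clusters.
threshold = learn_threshold(values, cluster_labels)
print(cluster_labels)
print(f"cluster 1 is roughly: value > {threshold}")
```

The learned threshold is the human-readable explanation of the difference between the two clusters.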