Sunday, October 16, 2005

Many people, when I tell them I work on Machine Learning, give a quick nod implying "I know what it is, no need to explain". Then, as the conversation continues, it becomes completely clear that they have no idea what I'm talking about, but by now they feel a little embarrassed to admit it. When I dig a little further, it turns out that they confused Machine Learning with E-Learning...

So what is Machine Learning anyway? Machine Learning is the ability of a computer program to learn from experience (i.e. from samples, also called a training set) and to generalize that knowledge to handle unseen data. It is an area of Artificial Intelligence and, of course, overlaps heavily with statistics. Some of the best-known algorithms are Neural Networks, Bayes Classifiers, Genetic Algorithms, Decision Trees and many more.

Machine Learning is very much concerned with the computational complexity of its algorithms: the idea is to come up with an algorithm that is efficient enough to be practical on realistic problems. Since finding the overall best solution almost always takes exponential time, every Machine Learning algorithm has an Inductive Bias. This bias means that the algorithm makes all kinds of assumptions about the data and/or the form of the best solution, and these assumptions limit the size of the search space.

And what about Data Mining? Isn't that the same thing?

Well, yes and no. Not everybody agrees on the difference between the two. The most commonly used distinction is that Data Mining applies Machine Learning algorithms to real-world problems, usually on very large databases. The most important difference, then, is that Data Mining also includes the process of cleaning and organizing the data so that it is suitable for the learning algorithm to work with. It may seem like a small difference, but bear in mind that this cleaning process usually takes up 50% of a project's resources (time, money, etc.).
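To make "learning from a training set", "generalizing to unseen data" and "inductive bias" a little more concrete, here is a minimal sketch in Python. It is only an illustration I chose, not something taken from a specific library: a decision stump (a one-level decision tree) whose inductive bias is the assumption that the label depends only on whether a single number lies above or below some threshold. The names `train_stump` and `predict` are hypothetical. Thanks to that assumption, the learner only has to try a handful of thresholds instead of every possible labeling of the data.

```python
from typing import List, Tuple

Model = Tuple[float, int]  # (threshold t, polarity p), a hypothetical representation


def predict(model: Model, x: float) -> int:
    """Classify x: with polarity 1, label 1 means x >= t; with polarity -1 it is reversed."""
    t, p = model
    above = x >= t
    return int(above) if p == 1 else int(not above)


def train_stump(xs: List[float], ys: List[int]) -> Model:
    """Pick the threshold/polarity with the fewest training errors.

    Inductive bias (our assumption): one threshold on x separates the classes.
    With n samples there are 2**n possible labelings, but we only try the
    sample values plus one value above the maximum, each with two polarities,
    i.e. 2 * (n + 1) candidate hypotheses at most. Assumes xs is non-empty.
    """
    candidates = sorted(set(xs)) + [max(xs) + 1.0]
    best: Model = (candidates[0], 1)
    best_errors = len(xs) + 1
    for t in candidates:
        for p in (1, -1):
            errors = sum(1 for x, y in zip(xs, ys) if predict((t, p), x) != y)
            if errors < best_errors:  # keep the first hypothesis with the lowest error
                best, best_errors = (t, p), errors
    return best


# Usage: learn from six labelled samples, then classify points never seen in training.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
model = train_stump(xs, ys)
print(model)  # (6.0, 1)
print(predict(model, 4.0), predict(model, 10.0))  # 0 1
```

With 6 samples the learner compares at most 14 hypotheses instead of 64 possible labelings. The price of that bias is that data which cannot be split by a single threshold will never be learned perfectly, which is exactly the trade-off between search-space size and flexibility described above.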