Friday, January 25, 2008

Wow! What a title! It sounds good. One might think at first that ranking completely different data mining algorithms is useless, since each has its own uses and applications. However, certain people (and not the least of them) have selected the most important algorithms in the field. These algorithms were identified at the IEEE International Conference on Data Mining (ICDM) in 2006. The selection is the subject of a paper (1) that appeared online in December 2007 in Knowledge and Information Systems. In their paper, Wu and co-authors describe each algorithm and give an overview of current and future research on it. Here is the (unordered) list of the 10 chosen algorithms:

C4.5

K-means

SVM

Apriori

EM

PageRank

AdaBoost

kNN

Naive Bayes

CART

If you want to know why these algorithms were chosen, have a look at the article. At first, I was surprised to see PageRank in the list. Even though it is a famous algorithm, it is not usually thought of as a data mining algorithm. However, the authors show its links to social network analysis, and therefore to data mining.

Most of the algorithm descriptions were written by different authors, so the style varies a lot throughout the article. The part on SVM is structured as answers to specific questions, which makes it very interesting. The AdaBoost part is written in a very exciting way (if you don't know the algorithm, you will want to learn more about it). Finally, the CART part is the longest description among the 10 algorithms (a bit too long, in my opinion). Overall, this paper is a good overview of state-of-the-art algorithms in data mining.