Tuesday, February 24, 2009

Dear readers, after more than two years of blogging on Blogger, Data Mining Research will move to WordPress (self-hosted). Blogger is a good choice for starting a blog, but it has several limitations, which a self-hosted WordPress blog overcomes. DMR will therefore be down for a few days. Thanks a lot for your patience. In a few days, DMR will be back on www.dataminingblog.com!

Wednesday, February 18, 2009

As a data miner, or someone interested in data mining, you certainly use different resources to network with people in the field, such as websites or social networks. In the poll below, you can indicate which networks or websites you use as a data miner. This is a multiple-choice poll:

Tuesday, February 10, 2009

After nearly one year working for a family office in Nyon (Switzerland), I have decided to change jobs. Applying data mining techniques to the financial market is extremely exciting. However, the working conditions in the company were really bad and the staff turnover too high to work reasonably on mid- or long-term projects.

Starting March 1st, I will be a business intelligence consultant for FinScore, a consulting company based in Renens (close to Lausanne). I will thus work on data mining and business intelligence projects for external companies. FinScore is mostly involved in information quality management, customer intelligence and business intelligence. Its clients are based in Switzerland and elsewhere in Europe. I'm really looking forward to starting in March!

Friday, February 06, 2009

After the LinkedIn and Facebook phenomena, here comes Twitter. Once the short account creation process is done, you can start writing small "posts" of at most 140 characters. Twitter is a kind of micro-blogging platform that sits somewhere between a blog and a chat. You can discuss a topic, and people can reply to your tweets. If you like someone's topics, you can follow him. You will then discover all the people he follows and the ones following him. You can of course then connect with all these people.

At this stage you may ask yourself two questions. First question: why would you use Twitter? From a professional point of view, you can build a network of people in your field. You will also stay aware of what is happening in a particular domain. This is much more dynamic than a forum, and news spreads faster than through a newsletter.

Second question: what can you write in 140 characters or fewer? You can share an interesting article, highlight a website, point to a blog post, mention a nice book you have just read, bring up some news you want to discuss, and many other things. In fact, I use Twitter when I want to share something that would not deserve a whole post on my blog.

If you want to give Twitter a try, here is a list of data miners on Twitter. I have also mentioned their blog or company when available:

Also, don't forget that you can see whom each user follows and who follows them. Here is a nice blog for more tips on Twitter. If you are a data miner and you use Twitter, feel free to add a comment to this post and mention your user name.

Monday, February 02, 2009

When my family and friends ask me what I plan to work on in the future, they are always surprised by my answer: a different field of application. People usually wonder why I want to change. Is there something you don’t like in your current field of application? Not at all, but I would like to discover new fields, new domains that I don’t know yet.

While I am passionate about data mining itself, the field of application can be completely different from one job to another. I started applying data mining to meteorology during my master's thesis. I then moved to civil engineering for my PhD. I am now applying data mining in finance. Although I am of course not an expert in these domains, I have had the opportunity to discover them and learn a lot.

What about tomorrow? It may well be biology, web analytics or customer relationship management. Who knows? Although I don't know these fields yet, I may be working in one of them in the near future. That is what I call the Beauty of Data Mining. Being a data miner is one of the rare jobs where you can discover and learn new domains throughout your whole professional career. Take a few seconds to think about it, and you will certainly agree that we are lucky to be data miners.

Wednesday, January 28, 2009

Most of you certainly know the DataScience Analytics blog, a place where John Aitchison wrote great posts about data mining and statistics. I recently tried to contact him by e-mail and got an answer from his wife, Tricia Aitchison, telling me that John died suddenly in 2008.

On behalf of the readers of his blog, I would first like to offer my condolences to Tricia. I had a very good time chatting with John by e-mail and through his blog. I think he was very happy to share his knowledge in the field and give his personal opinion on various subjects.

Regarding his blog, I remember that he started to look at his PageRank and was asking, through his blog, why it sometimes dropped so strangely. He then decided to stop blogging since he believed nobody was reading his posts. But he was wrong: he had several readers. He finally came back to the blogosphere and decided to blog just because he liked it. I think that is a very nice reason to blog. His last post is named Finding your way around R - reprise.

Sunday, January 25, 2009

Data Mining Research lets you advertise your company, website or blog through text link ads and banners. In both cases, your ad will be visible in the right sidebar of each page (post) of the blog.

Data Mining Research (www.dataminingblog.com) started in June 2006 as a blog covering several aspects of data mining, from both the research and the industry points of view. Data Mining Research contains examples of data mining applications, discussions on open issues, opinions on research articles, book reviews, interviews with leading data miners, data mining polls, etc.

Data Mining Research is listed on several websites and blogs in the data mining community. For example, it is listed on the well-known KDnuggets, MineThatData and Smart Data Collective websites.

Data Mining Research Traffic and Audience

The following statistics have been obtained using Google Analytics and Feedburner. They summarize Data Mining Research activity for the month of January 2009 (from December 25th 2008 to January 24th 2009):

Current RSS subscribers: 517

Number of visits: 2139

Number of pageviews: 3415

Average number of pages viewed per visit: 1.60

Average time on site: 1:50 (minutes:seconds)
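As a quick sanity check, the average number of pages per visit can be recomputed from the visit and pageview counts listed above. The short Python sketch below is only an illustration: it assumes the average is simply pageviews divided by visits, and the function name `pages_per_visit` is made up for this example.

```python
# Figures quoted in the statistics above.
VISITS = 2139
PAGEVIEWS = 3415


def pages_per_visit(pageviews: int, visits: int) -> float:
    """Return the average number of pages viewed per visit.

    Assumption: the average is plain pageviews / visits.
    """
    if visits <= 0:
        raise ValueError("visits must be a positive number")
    return pageviews / visits


average = pages_per_visit(PAGEVIEWS, VISITS)
print(f"Average pages per visit: {average:.2f}")
```

Rounded to two decimals, the ratio agrees with the 1.60 pages reported above.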

Advertising on Data Mining Research offers you the opportunity to reach a wide audience of practitioners and researchers in the following fields: data mining, machine learning, predictive analytics, knowledge discovery in databases (KDD), data analysis, statistics, business intelligence and decision support.

Below is a set of graphs that give information about Data Mining Research traffic.

Wednesday, January 21, 2009

After reading Ajay's interesting post, I decided to write a post about the good aspects of R. First, I would like to state that I'm neither a SAS nor a Clementine user. So the following arguments are my opinions as an R programmer:

R is easy and free to improve: R contains hundreds of useful packages (data mining, finance, etc.). If this is not enough, you can program your own packages and share them with others. You are not dependent on other programmers.

R is a white box: Since R is a programming language, it is easy to understand the overall process of the system under development. There is no GUI that lets you drop in black-box components whose behavior may be unclear.

When you know R, you know everything: OK, this is a bit too much. But the message is that it is much easier to start with R and then move to SAS or Clementine than the opposite, especially for users who only use the GUI.

R is free: This is very good since small companies don't have the money to buy SAS or Clementine. Also, if several users need such tools, the price increases. Of course, in a large company, SAS and SPSS tools may be an alternative.

R is a good choice: R is as convenient as Matlab (or even more so?) and as cheap as Java (which means free). This makes R an excellent choice among existing tools and programming languages.