But how would Google have performed? Given the presumption that Google did not intentionally tune their results to do well on the following test, we can check what ranking Google would have given to the Jeopardy questions. The J! Archive is a good source for the questions. All results about the “jeopardy” question itself are excluded in this little test by using the “-jeopardy” search operator.

These are the first 10 questions that Watson answered correctly in the February 16th game (click on the link to see Google’s result):

As you can see, the results are by now totally spammed with Jeopardy-related articles, but the correct answer is still always found among the top ranks of Google’s results; if not in the headline, then at least in the introductory snippets of the results. The interpretation and announcement of the result would, of course, still have to be handled (text mining, evaluation, voice output, and so on).

Since the questions were shown as text, speech recognition might have been unnecessary; OCR and an interface to the Google API might have produced some enjoyable results, too.
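As a toy illustration (this is my own sketch, not Watson’s or Google’s actual pipeline; the clue and snippets are invented), checking whether the expected answer shows up in the top search-result snippets could look like this:

```python
def answer_in_top_results(answer, snippets, top_n=10):
    """Check whether the expected answer string occurs in any of the
    top-n result headlines or snippets (case-insensitive)."""
    needle = answer.lower()
    return any(needle in s.lower() for s in snippets[:top_n])

# Hypothetical clue: "Bram Stoker's famous count"
snippets = [
    "Dracula - Wikipedia, the free encyclopedia",
    "Count Dracula is the vampire in Bram Stoker's 1897 novel...",
]
print(answer_in_top_results("Dracula", snippets))  # True
```

The hard part Watson solves, of course, is producing the answer string in the first place; this check only confirms it afterwards.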

The new start-up company “Triangulate” from Palo Alto, CA, wants to “empower people to use their own data to make the world better”. To pursue this honourable aim they are distributing their first software, “Wings“, a Facebook application. This tool creates user profiles with the help of data mining methods and matches them to find fitting partners.

The software does not work with a form in which users enter their interests, but draws on the traces users leave on the web. It gains information not only from Facebook; it also uses other sources such as Last.fm or Netflix to find out the user’s music or video taste. Twitter is used to determine the topics a user may be interested in.

Sunil Nagaraj, CEO of Triangulate, reports on significant conclusions they reached while developing the software. For instance, the density of a user’s social network influences his or her choice of partner. The more closed the network is, the less likely a random encounter with a new person will lead to a partnership. In that case it is more likely that the next partner will come from within the closed network.
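To illustrate the kind of measure involved (a sketch of the standard graph-density formula, not Triangulate’s actual method; names and friendships are made up): the density of a friend network is the ratio of existing friendships among the user’s friends to all possible ones.

```python
from itertools import combinations

def network_density(friends, friendships):
    """Density of a friend network: existing edges among the user's
    friends divided by the number of possible edges between them."""
    possible = len(friends) * (len(friends) - 1) // 2
    if possible == 0:
        return 0.0
    edges = {frozenset(e) for e in friendships}
    existing = sum(
        1 for pair in combinations(friends, 2) if frozenset(pair) in edges
    )
    return existing / possible

friends = ["ann", "bob", "cem", "dia"]
friendships = [("ann", "bob"), ("bob", "cem"), ("ann", "cem")]
print(network_density(friends, friendships))  # 0.5 (3 of 6 possible edges)
```

A density near 1.0 would indicate the kind of closed network described above.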

Wings at Facebook

However, the high volatility of the users’ characteristics is difficult to handle. Some people change their music taste every day – that may be a relevant characteristic as well, but surely less important than a particular music genre that is preferred and shared with the dream partner.

The young start-up got $750,000 (€562,000) from investors to enter the very important dating market. It has to compete against big players like match.com or eHarmony, which have been using mighty dating data mining solutions for many years now.

An important factor for a dating recommendation engine is the visual appearance of the potential partner. This aspect is very important to many partner seekers, yet the data is mostly reduced to a few body measures that are given more or less honestly. Analysing and modelling a data mining solution capable of finding out whether a person’s photo creates sympathy or not is not trivial at all. Another dating data mining project, okcupid.com, uses straightforward data mining algorithms too, and focuses on visual appearance in detail.

Data mining experts are a rare species in a company’s business habitat, in contrast to the rather common analyst. That is why more and more data mining tools arise that try to do without experts and put smaller or even more complicated reports into the hands of the business analyst. Big vendor SAS now offers a new product that fits this scheme: the SAS Rapid Predictive Modeler.

SAS stresses that this tool indeed focuses on the business analyst, and says it will enable even subject-matter experts with limited statistical expertise to build reliable and robust data mining models, eventually leading to useful and descriptive reports and graphs.

SAS® Rapid Predictive Modeler Screenshot

It works with the visual interface of either SAS Enterprise Guide or, via the SAS Add-In for Microsoft Office, Microsoft Excel.

The user is guided automatically through the data preparation and all the necessary data mining tasks.

SAS stated that statisticians may “generate quick, baseline models when they are short on time and resources”. This actually implies that with a bit more time and resources you may get better-than-baseline results when using the tool’s big brother: the Rapid Predictive Modeler is part of SAS Enterprise Miner 6.2, which must be licensed anyway.

Regarding the license costs, companies should probably spend some extra money and hire a good data miner, too. But a common business analyst’s result is definitely a good benchmark that has to be beaten by the data mining experts…

Every year Gartner selects “Cool Vendors” – cool small vendors that offer new, innovative products or services. This year’s starlet is the data mining software KNIME (short for Konstanz Information Miner), a comprehensive open-source data integration, processing, analysis, and exploration platform.

KNIME has been selected in the key technology areas Analytics, Business Intelligence, and Performance Management.

The “Text 2.0” project offers a framework that makes it possible to track the movement of a reader’s eye and to optimize the presentation of the text being read. It will even be possible to integrate these features into websites – eye-tracking hardware is all that will be needed (it may become as normal in the future as the webcam already is).

Here are three of the main features, which are very impressive:

A reader may skim the text to get a quick overview. That reading behavior is detected by “Text 2.0”, and the “filler words” of the text are faded out, leaving only the important keywords for the reader. He can now get his overview done much quicker; his work of finding the keywords is eased by the software.

If the reader lingers on certain words, this is a hint that he doesn’t know what they mean. This is detected, and a tiny pop-up above the words shows the meaning – from Wikipedia or a dictionary. Or the word is automatically translated if it is not written in the reader’s mother tongue. The pronunciation of the word can be spoken as well.

Pictures can be shown when a certain corresponding passage of the text is read. Usually it breaks the flow of reading if the reader looks at a picture that does not fit what he was reading a second before (this also applies to advertisements). That can be elegantly overcome with the tool.
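A minimal sketch of how the dictionary-lookup trigger might work (Text 2.0’s real implementation is certainly more sophisticated; the gaze samples and the threshold below are invented): if the gaze dwells on one word longer than a threshold, a definition pop-up is requested.

```python
def words_needing_lookup(gaze_samples, threshold_ms=800):
    """Return words the reader's gaze dwelled on longer than the
    threshold - a hint that a definition pop-up should be shown.
    gaze_samples: list of (word, dwell_time_ms) tuples."""
    return [word for word, dwell in gaze_samples if dwell > threshold_ms]

samples = [("the", 120), ("ontology", 1400), ("is", 90), ("ephemeral", 950)]
print(words_needing_lookup(samples))  # ['ontology', 'ephemeral']
```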

This fascinating idea really reminds one of the interactive books in Neal Stephenson’s “The Diamond Age”.

The company Apteco offers FastStats, a software tool for address selection for running campaigns. The .Net tool is equipped with multiple plugins to solve certain tasks.
It works fine with text files that get sorted and linked. A big goal for the developers seems to have been providing users with an easy drag’n’drop environment – and this was quite well done. The selection and handling of data is simple and efficient. The most important part for us is the module “FastStats Modelling”, since this is where the data mining takes place.

Apteco implemented the patented “Predictive Weight of Evidence (PWE)” procedure and decision tree methods. In this module the drag’n’drop handling was a big focus as well – the application is easy to handle and will be no problem for our colleagues in the marketing department. In fact, if, for instance, a certain customer target group needs to be selected using a trained model, that task is fulfilled in a short time. The complete data can be shown in detail or, if preferred, presented in simplified form.
The success of a model can be measured in monetary terms. It is possible to integrate the expected conversion rate, the revenue per gained customer, and the costs of the campaign into the analysis. That helps to judge the outcomes and to find out which model fits best.
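The monetary judgment described above boils down to a simple expected-profit calculation (my own sketch, not Apteco’s formula; all figures are invented):

```python
def expected_campaign_profit(n_targeted, conversion_rate,
                             revenue_per_customer, cost_per_contact):
    """Expected profit of a campaign: converted customers times revenue,
    minus the cost of contacting everyone targeted."""
    revenue = n_targeted * conversion_rate * revenue_per_customer
    cost = n_targeted * cost_per_contact
    return revenue - cost

# A model that doubles the conversion rate on a smaller target group
# can beat a broad, untargeted mailing:
print(expected_campaign_profit(10000, 0.02, 120.0, 0.80))  # 16000.0
print(expected_campaign_profit(4000, 0.05, 120.0, 0.80))   # 20800.0
```

Comparing this figure across models is exactly the “which model fits best” judgment mentioned above.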
This tool provides a good first step into data mining for marketing campaigns, though it only offers a few methods to apply. The segmentation of customers and the optimization of campaigns for special target groups, combined with the option to find product affinities of customers, make “FastStats Modelling” a useful tool for the offered price. The usability of the drag’n’drop user interface is outstanding and will hopefully be adopted by other vendors.


Cloud Mining with Oracle’s ODM has been available on the Amazon Cloud since the end of February 2010, as shown on Oracle’s website. There is a pre-installed Oracle 11gR2 database and sample datasets ready to use. Using the Oracle 11gR2 Data Mining Amazon Machine Image (AMI), users can now launch an Oracle Cloud Mining enabled instance directly through Amazon Web Services (AWS). Normal costs apply, according to standard Amazon EC2 charges.

That definitely lets Oracle win the race to the cloud against the data mining experts from SAS.

Data Applied, one of the world’s first Cloud Mining providers, has added new capabilities to its data mining and data visualization suite. The new data transformation feature complements the company’s existing data visualization, data mining, and reporting features.

Using a step-by-step wizard, users can define transformation steps allowing them to process rows of data and create new data sets. Metadata transformation steps include creating, renaming, converting, and deleting fields. Row transformation steps include filtering, sampling, ranking, and scrambling rows. Fields can also be set to calculated values by referencing other fields or by invoking built-in mathematical, statistical, or text functions.
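Such transformation steps can be pictured as functions applied to rows; here is a rough sketch in Python (this is not Data Applied’s actual API, and the field names are made up):

```python
import random

def rename_field(rows, old, new):
    """Metadata step: rename a field in every row."""
    return [{(new if k == old else k): v for k, v in r.items()} for r in rows]

def filter_rows(rows, predicate):
    """Row step: keep only rows matching a predicate."""
    return [r for r in rows if predicate(r)]

def sample_rows(rows, n, seed=42):
    """Row step: draw a random sample of n rows."""
    return random.Random(seed).sample(rows, n)

rows = [{"amount": 10}, {"amount": 55}, {"amount": 3}]
rows = rename_field(rows, "amount", "revenue")
rows = filter_rows(rows, lambda r: r["revenue"] > 5)
print(rows)  # [{'revenue': 10}, {'revenue': 55}]
```

Chaining such steps yields a new data set, just as the wizard described above does.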

In addition, the company announced other features including geo-mapping and view sharing. The new geo-mapping feature allows widgets such as pie charts or bar charts to be mapped to geographical locations, while the new view sharing feature allows users to securely share and embed visualizations in any web page. For more information, visit www.data-applied.com.

RapidMiner is a well-known open-source data mining tool from the company Rapid-I, and is used many thousands of times all over the world. At CeBIT I had the opportunity to talk to co-founder Ralf Klinkenberg about his software and get some interesting information, for example whether RapidMiner is ready for Cloud Mining.

RapidMiner at CeBIT 2010

RapidMiner, formerly known as YALE, has been developed at the German University of Dortmund since 2001. Since then it has definitely proved its impressive functionality; I myself used it for the first time in a data mining contest in 2006 (quite successfully). Meanwhile it is hosted on the open source development platform SourceForge, where its further development also takes place. Right now the 5th version is available.

Rapid-I provides Enterprise Editions

Out of many companies’ need to reduce the common open source risks and to have a business partner that can provide support, the company Rapid-I GmbH was founded by the developers of YALE. Here CEO Dr. Ingo Mierswa, CBDO Ralf Klinkenberg, and their co-workers distribute three different Enterprise Editions of RapidMiner: the SMALL, STANDARD, and DEVELOPER editions. These certified versions of the open source product give customers the liability they need to run it in a company’s IT infrastructure. For these editions the roll-out of the 5th version has already taken place, too.

Data Mining with RapidMiner

RapidMiner is a complete data mining suite. That means it covers all steps of the KDD process, from the interface to the database and the ETL, to the analytics and the reporting tool. Thanks to its open source roots, the tool supports a dizzying 500+ data mining methods. And it has proven its reliability in many tests, for example at BARC as the best open source tool. An intuitive and modern graphical user interface gives the experienced data mining expert the opportunity to solve nearly all problems he faces in practical scenarios. The software works with virtual repositories, so the data can technically reside anywhere. Meta-data can be accessed at every step of the development. A remarkable feature is the real-time validation: in the design process, partial results can be obtained, so the usual trial-and-error approach is noticeably simplified. You can find some good video tutorials on the RapidMiner website.

More than Data Mining

RapidMiner has gained a lot of functionality over the years of development. The clear-cut analytical tool YALE of the past has given way to a modern enterprise tool that has its own derived solutions for several typical hands-on problems:

RapidSentilyser: market insight (how often and in which contexts is a company’s name mentioned in the media?)

RapidNet: explorer to discover connections between components in a network

The BuzzBoard in RapidSentilyser

The BuzzBoard is a dashboard for RapidSentilyser that allows tracking real-time feedback on companies’ publicity measures. I saw an impressive example of these clearly arranged results: finance news of recent years was compared to news of the last week – all focused on one multinational company. It was highlighted whether the company was mentioned in “positive” or “negative” contexts. That made it easy to find out whether the last publicity activity paid off, with the help of a single measure figure. That is only possible with a high-end data mining foundation, of course.
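A toy version of such a single measure figure (RapidSentilyser’s real text mining is far more advanced; the word lists and headlines below are invented):

```python
POSITIVE = {"profit", "growth", "record", "success"}
NEGATIVE = {"loss", "lawsuit", "decline", "scandal"}

def sentiment_score(headlines, company):
    """Net sentiment: positive minus negative word hits in headlines
    that mention the company - one simple measure figure."""
    pos = neg = 0
    for headline in headlines:
        words = set(headline.lower().split())
        if company.lower() in words:
            pos += len(words & POSITIVE)
            neg += len(words & NEGATIVE)
    return pos - neg

news = [
    "acme posts record profit",
    "acme faces lawsuit over decline",
    "other firm reports loss",
]
print(sentiment_score(news, "acme"))  # 0 (2 positive vs. 2 negative hits)
```

Tracking such a score over time would give the kind of trend the BuzzBoard visualizes.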

Cloud Mining with RapidMiner

The client-server architecture of RapidAnalytics makes it possible to put the repository anywhere, including the Cloud. That means RapidMiner is in principle ready for use in the Cloud. But the core advantage of a Cloud Mining solution – parallelization of the algorithms and of the scoring engine – has not been an explicit focus. That makes RapidMiner with RapidAnalytics best suited for conventional big-company infrastructures.


At CeBIT 2010 I visited the booth of the SAS Institute. It again was a rich exchange of information, and finally I got my very own SAS mug! My question whether SAS is doing anything in the direction of Cloud Mining was forwarded by my conversation partner to the press department. But then he made a quick comment that made me listen attentively: “… but yes, SAS has a Private Cloud on the roadmap”. For this reason they are building a huge data processing center, my inquiry revealed. I guess they will enhance the SAS Enterprise Miner to make it capable of Cloud Mining. When or how, my conversation partner was unable or unwilling to say; I will wait for the answer from the press department. But I think it is quite interesting that the world leader in data mining is not sleeping through the buzz around the Cloud.