Dataiku at the Hadoop Summit

Events|April 14, 2014|
Florian

The Hadoop Summit Europe, a two-day event for the Apache Hadoop community, took place in Amsterdam on April 2-3.
I gave a talk there on behalf of Dataiku, titled “Semi-Supervised learning applied to understanding customer journeys”. Here is a recap.

The talk aimed to show new ways to monitor customer satisfaction on websites. This is quite challenging for websites with a large audience: because the audience is large and diverse, you need to segment your visitors into groups. As an example, on a news website, you would find a mix of different behaviours: people coming almost every day, newcomers from Google News, visitors clicking on social network posts, crawler robots, people focusing on a particular comment, etc. You cannot rely on a single average metric, such as page views, across all these behaviours.
With semi-supervised learning on the logs of the website, you can build “clusters” of sessions that reflect different kinds of behaviour. Then, for each group, you can apply different metrics and predict user satisfaction.
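To make the idea concrete, here is a minimal sketch of one semi-supervised approach, seeded k-means: a few hand-labelled sessions initialise the cluster centroids, and the unlabelled sessions are then assigned to the nearest one. This is only an illustration, not necessarily the method from the talk; the session features (pages viewed, minutes on site), the group names and the sample values are all made up for the example.

```python
from collections import defaultdict
from math import dist

# Hypothetical session features: (pages viewed, minutes on site).
# A handful of sessions were labelled by hand; the rest are unlabelled.
labelled = {
    "regular": [(12.0, 15.0), (10.0, 12.0)],
    "news_aggregator": [(1.0, 0.5), (2.0, 1.0)],
}
unlabelled = [(11.0, 14.0), (1.0, 1.0), (9.0, 10.0), (2.0, 0.5)]


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def group_sessions(labelled, unlabelled, assignment):
    """Merge labelled sessions with unlabelled ones according to assignment."""
    groups = defaultdict(list)
    for label, points in labelled.items():
        groups[label].extend(points)
    for point, label in zip(unlabelled, assignment):
        groups[label].append(point)
    return groups


def seeded_kmeans(labelled, unlabelled, iterations=10):
    """Cluster unlabelled sessions, starting from centroids of labelled ones."""
    centroids = {label: mean_point(pts) for label, pts in labelled.items()}
    assignment = []
    for _ in range(iterations):
        # Assign every unlabelled session to its nearest centroid.
        assignment = [
            min(centroids, key=lambda label: dist(point, centroids[label]))
            for point in unlabelled
        ]
        # Recompute centroids; labelled sessions always keep their group.
        groups = group_sessions(labelled, unlabelled, assignment)
        updated = {label: mean_point(pts) for label, pts in groups.items()}
        if updated == centroids:
            break
        centroids = updated
    return assignment


def average_pages_per_group(labelled, unlabelled, assignment):
    """Per-group metric: mean page views, computed separately for each group."""
    groups = group_sessions(labelled, unlabelled, assignment)
    return {
        label: sum(pages for pages, _ in pts) / len(pts)
        for label, pts in groups.items()
    }


assignment = seeded_kmeans(labelled, unlabelled)
pages_by_group = average_pages_per_group(labelled, unlabelled, assignment)
```

Computing the metric per group rather than globally is the point: a low page count may be perfectly normal for visitors arriving from an aggregator, while the same number would be a warning sign for regular readers.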
Find out more in the slides of my talk.

I also attended other talks, and here are my three takeaways from the Hadoop Summit:

- There’s a next generation after “MapReduce” on Hadoop.
Hortonworks' “Tez” project is the new horizon for Hadoop processing. Tez's goal is to overcome the limitations of “MapReduce” in handling lower-latency jobs and complex workflows.
One practical consequence is that Hive, the SQL layer for Hadoop, is bound to get faster, and that less expertise will be required to get the best performance out of SQL-driven analytics.

- Predictive analytics and machine learning on top of Hadoop are moving too.
Ted Dunning presented the strategic moves of the Mahout project: providing a higher-level abstraction and leveraging new, efficient large-scale machine learning projects such as H2O.
One practical application is to provide a more consistent and accessible framework for managing 1 TB of training data.

- Europe-wide.
Talks and participants were well spread across Europe, with use cases from Italy, the UK, the Netherlands, etc.
The vibrant Hadoop tech community in France was present, with talks from Criteo and EDF.

It has been a very interesting event. Get in touch with me on Twitter if you'd like more details.