Schedule: Nerdcore sessions

A practical step-by-step description of how the LAMP-based Top10 Alpha was turned into a fully data-driven product. Based around a real-time data processing pipeline and asynchronous stack, Top10's infrastructure now hinges on Akka, along with Scala, Node.js and a host of other technologies. This has enabled interesting uses of the data and new, exciting user-facing features.

Big data often doesn't sit well with companies that want to move fast. Technologies like Hadoop can be expensive to set up, slow to produce results, and time-consuming to maintain. Streaming algorithms provide an alternative. They are simple to implement, very efficient, and give real-time results. In this talk I will describe several key streaming algorithms and give examples of their use.

Data science projects are difficult to realise because they require both mathematical and IT abstractions at once. We need databases, linear algebra, message queues... all at once. Traditional environments like Java/C#/Matlab/Mathematica provide only one. I will talk about how a new language, Clojure, provides all the platform power of the JVM, as well as the language and libraries to do data science.

This presentation will give an overview of MapReduce-based algorithms described in recent papers by academic and industrial researchers. Areas covered include AI/machine learning, bioinformatics, and information retrieval. The focus will be on patterns of problems and the corresponding MapReduce solution patterns. Some background material:
http://mapreducepatterns.org

As open data and linked data communities grow, so do the number and average size of freely available datasets. Often these datasets are modelled and interlinked using RDF.
This talk shares tips and tricks, use cases and practical examples of how to effectively use tools from the Hadoop ecosystem to process large RDF datasets.