The personal view on the IT world of Johan Louwers, especially focusing on Oracle technology, Linux and UNIX technology, programming languages and all kinds of nice and cool things happening in the IT world.

Sunday, September 02, 2012

Developing big-data triggers

Most of you will know CERN from the Large Hadron Collider (LHC), the experiment used in the search for the Higgs boson particle. This is one of the most interesting experiments in physics at this moment, and the search for the Higgs boson makes the news quite often. What a lot of people do not realize, however, is that this research differs from traditional research in the field of physics when it comes to the amount of data involved.

When Isaac Newton “discovered” gravity, all it took was a tree to lean against and an apple falling down. That is not a lot of input streams of information. When it comes to finding the Higgs boson, we are playing in a totally different field in terms of the number of inputs. During an event, the data capture system stores, every second, a data flow roughly six times the size of the Encyclopædia Britannica.

The main issue is that the systems are not able to handle and store all the data presented to the sensors. All sensors have triggers developed to capture the most important data. As we are talking about finding a particle that has never been observed before, the triggers might discard the Higgs boson data instead of storing it for analysis. Developing the triggers is therefore one of the most crucial and critical parts of the experiment. In the video below, Tulika Bose, an assistant professor at Boston University, gives a short introduction to this.
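To make the idea concrete: a trigger is essentially a fast predicate applied to every event, where events that pass are stored and everything else is discarded forever. The sketch below is a minimal illustration of that concept in Python; the event shape, the simulated momentum values and the 50 GeV threshold are my own assumptions for illustration, not actual CMS code.

```python
import random

# Hypothetical event: a list of transverse-momentum (pT) values, one per
# detected particle, in GeV. Real detector events are far richer; this is
# only meant to illustrate the trigger idea.
def make_event(rng):
    return [rng.expovariate(1 / 15.0) for _ in range(rng.randint(1, 10))]

def trigger(event, pt_threshold=50.0):
    """Keep the event only if at least one particle exceeds the threshold.
    Rejected events are gone forever -- which is why designing the trigger
    is so critical: too tight and the interesting physics is thrown away,
    too loose and the downstream systems drown in data."""
    return any(pt > pt_threshold for pt in event)

rng = random.Random(42)
events = [make_event(rng) for _ in range(100_000)]
kept = [e for e in events if trigger(e)]
print(f"kept {len(kept)} of {len(events)} events "
      f"({100 * len(kept) / len(events):.1f}%)")
```

Even this toy version shows the trade-off: the threshold alone decides how much of the incoming stream survives.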

Within CERN, the TriDAS project is responsible for developing the Data Acquisition and High-Level Trigger systems. Those systems select the data and store it, finally resulting in data that can be analyzed. A large group of scientists and top people from a large number of IT companies have been working together to build this. IT companies like Oracle and Intel have been providing CERN with people and equipment, mainly so they can test their new systems in one of the most demanding and data-intensive setups currently operational.
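The selection these systems perform happens in stages: a fast first-level trigger thins the incoming stream, and the High-Level Trigger then applies slower, more detailed cuts to whatever survives. Below is a minimal sketch of such a two-stage chain; the event fields, thresholds and stage logic are made-up assumptions to show the pattern, not the actual TriDAS design.

```python
from dataclasses import dataclass

@dataclass
class Event:
    max_pt: float   # highest transverse momentum seen (GeV) -- assumed field
    n_tracks: int   # number of reconstructed tracks -- assumed field

def level1(event):
    # Fast, coarse, hardware-style cut: runs on every single event.
    return event.max_pt > 30.0

def high_level_trigger(event):
    # Slower, software-based selection on the events Level-1 kept.
    return event.max_pt > 50.0 and event.n_tracks >= 2

def select(events):
    survivors = [e for e in events if level1(e)]
    return [e for e in survivors if high_level_trigger(e)]

events = [Event(12.0, 1), Event(45.0, 3), Event(80.0, 4), Event(60.0, 1)]
stored = select(events)
print(len(stored))  # prints 1: only Event(80.0, 4) passes both stages
```

The design point is that the expensive analysis only ever runs on the small fraction of events the cheap first stage lets through.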

Below you can see a high-level architecture of the CMS DAQ system. This image comes from the "Technical Design Report, Volume 2" delivered by the TriDAS project team.

In a somewhat more detailed view, the system looks like the architecture below from the ALICE project. This shows you the connection to the databases and other parts of the system.

While finding the Higgs boson is possibly not that interesting to the general public in the short term, having IT companies working together with CERN is, even though it might not be that obvious at first. CERN is handling an enormous load of data. The IT companies participating in this project are building new hardware, software and algorithms specific to finding the Higgs boson. However, the developed technology will also be used in building solutions that end up serving customers.

As big data gets more and more attention, and as all kinds of big-data-based solutions are being developed, we can see that this is no longer a purely scientific playing field. It is entering the day-to-day lives of people, and will help them in the very near future. So, next time you question what the search for the Higgs boson brings you as an individual in the short term, take the big-data part into consideration and do not think it is only interesting to scientists (which is an incorrect statement already, however a topic I will not cover on this blog :-) )