Dealing with Data: Defining the Components to Tune

I've been reading a fascinating article about the Large Hadron Collider (LHC), a scientific research facility that houses a particle collider generating an incredible amount of data. The original plan was to stream the data to tape, then send it to "islands" closer to the users, offloading the network as quickly as possible. But the team found that the network could handle the streaming better than they thought, so they now stream the data directly to the users, saturating the network. It's a new way of thinking about moving data around.

Another interesting concept is that they filter the data before they store it. We're not talking trivial reductions here: they filter a petabyte (PB) of data per second down to a gigabyte per second. That's incredible. In fact, the overwhelming majority of the CPU power there doesn't go to crunching numbers for the scientific analyses; it's used to filter the data.
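To put that reduction in perspective, here is a quick back-of-the-envelope sketch. It assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes), which is an assumption on my part, not something the article specifies:

```python
# Back-of-the-envelope arithmetic for the filtering ratio described above.
# Assumes decimal (SI) units: 1 PB = 1e15 bytes, 1 GB = 1e9 bytes.
input_rate_bytes = 1e15    # ~1 petabyte per second entering the filter
output_rate_bytes = 1e9    # ~1 gigabyte per second actually stored

reduction_factor = input_rate_bytes / output_rate_bytes
kept_fraction = output_rate_bytes / input_rate_bytes

print(f"Reduction factor: {reduction_factor:,.0f}x")    # 1,000,000x
print(f"Fraction of data kept: {kept_fraction:.4%}")    # 0.0001%
```

In other words, roughly one byte in a million survives the filter, which helps explain why so much of the facility's CPU power goes to filtering rather than analysis.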