Life as a Physicist

The state senate and house in Washington have come to an agreement. The results will be posted at 9am tomorrow morning (Friday) under the Conference heading, but rumors have begun to flow out: a 22% cut in how much the state funds the university. To make some of it back, we are given permission to raise tuition by 14% this year and next. After you run the numbers you’ll find something like an 11% cut in the university’s operating budget. This is just an acceleration of a trend that has been going on for years: the state is slowly transferring university support from its coffers to the backs of the students.
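For what it’s worth, the "run the numbers" step is simple arithmetic. The funding shares below are hypothetical round numbers I picked so the result lands near the reported ~11%; the real UW breakdown differs:

```python
# Back-of-the-envelope for the net operating-budget change.
# The two "share" values are hypothetical, not actual UW figures.
state_share = 0.60     # assumed fraction of operating budget from the state
tuition_share = 0.15   # assumed fraction from tuition
state_cut = 0.22       # reported cut to state funding
tuition_raise = 0.14   # permitted tuition increase

net_change = -state_cut * state_share + tuition_raise * tuition_share
# ≈ -0.111, i.e. roughly an 11% cut overall
```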

They are cutting support by almost a quarter. That will probably drop the state to the fourth largest source of funding for the university, behind research funding, tuition, and the endowment (well, when it goes back to being positive… ahem) [this ordering I’m repeating from the UW administration]. The irony is that the total drop in our budget – about 11% – sounds good given what we thought might be coming at us: 23%–30% cuts.

Should UW remain a state institution at that point? Perhaps it is time for more independence and a looser arrangement with the state of Washington. For example, they could pay us some $$/seat to subsidize seats for in-state students, and otherwise get out of our hair. That would be too bad – currently one of the big missions of UW is to serve the state. I suppose it will always do that to some extent, seeing as it is located here, but as state funding decreases the UW will have to look out more for itself and less for state interests.

A pity. I guess we will see what the actual numbers are tomorrow. Anything close to the above is depressing, though we will be able to deal with it. My impression is things will move very quickly after that. By May 14th the university has to have a budget to present to the board of regents. And as part of that process various departments (like mine, physics) will find out how much they really have to cut.

UPDATE: Thanks to a friend that pointed out several inaccuracies in this post.

You know how it is. You decided to buy that new car on sale because it was just before the new models came out – sales were great because dealers were clearing their lots… And then the new model comes out… and it flies! What are you to do? You are stuck with your old car for several years; you can’t just sell it; its value has dropped so much because these new cars fly!

It is no wonder, then, that 76 American Nobel laureates publicly supported Barack Obama – nor that there has been a dramatic lightening in mood among my colleagues in New York over the past week.

Obama understands – at least, according to his campaign literature and rhetoric – that science has the power to improve lives profoundly.

He also realises that the nations that succeed in the highly competitive world economy are those that foster technological advances and nurture intellectual strength.

…

So there is a feeling of hope that the new president will be much better for American science, and as a result for science across the globe.

I’ve seen other articles around the web describing the new “spring” in scientists’ step.

There is a problem, however, for many of us: that new, about-to-be-obsolete car is still on the lot. Funding in particle physics comes mostly from yearly grants, reapplied for every three years (with reports and oversight on a much more frequent timescale). If you and your group happen to be up for renewal right now… well, the money that the DOE and NSF have to spend on you is what is in the current, pre-Obama budget. And that is a budget from a continuing resolution – and continued at a disastrously low level.

N.B. While I will always remain optimistic (with brief interludes of bitterness) that things will get better, I am under no illusion about the current financial situation and its already real and further potential impact on scientific research in the USA. As the article says:

But it is one thing to pledge an increase of funding during an election campaign, and another to double the budget during a global recession.

Slowly Paula and I have been moving back into our condo. I arrived on Sept. 1, and since then a little bit more gets done every weekend. We finally have the furniture in the right place, and we are starting to unpack all the boxes that came back from Europe and sort everything into its cubby hole… This pile of AC adaptors was spread around – in backpacks, suitcases, boxes… I think every time I travel I forget an adaptor and have to buy a new one…

A group of us are attempting to rewrite a Monte Carlo study that some internal referees had concerns with – as a result I’ve not had much time for anything but that, eating, sleeping, and changing diapers.

I usually don’t put stuff up here about things like Skype (unless in some other context) – but they just came out with a new version for both the Mac and Windows. Get it. The claim is they worked on call quality. Wow. Paula has a Mac, and I’m on Windows. We used to always get echoes, and Paula’s voice sounded pretty much like she was living in a tin can. The new version has totally killed the echoes and she sounds like she might even be human.

Unfortunately, it would seem that both ends of the conversation have to upgrade to get the full benefits. At any rate, nice job Skype!

It’s my birthday today (22nd). Stop by and leave a comment! I’ve been doing this blogging for over 3.5 years now. Never imagined I’d still be writing!

Update: Awesome. More bad karma. In the last 15 minutes I’ve gotten 1050 new emails. All of them Russian, all of them bounces. How nice, and right on my birthday! Thaaaank you! 🙂 Now, the question is — how much real email am I going to accidentally throw out with this junk!? And can I use this as an excuse to not do some task (sorry, didn’t see that email!)?

There are times when I worry that the things I teach in introductory physics – like electricity and magnetism – aren’t really used in particle physics (at UW these are called Physics 121, 122, and 123). But they are.

The biggest example is momentum conservation. We use this all the time. In fact, one of the primary ways we will discover a new beyond-the-standard-model particle is via momentum conservation. A common line of reasoning is that we’ve not been able to detect this particle up to now because it barely interacts with ordinary matter – it would pass through our detectors unseen. This is where basic physics comes to the rescue. We know the initial momentum of the collision in our detector. If this new particle were to fly off into the distance and not interact with our detector, then when we summed up the momentum of all of the outgoing particles… well, there would be some missing momentum! Score! Of course, it isn’t quite that simple – things like neutrinos will mimic exactly that signal – but there are ways around it.
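The missing-momentum trick can be sketched in a few lines of Python. This is a toy that ignores detector effects; note that at a hadron collider only the transverse components are usable, since beam remnants escape down the beam pipe:

```python
import math

def missing_pt(particles):
    """Missing transverse momentum: the negative vector sum of the
    visible particles' (px, py).  A big value means something
    invisible carried momentum away."""
    px = -sum(p[0] for p in particles)
    py = -sum(p[1] for p in particles)
    return math.hypot(px, py)

# Two jets that balance each other: nothing is missing.
met = missing_pt([(50.0, 0.0), (-50.0, 0.0)])   # ~0 GeV

# Same event, but one of the two "particles" is invisible:
# the remaining jet appears unbalanced.
met2 = missing_pt([(50.0, 0.0)])                # ~50 GeV
```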

The second place basic physics often comes into play is in detector construction and operation. For example, ATLAS has two large and very powerful magnetic fields. The first is the inner tracking field, and the second is the outer toroid field. Magnetic fields interact – think of bringing the north poles of two magnets together. So these two fields were carefully designed not to interact.

Except that one has to pump current to the inner solenoid magnet through the region occupied by the outer toroid field. As anyone who has taken a basic E&M course will tell you, a current generates a magnetic field – and a current-carrying cable sitting in an external magnetic field feels a force. This means the cables that carry the current have to be able to withstand that force! At these field strengths, with thousands of amps flowing, that is a lot.

Of course, the engineers knew about this, and designed the cable housing to withstand this. Trickier than it sounds since all of this is superconducting. Still, it was nice to hear the reported successful test of this.
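The E&M here is just the force on a current-carrying wire: F/L = B·I for a field perpendicular to the cable. A tiny sketch with illustrative numbers (these are my guesses for the sake of the estimate, not the actual ATLAS field or supply-current values):

```python
# Force per unit length on a cable carrying current I through a
# perpendicular magnetic field B:  F/L = B * I.
# Illustrative numbers, NOT ATLAS specifications:
B = 2.0      # tesla, assumed field strength where the cable runs
I = 8000.0   # amps, assumed supply current to the solenoid

force_per_meter = B * I   # newtons per meter of cable
# 16,000 N/m -- roughly the weight of a small car hanging on
# every meter of cable, which is why the housing matters.
```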

In the picture the 8 large tubes that surround the ATLAS detector generate the toroid field – they are 8 really giant superconducting magnets.

Congratulations to everyone at D0 and the Fermilab accelerator division — D0 has been delivered 4 fb-1 as of April 30!! That is a lot of data! And years of work!

It will be a while before you see that in an analysis, however. First of all, of the 4 fb-1 delivered, only about 3.5 fb-1 were written to tape. The rest is lost forever. Where did it go? Well, perhaps our detector was broken for some short amount of time. We do our best to make sure that doesn’t happen, of course, but a machine like this does break (my name is on more than a few of those minutes for Level 3/DAQ problems!). We also generally run our triggers with a small fraction of dead-time — time where we aren’t accepting new events even if they are coming in. If we were to run with zero dead-time we’d not be able to do nearly the physics program we do [if you want more details, let me know].

The second issue is time. It takes time to understand our data – and a good while to do something as sophisticated as a Higgs search. Parts of the detector get turned off, which affects efficiencies and systematic errors. The events we trigger on change as we try to optimize our data for the higher instantaneous luminosity the Tevatron delivers. On top of that, of course, is the continuous effort to improve the power of our analysis techniques. All these changes must be carefully studied to see how they impact each analysis, and only once that is done can the data be shown externally. This takes quite some time: the more sensitive the analysis, the more carefully the data must be studied.

So, fantastic for us at D0 (and CDF) for reaching 4 fb-1 delivered. Thanks to the Fermilab accelerator division for doing this despite the trying times. Everyone else: sorry, you’ll have to hold your horses a short while before we show the results of this data!

Amazon has done a lot of work to make GRID computing services accessible to anyone who wants them. Actually, it surprised me that Google or Microsoft didn’t do it first — to run their search engines and other similar things they must have farm computing down to a tee.

In HEP we spend a huge amount of money and time on the GRID. A discussion in a bar some time back generated the question: what would it cost to move HEP into the cloud?

Databases

Yesterday I mentioned databases for storing event data. Amazon has SimpleDB (see this posting to get an idea of how it works). On the surface it looks rather poorly suited to what we would want to do with our highly structured data. But ignoring that, and some of the overhead it would charge – for the 100 GB of data that Rich had in his database it would cost about 150 bucks a month to store it. Querying is dirt cheap — 14 cents per hour of CPU time used. I have no idea what the performance would be on a database like this, but even if it were 10x slower I doubt it would matter much.

ATLAS’ equivalent database to Rich’s project is thought to be 14 TB/year. That works out to be $21,500/month.
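The scaling here is just the $1.50-per-GB-per-month rate implied by Rich’s 100 GB figure, applied to 14 TB (Amazon’s real pricing has more components than this; treat the rate as an inferred assumption):

```python
# Rate inferred from the post: $150/month for 100 GB in SimpleDB.
per_gb_month = 150.0 / 100.0       # $1.50 per GB per month

atlas_db_gb = 14 * 1024            # ~14 TB/year, in GB
monthly = atlas_db_gb * per_gb_month   # ≈ $21,500/month
```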

Event Data

Amazon has a simple storage service as well (Amazon S3). Because the data is just a binary blob, storage is much cheaper: 15 cents per GB per month. However, trying to figure out what size ATLAS would actually use if it stored everything in the cloud, ignoring the actual design, is difficult. Making some rough estimates from an old version of the computing model, I’m going to guess about 10 PB per year (that is petabytes!). That is about 1.6 million bucks per month. But we aren’t done yet – it also costs money to move the data in and out. Just loading the data will cost about 1 million.

Then we have to use the data – let’s say each year we cycle through all of it once — all 10 PB. That will run about 2.5 million per year (not per month!). But if we use Amazon’s EC2 compute cloud, moving data between it and S3 is free. In that case, probably only final datasets would be moved out. That would be much cheaper.
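Putting the S3 numbers together in one place (the transfer-in and transfer-out rates are back-solved from the dollar figures above, so treat them as assumptions; these are circa-2008 prices in any case):

```python
# S3 back-of-the-envelope, using the rates quoted in the post.
GB_PER_PB = 1_000_000          # decimal petabytes, as the rough estimate uses

storage_rate = 0.15            # $/GB/month (quoted)
transfer_in = 0.10             # $/GB, assumed from "$1M to load 10 PB"
transfer_out = 0.25            # $/GB, assumed from "$2.5M/yr to read 10 PB"

data_gb = 10 * GB_PER_PB       # 10 PB of event data per year

monthly_storage = data_gb * storage_rate     # ~$1.5M per month
load_once = data_gb * transfer_in            # ~$1.0M, one time
read_once_per_year = data_gb * transfer_out  # ~$2.5M per year
```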

Computing

This is even harder for me to calculate. This matches up with Amazon’s EC2. One cool thing: data transfer between these computers and S3 is free. Otherwise, for a 32-bit single-processor machine with enough memory to run ATLAS software, it looks like it costs about 10 cents per hour of use. Now, a 2005 ATLAS estimate was that it would take about 3000 kSI2k-seconds to reconstruct the average event. So, on an Amazon machine (which is about 1.9 kSI2k) that would take about 26 minutes – about 5 cents per event. If we expect 2,000,000,000 events per year, that will cost us $100 million to reconstruct. If someone is familiar with SpecINT2000 and how it works, perhaps they can verify I did this math “ok”. And I’ve not included analysis time, which is probably another factor of 2.
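Here is that reconstruction-cost arithmetic spelled out, using the 2005-era estimates quoted above (kSI2k is the SPECint2000-based CPU unit):

```python
# Cost to reconstruct a year of ATLAS data on EC2, per the post's figures.
work_per_event = 3000.0        # kSI2k-seconds to reconstruct one event
machine_power = 1.9            # kSI2k for one EC2 instance
price_per_hour = 0.10          # dollars per instance-hour

seconds_per_event = work_per_event / machine_power        # ~1580 s, ~26 min
cost_per_event = seconds_per_event / 3600 * price_per_hour  # ~$0.044
events_per_year = 2_000_000_000

total = cost_per_event * events_per_year   # ~$90M, "about $100M" in round numbers
```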

So, there you have it. A lot of money would go into running this in the cloud. Of course, we could never walk up to someone like Amazon and dump this on them. In almost all cases we will do better on our own, since we can optimize for our own uses. Further, the cash that gets spent on this comes from all over, and in all different currencies. Many nations, for example, buy GRID installations for all the scientists in their country, and ATLAS just piggybacks on a portion of those purchases. Still, it is interesting to see what the cost would be – about 120 million before you even start to analyze the data to produce a physics result!

Not everyone is satisfied with ROOT as the “tool” to analyze HEP data. Back in D0’s Run I all the data was loaded into a commercial database.

So, before you roll your eyes – you are right. HEP is littered with database train wrecks (can anyone say Objectivity?). However, most of those had to do with trying to store every single last bit of data that came off the data acquisition system in the database. And then also the reconstructed data. And then, in some cases, even the analysis-level objects. In fact, ROOT grew out of disagreement with this vision (and you can tell who won…).

This project, however, was different. The goal was to store only the high-level physics information. For a reconstructed jet, for example, they had the four-vector and some other quantities (like the electromagnetic fraction of the calorimeter energies – 28 values in all). They had separate markers for tight (very high quality) electrons and loose (lower quality) electrons. Same for muons, jets, etc. To understand the limitations of this — and what you might or might not do with this tool: if you changed your jet energy scale you would have to completely re-load the database. This is not something you do frequently, but you get the idea: this is for your final selection – the last mile of your analysis. Indeed, the test case was to repeat the Run 1 top discovery analysis. However, if you can do selection quickly, imagine the power for scanning over a large SUSY parameter space!

How much data? About 62 million events. As raw ntuples it was 62.4 GB (small by today’s standards, of course!). It took almost 1000 hours to generate these ntuples – applying the jet energy scale, etc. After being inserted into the database it was 80 GB of raw data, plus another 30 GB of database index data.

They used Microsoft’s SQL Server for this, on a dual 450 MHz Pentium II with 256 MB of memory. Does that tell you how long ago this experiment was done!?

Actually, their DB design was pretty clever. All electrons in one table, all jets in another. Then another table which just listed all tight electrons, and another one that listed all loose electrons, etc.

So, how fast did this thing run? Looking for a Z boson going to two electrons took about 7 seconds. It found about 6000 events – the right number. Looking for a W boson decaying to an electron and a neutrino took about 18 seconds to find 86,000 events. That is pretty darn good!
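For the curious, here is a minimal sketch of that table layout using Python’s built-in sqlite3 (the table and column names are my own guesses, not the project’s actual schema, and the rows are toy data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# One row per electron; a separate "tight_electrons" table simply
# marks which rows pass the tight selection (ditto for loose).
cur.executescript("""
CREATE TABLE electrons (
    elec_id INTEGER PRIMARY KEY,
    event_id INTEGER, px REAL, py REAL, pz REAL, e REAL
);
CREATE TABLE tight_electrons (elec_id INTEGER);
""")

# Toy rows: event 1 has two tight electrons (a Z candidate),
# event 2 has one electron that fails the tight selection.
cur.executemany("INSERT INTO electrons VALUES (?,?,?,?,?,?)",
                [(1, 1, 40.0, 0.0, 0.0, 45.0),
                 (2, 1, -38.0, 5.0, 0.0, 44.0),
                 (3, 2, 20.0, 1.0, 0.0, 22.0)])
cur.executemany("INSERT INTO tight_electrons VALUES (?)", [(1,), (2,)])

# "Z -> ee"-style selection: events with at least two tight electrons.
cur.execute("""
SELECT e.event_id
FROM electrons e JOIN tight_electrons t ON e.elec_id = t.elec_id
GROUP BY e.event_id
HAVING COUNT(*) >= 2
""")
z_candidates = [row[0] for row in cur.fetchall()]
```

The marker-table trick keeps the quality definitions out of the main electron table, so a selection only joins against the small list of IDs rather than re-evaluating cuts on every row.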

Are there plans to do this in ATLAS? Well, perhaps. We have a physics summary database – but it isn’t complete (e.g. it doesn’t have all the jets in an event). And its design goal is different: you use it to select a sample of events you actually want to run over.

The project was led by Rich Partridge at Brown University (with a lot of help from an undergraduate, Matt Bowen). For more raw information you can see a talk by Rich at a SLAC meeting the other day (CERN ATLAS agendas, look for meetings on Feb 27, the SLAC ATLAS forum).

At any rate, this was something I’ve been meaning to write about for a while. Unfortunately for an approach like this, about 95% of an analyzer’s time is spent trying to understand what exactly a tight electron is – and its fake rate. However, anything that makes for fast turnaround is a boon in my book!

I’m impressed. I’m now sitting in Terminal 4 of Heathrow waiting for my short flight to Paris. And I’ve zipped through this airport — no delays moving about. This is better than the stories I heard from some people arriving last week at Glasgow who’d gone through Heathrow: waits of up to 2 hours to change terminals. Sweet!