Life as a Physicist

It is hard for me not to feel very depressed about the way government funding is going in Washington. Especially all the “cuts” that keep being mentioned. So I thought I’d spend an hour doing my best to understand what cuts are being talked about. Ha! Sheer fantasy!

Before I write more, I should point out that I very much have a dog in this race. Actually, perhaps a bit more than one dog. Funding for almost all my research activities comes via the National Science Foundation (NSF) – this is funded directly by Congress. My ability to hire post-docs and graduate students, train them, do the physics – everything, is dependent on that stream of money. Also, two months of salary a year come from that stream. In short, almost everything except for the bulk of my pay. That comes from two sources: the state of Washington and students’ tuition. A further chunk of money comes from the Department of Energy’s (DOE) Office of Science – they fund the national labs where I do my research, for example. In short, particle physics does not exist without government funding.

So when people start talking about large, across-the-board cuts in funding levels I get quite nervous. Many Republicans in 2010 campaigned on cutting back the budget, hard:

“We’re broke, and decisive action is needed to help our economy get back to creating jobs and end the spending binge in Washington that threatens our children’s future,” Mr. Boehner said.

Up until recently they really haven’t said how they were going to do it – a typical political ploy. But now things are starting to show up: cut funding to 2008 levels, and then no increases to counter inflation. The latter amounts to a 2-3% cut per year. Not so bad for one year, but when you hit year 3 or 4 it starts to add up. You’ll have to let go a student or perhaps down-size a post-doc to a student.
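To put rough numbers on "starts to add up", here is a back-of-the-envelope sketch in Python. It assumes a constant inflation rate (the 2% and 3% figures above) and simply compounds it; the function name is mine, purely for illustration.

```python
def real_value_after(years, inflation_rate):
    """Fraction of today's buying power left after `years` of flat funding.

    Assumes a constant yearly inflation rate, which is a simplification.
    """
    return 1.0 / (1.0 + inflation_rate) ** years


for rate in (0.02, 0.03):
    for years in (1, 3, 4):
        lost = 1.0 - real_value_after(years, rate)
        print(f"{rate:.0%} inflation, {years} year(s) flat: {lost:.1%} effective cut")
```

Under these assumptions four flat years cost somewhere between roughly 8% and 11% of buying power, which is in the same ballpark as the one-time roll-back to 2008 levels.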

But what about all these other cuts? So… I’m a scientist and I want to know: Where’s the data!? Well, as any of you who aren’t expert in the ways of Washington… boy is it hard to figure out what they really want to do. I suppose this is to their advantage. I did find out some numbers. For example, here is the NSF’s budget page. 2008 funding level was $6.065 billion. In 2010 it was funded at a rate of $6.9 billion. So dropping from 2010 back to 2008 would be a 12% cut. So, if that was cut blindly (which it can’t – there are big projects and small ones and some might be cut or protected), that would translate into the loss of about one post-doc, perhaps a bit more. In a group our size we would definitely notice that!

But is that data right? While I was searching the web I stumbled on this page, from the Heritage Foundation, which seems to claim reducing the NSF to 2008 levels will save $1.7 billion, about twice what it looks like above. Who is right? I know I tend to believe the NSF’s web page is more reliable. But, seriously, is it even possible for a citizen who doesn’t want to spend days or weeks to gather enough real data to make an independently informed decision?
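For what it is worth, the arithmetic behind the last two paragraphs fits in a few lines of Python. The three dollar figures are the ones quoted above; the function and variable names are made up for this sketch.

```python
NSF_2008 = 6.065  # billions of dollars, from the NSF budget page quoted above
NSF_2010 = 6.9    # billions of dollars, same source
HERITAGE_CLAIMED_SAVINGS = 1.7  # billions of dollars, the Heritage figure


def rollback(current, target):
    """Return (dollars saved, fractional cut) when rolling `current` back to `target`."""
    saved = current - target
    return saved, saved / current


saved, cut = rollback(NSF_2010, NSF_2008)
ratio = HERITAGE_CLAIMED_SAVINGS / saved
print(f"Savings from the NSF numbers: ${saved:.3f} billion ({cut:.1%} cut)")
print(f"The Heritage claim is {ratio:.1f}x larger")
```

The NSF numbers give savings of about $0.835 billion, so the two sources really do disagree by about a factor of two.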

Check out this recent article from the NYTimes about a recent proposal coming from Congressman Jordan whose goal is to reduce federal spending by $2.5 trillion through fiscal year 2021 (am I the only one that finds the wording of that title misleading?). As a science/data guy the first thing I want to know is: where is he getting all that savings from? There are lists of programs that are eliminated, frozen, or otherwise reduced – but that document contains no numbers at all. And I can’t find any of the supporting documentation that he and his staff must have in order to have made that $2.5 trillion claim. So, in that document, which is 80 pages long, I’m left scanning for the words “national science foundation”, “science”, “energy”, etc. Really, there is very little mentioned. But I have a very hard time believing that those programs are untouched – as the article in the New York Times points out, things like Medicare, Social Security, etc., are left untouched (the lion’s share of the budget – especially in out years), and so all the cuts must come from other programs:

As a result, its effect on the entire array of government programs, among them education, domestic security, transportation, law enforcement and medical research, would be nothing short of drastic.

I agree with that statement. $2.5 trillion is a lot of cash! Can you find the drastic lines in that document? Well, perhaps you know more about Washington. I can’t. This gets to me because now if I have to get into an argument it is a very abstract one.

Pipedream: What I would love these folks to do is release a giant spreadsheet of the US gov’t spending that had 2008, 2009, 2010 levels, and then their proposed cuts, with an extra column for extra text. That is a lot of data, and would probably be hard to compile. But, boy, it would be nice!

The New York Times had an article the other day talking about a discovery that is making rounds:

Taking a test is not just a passive mechanism for assessing how much people know, according to new research. It actually helps people learn, and it works better than a number of other studying techniques.

I’m here to tell you: duh!

In fact, we’ve institutionalized this in our physics graduate schools. Most university physics departments have the mother-of-all tests. Here at UW we call it the Qualifying Exam. Others call it a prelim (short for preliminary). And there is a joke associated with this exam, usually said with some bitterness if you’ve not passed it yet, or some wistfulness if you long since have passed it:

You know more physics the day you take the qual than you ever do at any other time in your life.

The exam usually happens at the end of your first year in graduate school. The first year classes are hell. Up to that point in my life it was the hardest I’d ever worked at school. Then the summer hits, and you get a small rest. But it is impossible to rest staring down the barrel of that exam, often given at the end of the summer just before the second year of classes start. You have to pass this exam in order to go on to get your Ph.D. And for most of us, it is the last (formal) exam in our career that actually matters. So psychologically, it is a big hurdle as well.

How hard is it? My standard advice to students is that they should spend about one month studying, 8 hours a day. For most people, if they study effectively, that is enough to get by. Some need less and some need more. This is about what it took me. What is the test like? At UW ours is 2 hours per topic, closed book, and all it is is working out problems. No multiple choice here! It lasts two days.

So, how do you study? There is, I think, really only one way to get past this. For 30 days, 8 hours a day, work out problems. There are lots of old qualifier problems on websites. Our department provides students with copies of all the old exams. Even if you don’t know the solution, you force yourself to try to work it out without looking it up in a book – break your brain on it. Once you can solve those problems without having to look at a textbook, you know you are ready. Imagine trying to study by reading a textbook, or by reviewing your first year homework problems. There is no way your brain will be able to work out a new problem after that unless you are a very unusual individual.

Note how similar this is to the results shown in the article:

In the first experiment, the students were divided into four groups. One did nothing more than read the text for five minutes. Another studied the passage in four consecutive five-minute sessions.

A third group engaged in “concept mapping,” in which, with the passage in front of them, they arranged information from the passage into a kind of diagram, writing details and ideas in hand-drawn bubbles and linking the bubbles in an organized way.

The final group took a “retrieval practice” test. Without the passage in front of them, they wrote what they remembered in a free-form essay for 10 minutes. Then they reread the passage and took another retrieval practice test.

The last group did the best, as you might imagine from the theme of this post!

This is also how you know more physics than at any other time in your life. At no other time do you spend 30 days working out problems across such a broad spectrum of physics topics. If you study and try to work out a sufficiently broad spectrum of problems you can breeze through the exam (literally, I remember watching one guy taking it with me just nail the exam in about half the time of the rest of us).

Working out problems – without any aids – is active learning. I suppose you could follow the article and say that forcing the brain to come up with the solution means it organizes the information in a better way… Actually, I have no idea what the brain does. But, so far this seems to be the best way to teach yourself. You are actively playing with the new concepts and topics. This is why homework is absolutely key to a good education. And this is why tests are good – if you study correctly. If you actively study for the test (vs. just reading the material) then you will learn the material better.

And we need to work better at designing tests that force students to study actively. For example, I feel we are slipping backwards sometimes. With the large budget cuts that universities are suffering, one byproduct is that the amount of money we have to hire TAs to help grade our large undergraduate classes is dropping. That means we can’t ask as many open-ended exam questions – and have to increase the fraction of multiple choice. It is much harder to design a test that goes after problem solving in physics using multiple choice. This is too bad.

So, is this qualifier test a hazing process? Or is there a reason to do it? Actually, that is a point of controversy. Maybe there is a way to force the studying component without the high anxiety of the make-or-break exam. Certainly some (very good) institutions have eliminated the qual. Now, if we could figure out how to do that and still get the learning results we want…

Google has 20% time. I have Christmas break. If you work at Google you are supposed to have 20% of your time to work on your own little side project rather than the work you are nominally supposed to be doing. Lots of little projects are started this way (I think Gmail, for example, started this way).

Each Christmas break I tend to hack on some project that interests me – but is often not directly related to something that I’m working on. Usually by the end of the break the project is useful enough that I can start to get something out of it. I then steadily improve it over the next months as I figure out what I really wanted. Sometimes they never get used again after that initial hacking time (you know: fail often, and fail early). My deeptalk project came out of this, as did my ROOT.NET libraries. I’m not sure others have gotten a lot of use out of these projects, but I certainly have. The one I tackled this year has turned out to be a total disaster. Interesting, but still a disaster. This plot post is about the project I started a year ago. This was a fun one. Check this out:

Each of those little rectangles represents a plot released last year by DZERO, CDF, ATLAS, or CMS (the Tevatron and LHC general purpose collider experiments) as a preliminary result. That huge spike in July – 3600 plots – is everyone preparing for the ICHEP conference. In all, the four experiments put out about 6000 preliminary plots last year.

I don’t know about you – but there is no way I can keep up with what the four experiments are doing – let alone the two I’m a member of! That is an awful lot of web pages to check – especially since the experiments, though modern, aren’t modern enough to be using something like an Atom/RSS feed! So my hack project was to write a massive web scraper and a Silverlight front-end to display it. The front-end is based on the Pivot project originally from MSR, which means you can really dig into the data.

For example, I can explode December by clicking on “December”:

and that brings up the two halves of December. Clicking in the same way on the second half of December I can see:

From that it looks like 4 notes were released – so we can organize things by notes that were released:

Note the two funny icons – those allow you to switch between a grid layout of the plots and a histogram layout. And after selecting that we see that it was actually 6 notes:

That left note is titled “Z+Jets Inclusive Cross Section” – something I want to see more of, so I can select that to see all the plots at once for that note:

And say I want to look at one plot – I just click on it (or use my mouse scroll wheel) and I see:

I can actually zoom way into the plot if I wish using my mouse scroll wheel (or typical touch-screen gestures, or on the Mac the typical zoom gesture). Note the info-bar that shows up on the right hand side. That includes information about the plot (a caption, for example) as well as a link to the web page where it was pulled from. You can click on that link (see caveat below!) and bring up the web page. Even a link to a PDF note is there if the web scraper could discover one.

Along the left hand side you’ll see a vertical bar (which I’ve rotated for display purposes here):

You can click on any of the years to get the plots from that year. Recent will give you the last 4 months of plots. By default, this is where the viewer starts up – seems like a nice compromise between speed and breadth when you want to quickly check what has recently happened. The “FS” button (yeah, I’m not a user-interface guy) is short for “Full Screen”. I definitely recommend viewing this on a large monitor! “BK” and “FW” are like the back and forward buttons on your browser and enable you to undo a selection. The info bar on the left allows you to do some of this if you want to.

Currently works only on Windows and a Mac. Linux will happen when Moonlight supports v4.0 of Silverlight. For Windows and the Mac you will have to have the Silverlight plug-in installed (if you are on Windows you almost certainly already have it).

This thing needs a good network connection and a good CPU/GPU. There is some heavy graphics lifting that goes on (wait till you see the graphics animations – very cool). I can run it on my netbook, but it isn’t that great. And loading when my DSL line is not doing well can take upwards of a minute (when loading from a decent connection it takes about 10 seconds for the first load).

You can’t open a link to a physics note or webpage unless you install this so it is running locally. This is a security feature (cross site scripting). The install is lightweight – just right click and select install (control-click on the Mac, if I remember correctly). And I’ve signed it with a certificate, so it won’t get messed up behind your back.

The data is only as good as its source. Free-form web pages are a mess. I’ve done my best without investing an inordinate amount of time on the project. Keep that in mind when you find some data that makes no sense. Heck, this is open source, so feel free to contribute! Updating happens about once a day. If an experiment removes a plot from their web pages, then it will disappear from here as well at the next update.

Only public web pages are scanned!!

The biggest hole is the lack of published papers/plots. This is intentional because I would like to get them from arxiv. But the problem is that my scraper isn’t intelligent enough when it hits a website – it grabs everything it needs all at once (don’t worry, the second time through it asks only for headers to see if anything has changed). As a result it is bound to set off arxiv’s robot sensor. And the thought of parsing TeX files for captions is just… not appealing. But this is the most obvious big hole that I would like to fix at some point soon.
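For the curious, the "ask only for headers" trick looks roughly like the Python sketch below. This is not the code from my scraper, just an illustration of the idea using the standard library; the choice of which headers to compare and the function names are assumptions made for this example.

```python
import urllib.request

# Headers that commonly signal a change. Which of these a given server
# actually sends is an assumption that has to be checked per site.
CHANGE_HEADERS = ("ETag", "Last-Modified", "Content-Length")


def fetch_headers(url, timeout=10):
    """Issue a HEAD request (no body is downloaded) and keep the tracked headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return {name: response.headers.get(name) for name in CHANGE_HEADERS}


def has_changed(previous, current):
    """True if any tracked header differs, or if there is no earlier record."""
    if previous is None:
        return True
    return any(previous.get(name) != current.get(name) for name in CHANGE_HEADERS)
```

On the first pass you store the result of `fetch_headers(url)` next to the downloaded page; on later passes you call it again and only re-download when `has_changed(stored, fresh)` is true. One caveat: a server that sends none of these headers will always look unchanged, so a real scraper needs a fallback for that case.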

This depends on public web pages. That means if an experiment changes its web pages or where they are located, all the plots will disappear from the display! I do my best to fix this as soon as I notice it. Fortunately, these are public facing web pages so this doesn’t happen very often!

Ok, now for some fun. Who has the most broken links on their public pages? CDF by a long shot. Who has the pages that are most machine readable? CMS and DZERO. But while they are that, the images have no captions (which makes searching the image database for text words less useful than it should be). ATLAS is a happy medium – their preliminary results are in a nice automatically produced grid that includes captions.

I couldn’t leave this alone. I mentioned the ultimate logbook in my last posting. This is the logbook that would record everything you did and archive it.

It isn’t difficult. The web already has a perfect data format for this – Atom (or RSS). Just imagine. Each source code repository you commit to would publish a feed of all of your changes (with a time stamp, of course!) in the Atom format. Heck, your computer could keep track of what files you edited and publish a list of those too (many cloud storage services already do this). Make a plot in ROOT? Sure! A feed could be published. Ran a batch job? The command you used for submission could be published.

Then you need something central that is polling those RSS feeds with some frequency, gathering the data, and archiving it. Oh, and perhaps even making it available for easy use.
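To make that a bit more concrete, here is a minimal sketch in Python of both halves: something that publishes events as an Atom feed, and the bit of the central poller that pulls the entries back out. The function names, the `urn:example:` ids, and the batch command in the usage note are all invented for illustration, and the feed leaves out elements (like an author) that a strictly valid Atom feed would want.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)


def _tag(name):
    """Qualify an element name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{name}"


def make_feed(feed_id, title, events):
    """Build an Atom feed from a non-empty list of (id, title, datetime, text)."""
    feed = ET.Element(_tag("feed"))
    ET.SubElement(feed, _tag("id")).text = feed_id
    ET.SubElement(feed, _tag("title")).text = title
    latest = max(when for _, _, when, _ in events)
    ET.SubElement(feed, _tag("updated")).text = latest.isoformat()
    for entry_id, entry_title, when, text in events:
        entry = ET.SubElement(feed, _tag("entry"))
        ET.SubElement(entry, _tag("id")).text = entry_id
        ET.SubElement(entry, _tag("title")).text = entry_title
        ET.SubElement(entry, _tag("updated")).text = when.isoformat()
        ET.SubElement(entry, _tag("content"), {"type": "text"}).text = text
    return ET.tostring(feed, encoding="unicode")


def read_entries(feed_xml):
    """What the poller does: return (timestamp, title, text) for each entry."""
    ns = {"a": ATOM_NS}
    root = ET.fromstring(feed_xml)
    return [
        (
            entry.findtext("a:updated", namespaces=ns),
            entry.findtext("a:title", namespaces=ns),
            entry.findtext("a:content", namespaces=ns),
        )
        for entry in root.findall("a:entry", ns)
    ]
```

A batch submission wrapper could then call something like `make_feed("urn:example:batch", "Batch jobs", [("urn:example:batch:1", "Ran a batch job", datetime.now(timezone.utc), "the submission command")])` and drop the result where the poller can find it. The archiving and search on the central side is the hard part, and this sketch does not touch it.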

Actually, there is a service that does this already. Facebook. Sure! Just tell it about every RSS feed and it will suck that data in. Some of you are probably reading this on Facebook – and this posting got there because I told Facebook about this blog’s Atom feed and it sucked the data in.

Of course, having a write-only repository of everything you did is a little less than useful. You need a powerful search engine to bring the data you are interested in back out. Especially because a lot of that data is just a random command which contains no obvious indication of what you were working on (i.e. no meta-data).

And finally, at least for me, I don’t really want something that is static. Rarely is there a project that I’m finished with and I can neatly wrap it up and move on. Heck, there are projects I put down and pick up again many months later. This ultimate logbook doesn’t really support that.

Perhaps it is best to split the functions. Call this ultimate logbook a daily log instead, and then keep separate bits of paper where you do your thinking… Awww heck, right back to where we started!

BTW, if you think Facebook might be interesting as a solution here, remember several things. First, as far as I can tell, there is no way to search your comments or posts. Second, you might get ‘Zuckerberged’ – that is, the privacy settings might get changed and your logbook might become totally public.

No one mentioned using a Kindle/Nook to read their logbook, btw. For software that gets used most like a logbook it looks to me like Evernote wins.

For me the most surprising method was email. And by surprising, I mean smacking myself on the forehead because I’d not already thought of it. Here is the idea: just email your logbook entries – with files and attachments, etc. – to your logbook email account. Then use the power of search to recover whatever you want. And since you can stick it on Gmail or Hotmail or Yahoo mail, you have almost no size restrictions – and it is available wherever you happen to have an internet connection. Further, since it is just email, it is trivial to write scripts to capture data and ship it off to the logbook.
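Just to show how little script is needed, here is a rough sketch in Python using only the standard library. The addresses, the function names, and the SMTP details (host, port, STARTTLS login) are placeholders and assumptions; check what your own mail provider actually requires.

```python
import mimetypes
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_entry(sender, logbook_address, subject, body, attachments=()):
    """Package a logbook entry (text plus optional files) as an email message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = logbook_address
    message["Subject"] = subject
    message.set_content(body)
    for path in map(Path, attachments):
        # Guess the MIME type from the file name; fall back to generic binary.
        guessed, _ = mimetypes.guess_type(path.name)
        maintype, subtype = (guessed or "application/octet-stream").split("/", 1)
        message.add_attachment(
            path.read_bytes(), maintype=maintype, subtype=subtype, filename=path.name
        )
    return message


def send_entry(message, host, port, user, password):
    """Ship the entry off; assumes the server accepts STARTTLS plus a login."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(message)
```

A plotting script could finish by calling `build_entry(...)` with the plot file in `attachments` and then `send_entry(...)`. Putting searchable keywords in the subject line is what makes the "power of search" part work later.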

Now, I’ll ramble a bit in way of conclusion…

Do you remember Microsoft’s failed phone, the Kin? It was basically a smart phone w/out the apps. But one of the cool things it did was called Kin Studio. The point was this – everything you did on the phone was uploaded to the cloud. All the text messages you sent or received, all the pictures you took, etc. Then on the web you could look back at any time at what you did and have a complete record. Now, that is a logbook.

Of course, there are some problems with this. Who wants to look at lots of messages that say “ok!” or “ttl” or similar? And the same problem would occur if we were able to develop the equivalent of the Kin Studio for logbooks. It would be a disaster. Which I think gets to the crux of what many of you were wrestling with in the comments of those posts (and something I wrestle with all the time): what do you put in a logbook!? There is a part of me that would like to capture everything – the ultimate logbook. Given today’s software and technology this wouldn’t be very hard to write!

In thinking about this I came up with a few observations of my own behavior over the last few years:

One way to look at this is: what do you look up in a logbook? I have to say – what I look up in my logbook has undergone some dramatic changes since I was a graduate student. Back then we didn’t have the web (really) or search engines. As a result writing down exactly what I needed to do to get some bit of code working was very important. Now it is almost certain I can find a code sample on the web in one or two searches. So that doesn’t need to go into the logbook anymore. Plots still go in – but 90% of them are wrong. You know – you make the plot, think you are done, move on to the next step and in the process discover a mistake – so you go back and have to remake everything. And put the updated version of the plot into your logbook. Soon it becomes a waste of time – so you just auto-generate a directory with all the plots. So it always has the latest-and-greatest version. Hopefully you remember to put some of those into your logbook when you are done… but often not (at least me).

What is the oldest logbook entry you’ve ever gone back to? For me it was the top discovery – but that was nostalgia, not because I needed some bit of data. I rarely go back more than a few months. And, frankly, in this day and age, if you do an analysis that is published in January, by July someone (perhaps you) has redone it with more data and a better technique. You need those January numbers to compare – but you get them from an analysis note, not from your logbook! In short, the analysis note has become the “official” logbook of the experiment.

I have to say that my logbook currently serves two functions: meeting notes and thinking. Meeting minutes are often not recorded – so I keep a record. Especially since I’m using an electronic notebook, I can mark things with an “action” flag and go back later to find out exactly what I need to do as a result of that meeting. The second heaviest use for me is brainstorming. Normally one might scribble ideas on some loose paper, perhaps leave them around for a day or two, come back, refine them, etc. I use my logbook for that rather than loose paper.

Nowadays I definitely do not keep a logbook in the traditional way. Certainly not in the way I was taught to use a logbook in my undergraduate physics classes! Here is a quote from an ex-student of mine (in the comments of one of the previous posts – and I can copy this because he already has a job!!):

I have a rather haphazard attitude toward these things–I have a logbook, but I use it to remember things and occasionally to sort out and prioritize my thoughts. So it’s fairly sparse, and it certainly would be of no help in a patent dispute! Often I keep my old working areas around on my computer, and I use them if I forget what I did in my previous work.

This is pretty typical of what I see in people around me in the field. Other commenters made reference to more careful use of logbooks. I wonder how much usage style varies by field (medicine, physics (particle vs. condensed matter, theory vs. experiment), engineering, industry vs. academic, etc.)?

I like the way one of my students, Andy Haas, put it once. He was giving a talk at a DZERO workshop on the Level 3 computer farm and trying to make a point about the number and type of computers that were in the farm. He drew an analogy to the number of laptops that were open in the room. It can be a little spooky – almost everyone has one, and almost everyone has them open during conference talks. In Andy’s case there were about 100 people in the room. And when you are giving the talk you have to wonder: how many people are listening!?

There is another side-effect, however. It is rare that the hotel, or whatever, is ready for the large number of devices that we particle physicists bring to a meeting. In the old days it was a laptop per person, and now add in a cell phone that also wants an internet connection. Apparently most conference organizers used to guess that about 1 in 5 people would have a portable that needed a connection at any one time. Folks from particle physics, however, just blew that curve! The result was often lost wifi connections, many seconds to load a page, and an inability to download the conference agenda! As conference organizers we long ago learned that this is one of the most important things to get right – and one of the key things that will be used to judge the organization of your conference.

The article is interesting in another aspect as well (other than pointing out a problem we’ve been dealing with for more than 10 years now). WiFi is not really designed for this sort of use. Which leads to the question – what is next?