First, though, as far as I can tell it doesn’t even tell you what version of UAH this corresponds to (ah, but actually it’s 5.4. You can tell this by reading things like “The program txx_1_5.4”. Yay). Under “1.3 Document Maintenance” it does say: “When requested by NOAA, if there have been any changes in procedures required for the production of the products or if the description of procedures has inadvertent omissions or errors, we will update this”; and there certainly have been updates to UAH since 2011; but the doc is still the original.

I’m also a teensy bit unclear about what it’s describing: on the face of it, it’s missing rather a lot, because it says The deep layer temperature products described here come from measurements produced by Advanced Microwave Sounding Units (AMSU-As, hereafter “AMSU”) … Before AMSU, the Microwave Sounding Units (MSUs) flew on the NOAA polar orbiters since late 1978. Processing of the older MSU data, except in the homogenization routines, is not addressed by this document. WTF?

[Update: by bizarre co-incidence, S+C+B have just released, or announced, v6. As they say: “Many procedures have been modified or entirely reworked, and most of the software has been rewritten from scratch.” There’s just a hint that they may have rushed this out: “After three years of work, we have (hopefully) finished our Version 6.0” – but who knows.

Ha. Actually, there’s rather more than a hint that this may be rushed: if you read to the end, they back off: This should be considered a “beta” release of Version 6.0, and we await users’ comments to see whether there are any obvious remaining problems in the dataset.

Eli, never one to stand on ceremony, steps into the torrent of ignorant praise over at Roy’s to ask where’s the code? But it’s not available “yet”.]

Anyway, onto the excuses (my bold):

The codes described here and provided to NOAA have not been optimized in a software engineering sense. Much of the programming structure originated over 20 years ago, starting around 1989, and was written by the authors who came from a generation of self-taught programmers and have little formal computer programming training. Much of the work was done with little funding support, so no professional programmers were utilized. In Christy’s code, there are numerous sections devoted to image creation through NCARgraphics for detection of problems, but which are not necessary for the production of the ASCII files desired by the users.

There is little use of subroutines in Spencer’s code, but more in Christy’s. Continuity of operational procedures has taken precedence over elegance or speed of execution.

As algorithm enhancements were tested, many were abandoned, but those portions of the code were simply commented out rather than deleted, i.e. they are vestigial in reality. While this is somewhat sloppy from a software design standpoint, the practical advantage of this is to provide a detailed reminder of what has been tried before.

In some cases, rather than having unused code commented out, there are sections which are never branched to in the operational running of the code because an initial adjustable parameter is always assigned a single value. A good example is diurnal adjustment of the AMSU data, for which much code is included, but has never been used operationally. In other cases, a particular ancillary analysis was needed for a publication, but not needed for production runs. These sections are usually commented out.

Most of the programs have array dimensioning and assignments which must be manually updated every month and year, since (at this writing) they only handle data through July 2011. Similarly, if a new satellite is added, then there are program changes which must be made to accommodate those new datasets.

The programs were originally developed on an SGI workstation or an IBM mainframe, and then later transitioned to Linux. As a result, all previous binary input and output files had a byte-ordering issue. We retained the SGI handling of binary files, so some of the programs must be run with a byte-swap option used on execute. This might not be an issue if NOAA re-generates all output files from scratch, but if our previous output files are used, there will be a problem.

Also, we have had problems where processing a month’s worth of global AMSU data causes some sort of memory size allocation exceedance during a single program execution, which leads to only a portion of the data being processed properly. This is also handled with a special option during execute.

Well, that looks like a perfect way of making sure that no-one at all ever reads your code. But it also looks like a way of ending up with a hideous heap of gunk that even you can’t update.
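The byte-ordering problem they describe, by the way, is just the classic big-endian (SGI) versus little-endian (x86 Linux) mismatch on raw binary files. A minimal sketch – not their code, just Python’s struct module standing in for what a Fortran runtime’s byte-swap-on-execute option (e.g. gfortran’s -fconvert=big-endian) does:

```python
import struct

# A 4-byte float as a big-endian SGI machine would have written it.
big_endian_bytes = struct.pack(">f", 273.15)

# Read naively on a little-endian Linux box: a nonsense value.
wrong = struct.unpack("<f", big_endian_bytes)[0]

# Read with an explicit byte-swap: the value comes back correctly.
right = struct.unpack(">f", big_endian_bytes)[0]

print(wrong, right)  # wrong is garbage; right is ~273.15
```

The point is that the bytes themselves don’t change; only the order in which the reader assembles them does, which is why a single execute-time option can paper over the whole problem – until someone forgets to pass it.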

As a self-taught computer programmer from roughly the same era, I can appreciate their honesty. I also cringe with embarrassment when I think of some of my early programs. That said, there were reasons elegant, well-structured, well-commented code was rarely the norm. Chief among them were:
1) I didn’t know any better,
2) I didn’t have the time,
3) If it ain’t broke, don’t fix it

Now, most of my programs were simply automation of manual tasks – tasks that I and my co-workers performed sometimes 3 or 4 times a day and sometimes only 3 or 4 times a year. It didn’t make any sense to spend more time on a program than the time that was going to be saved by using it – especially since this was never part of my job title, job description, or any major portion of my employment review. In fact, many of the early programs were written on my own time (and really for my own benefit) precisely because my employer wouldn’t pay anyone to do it.

Eventually, programming did become part of my job – despite never having any formal education in computer programming. The gains in productivity and efficiency didn’t go unnoticed, but only rarely have I revisited my early programs and made them more presentable. It’s difficult to justify rewriting code that works — even though it may be ugly, finicky, and/or unreadable — when there are a hundred other programs that still need to be written.

Every now and then I’m able to incorporate an old program into a new one – usually the old code is completely ignored in writing the new program. Even though the task itself may not have changed one iota, coding standards and capabilities have changed so dramatically that little or nothing is gained from even attempting to read and understand the old code.

So I can actually sympathize with Christy and Spencer and the deplorable state of their code. Every defect they mention I’ve done at one time myself – even when I knew better and knew I’d kick myself later for writing it quick and dirty.

[I, too, have a good deal of sympathy for the predicament they seem to be in. I’ve written lots of crappy Fortran code; indeed, it’s easy to write crappy code in any language. However… however, actually, I probably shouldn’t push this too hard, as I haven’t read their actual code yet, or RSS’s -W]

This may be how software grows, especially when there is no funding to do it decently. Getting funding for a short temperature series, which has many problems with non-climatic changes and needs major, highly uncertain adjustments, is probably hard. Funding is allocated based on scientific merit, not on importance in a political “debate”.

Even if they find the same trends as the surface temperatures, this does not sound like code I would base policy on.

“Even if they find the same trends as the surface temperatures, this does not sound like code I would base policy on.”

You should take a look at some of the GISS fortran code then.

[Happily, as Eli points out below, GISStemp has already been rewritten in clean Python. They even found a couple of trivial bugs in the process. More important, I think, is that we already have 3 or 4 independent surface records, that essentially agree. If one had been an outlier, people would have been crawling over its code -W]

But it’s not the mainstream science side arguing that programs written by scientists who are amateurs at writing code makes those programs worthless.

It is your side.

My guess is that the UAH code – as sloppy as it is, and as awkward as it is to update for new years, new satellites, and the need to kludge around a memory allocation issue – actually probably works surprisingly well.

Just like GISTemp, and a bunch of other scientific code which might give us software professionals cause to chuckle.

Now, the question is, if you (or those on your side of the fence), have been arguing for years that GISS code can’t be trusted because of “unprofessionalism” will you

1. throw out UAH
2. make excuses for UAH and claim the case is “different” (perhaps because Spencer and Christy both can claim “God is my co-programmer”)?

How much time and money would it take to write a clean modern code for UAH as is being done for GISS? How does a cleaner code improve the accuracy of UAH and GISS output?

[As d says, below, rewriting GISS in Python threw up a couple of trivial bugs, but made no significant difference. But it does increase the confidence that the GISS code is correct. And as an added bonus it was free, as CCC did it in their spare time. My feeling is that reworking UAH would be much harder -W]

[P]ortions of the code were simply commented out rather than deleted… While this is somewhat sloppy from a software design standpoint, the practical advantage of this is to provide a detailed reminder of what has been tried before.

Have these people not heard of version control, archiving, or documenting properly?

[R]ather than having unused code commented out, there are sections which are never branched to

Unreachable code is a very bad coding practice, and something we’re taught to catch.

Well, that looks like a perfect way of making sure that no-one at all ever reads your code. But it also looks like a way of ending up with a hideous heap of gunk that even you can’t update.

You nailed it Dr. Connolley. Code that’s not maintainable or update-able is very poor code.

[My strong suspicion would be that they have, indeed “never heard of version control”, in the sense that they’ve heard vague rumours but it’s scary stuff that they’re not going to touch -W]
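For what it’s worth, the “detailed reminder of what has been tried before” that commented-out code is supposed to provide is exactly what version control gives you for free, with the deleted code out of the way. A minimal sketch with git (throwaway repository, hypothetical file name):

```shell
# set up a throwaway repository
mkdir uah-demo && cd uah-demo
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# commit an experimental adjustment...
echo "call diurnal_adjust(t)" > process.f
git add process.f
git commit -q -m "try diurnal adjustment"

# ...then delete it outright instead of commenting it out
echo "! adjustment abandoned" > process.f
git commit -qam "abandon diurnal adjustment"

# the abandoned code remains fully recoverable from history
git show HEAD~1:process.f
```

The working copy stays clean, and `git log` records when and why each approach was abandoned – no vestigial blocks required.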

This is one of the things that has me boggled. Subroutines and functions (I consider C to be my native programming language) make your life a great deal easier, because you only have to update that particular module, not the entire code, and the danger of undetected typos is correspondingly reduced. I can understand most of the other poor choices: version control would have been mostly unknown in non-CS academic departments when they started the project (not that that excuses their failure to adopt it subsequently); commenting out code is something that may have been done as a quick-and-dirty fix which was left in; they are primarily paid to do things other than maintain this code, etc. Some of what they did was just poor design decisions made by people who didn’t know better (and I can’t say I would have avoided those mistakes myself). But you don’t want to make it harder on yourself than you have to.

The other thing I have little sympathy for, as Julian mentioned above, is the sections of unreachable code. GOTO has been considered harmful since about the time I was born, and this is one of the reasons why. It’s one thing to have sections of code labeled, “If we get here, something has gone horribly wrong.” Or you put a bunch of code in a subroutine that never gets called. I have put in debugging statements that are included via #ifdef statements at compile time, and left them in the code (but not compiled) for the production version. But to have a block of code in your main routine that you intentionally skip over with GOTO statements is one of the worst forms of spaghetti coding out there.

The overhead structure at many Federal labs is such that it costs nearly as much to hire a technician or programmer as it does to hire a Ph.D. scientist. There’s probably a lot of code out there that has never been seen by a trained programmer.

And I’m not convinced we’re much the worse for that. My experience is that professional programmers can turn inelegant but understandable (to a scientist) code into an over-modularized, pointer-infested black box. That happened with one of the models that I work with.

The machine doesn’t care what it looks like, only that it executes. Machines don’t get confused as easily as we do.

Badly-organised code is tough for the humans trying to read it and work with it. I’d personally never go near a project which wasn’t fully tested (in the agile sense) because it would drive me insane wondering what the code was supposed to do or even if it did what it was supposed to do. There’s no way to know without a suite of acceptance/unit tests.

Still, these are issues at the human end. Even if the code hasn’t been designed to be user-friendly, it could still function perfectly well.

Hmmm, OK, I’m thinking those files represent one month’s data, since UAH updates monthly, and they manually edit the source code each month.

Which the commentary in the OP actually says, duh: “Most of the programs have array dimensioning and assignments which must be manually updated every month and year”

[I haven’t got my head round the processing chain yet. I think there’s a lot of intermediate files. Probably, they just have to run this anew for each new month. Which is fine, until you need to re-process all the months… -W]

Maybe I’ll become curious enough to look. Some preprocessing of the data they get from the satellite folks before using it to build the temp reconstruction seems likely, though.

“Probably, they just have to run this anew for each new month”

That’s what it looks like, and the code is providing the history of each run. The appearance of the word “monthly” in the output files would seem like a reasonable clue 🙂 The commentary mentioned processing of data through July of 2011, while the file being opened (not commented out) contains “1106” which would seem to be June 2011, so perhaps they meant “through June”. One might think that the file *1105* is May and *1104* is April, etc.

“Which is fine, until you need to re-process all the months”

Given the number of times errors in their algorithm have been found, you’d think they’d have gotten tired of having to edit, compile and run the program for each historical month every time they’ve fixed something! Guess they have too much time on their hands …
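Assuming the YYMM reading of those file names above is right, decoding them is trivial; a hypothetical sketch (the function name and the 2000–2099 assumption are mine, not theirs):

```python
def decode_yymm(tag):
    """Decode a 2-digit-year, 2-digit-month tag like '1106' -> (2011, 6).
    Assumes all tags fall in 2000-2099; pre-2000 data would need a pivot."""
    year = 2000 + int(tag[:2])
    month = int(tag[2:])
    return year, month

print(decode_yymm("1106"))  # (2011, 6)
```

Which also shows why per-month files named this way invite exactly the reprocessing headache mentioned: every fix means regenerating one file per month back to 1978.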

And 6.0 seems to be updated enough for climate risk deniers to embrace it again, as it is now in line with interim favourite RSS: No warming for 220 months!!! Hurrah, long live the endless possibilities of sloppy code!

“What have been the warming effects, if any, of anthropogenic gases? The typical answer is 0.5’C.

But the answer depends on what time interval is chosen. There was substantial increase in temperature from 1880 to 1940. However, from 1940 until the 1960s, temperatures dropped so much as to lead to predictions of a coming ice age. New, precise satellite data raise further questions about warming. From 1979 to 1988 large temperature variability was recorded, but no obvious temperature trend was noted during the 10-year period.”

RS says
“Finally, much of the previous software has been a hodgepodge of code snippets written by different scientists, run in stepwise fashion during every monthly update, some of it over 25 years old, and we wanted a single programmer to write a unified, streamlined code (approx. 9,000 lines of FORTRAN) that could be run in one execution if possible.”

One shouldn’t attribute anything to “sloppy code” w/o having seen the code. According to Spencer in his blog, UAH and RSS are in much better agreement now. I am going to assume, until proven otherwise, that, with this UAH revision, RSS has been vindicated from the suspicion of having a cold bias and being an outlier.

The claims by fake skeptics that global warming supposedly stopped x years ago are based on cherry picking and ignorance of statistics. It is unfortunate if they can abuse two satellite data sets for their unscientific claims now, but this can’t be any basis to assume that the data were wrong.

If it comes up somewhere, one should point out why those conclusions drawn about the “pause” for x years lack scientific basis, but not attack the data. (However, one can mention that UAH version 6.0 isn’t backed by any peer-reviewed publication yet, even though major changes in the methodology seem to have been made.)

In email, Steve Easterbrook argued that the elephant in the room is a lack of funding and career path for people who are professional software engineers in science.

True indeed.

Leaving aside questions of my own career, some of the tenure/promotion decisions I’ve heard of for the most productive scientist/coders have been inexcusable.

I agree with dhogaza that this doesn’t necessarily impact the validity of the results, but to say it doesn’t affect their credibility seems an overstatement.

It also affects recruiting and retention. Working in this sludgy software environment appeals neither to the engineer nor the scientist part of the potential contributor’s ambitions.

Engineering has moved on, and despite its importance much scientific software is a computational methods backwater and in some corners even a backwater in ordinary software competence. Climate science is a poster child for this problem.

I’m not surprised to find UAH code is a hideous mess. The community climate models at least have a lot of eyeballs on them, albeit focused on the model fidelity more than its usability.

But it’s the nature of software – one-offs from labs are likely to be broken, as defensive coding practices are absent from the social milieu. The pressure is to publish something credible, not to publish something correct.

The smaller the user base, the more likely the code is to be broken. Getting code right is expensive and there’s little institutional motivation for it. Investigators are motivated to minimize/trivialize problems. The fewer the eyes on the code, the more likely errors are to persist. (Corollary to Linus’s Law of Eyeballs: with few enough eyeballs, all bugs are deep.)

That Spencer knows what he is looking for only makes matters worse, of course, but in this he isn’t as much an outlier as we might want to believe. Everyone wants their own intuitions confirmed. Much of scientific method is to avoid fooling oneself. But we haven’t yet applied that in a systematic and serious way to complicated computations.

[I saw your post, and sympathise. It mirrors some of my own frustrations running the “ported” unified model on workstations and linux clusters. The models, and more particularly the scripts and build scripts that surround them, are very environment-sensitive. Or rather, 99% of the script isn’t, but it’s hard to track down and fix the 1%. I would regularly puzzle over the exact compiler options, or the exact version of xargs, that was needed. The irritation (in retrospect) is that much of this was voodoo – it was hard to distinguish between things that just needed a little tweak to make them work, and things that had to be exactly correct to have a chance of working.

Like you (I think) I found the porting to a new system quite fun the first time, but much less fun in subsequent times.

I read years back one of the Contributors at RC mentioning that the petroleum industry employs competent climatologists who do paleo modeling — to figure out where sediment accumulated that became petroleum, and where to look for it now — but rarely publishes.

[I mis-wrote. It’s not industry – I bet the petroleum folk have crappy Fortran too – it’s being in the *software* industry. People just think differently about their code (or at least, a reasonable fraction do, and they’re the people that matter) -W]

“Like you (I think) I found the porting to a new system quite fun the first time.”

I never enjoyed the voodoo aspect of it, and never could find a decent explanation of Fortran compiler flags, supercomputer queuing scripts, the plethora of MPI libraries, etc. anywhere sufficient to take it out of the voodoo frame for me.

As I said in my plaint, it satisfies neither the programmer impulse nor the scientist impulse to keep making guesses until something seems to work. And as far as I could tell that was what I was expected to do in my recent position.

[It took me a while to realise that it was mostly necessary to find someone who understood the various compiler flags, because some mattered and some didn’t; that was frustrating and annoying. And in the case of the UKMO code, it would fail to compile at some optimisation levels and fail to run correctly at others, but this wasn’t too important as the higher optimisation didn’t gain you much.

I think I could do it all much better now; I failed to realise at the time what the rules were -W]