5 Million Lines of Obfuscation

The healthcare.gov mess is inspiring some messy reporting.

Last weekend, some anonymous “specialist” told the New York Times that “5 million lines of software code may need to be rewritten” in order to fix the mess that is healthcare.gov. (The good news, according to the source, is that the project has a total of “500 million lines of software code,” so only 1 percent has to be rewritten. So the code’s 99 percent good—or something.)

I don’t mean to jump back on my hobbyhorse of complaining about lack of knowledge in tech journalism, but printing a claim like that is egregious.

Why? Well, here’s a line of C++ code:

}

The close curly brace signals the end of a block of code. It could be put on the same line as the previous, more substantive line, but for the sake of cleanliness, programmers tend to put it on a line of its own. When it comes to coding in HTML, Perl, and AJAX, different programmers have different styles. Some will split code up into many lines; others will compress it into a handful of lines. I’ve seen nearly identical segments of code written in 10 lines or in 50.
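To make that concrete, here is a minimal sketch of one small function written in two layouts. The names `clamp_compact` and `clamp_spread` are invented for illustration and come from no real project. The two versions behave identically, yet one fits on a single line and the other takes more than a dozen.

```cpp
// Hypothetical example: the same logic in two layouts.

// Compact style: the whole function on one line.
int clamp_compact(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

// Spread-out style: identical behavior, with every brace on its own line.
int clamp_spread(int v, int lo, int hi)
{
    if (v < lo)
    {
        return lo;
    }
    if (v > hi)
    {
        return hi;
    }
    return v;
}
```

A line counter would score the second version as many times larger than the first, even though a compiler treats them as the same function.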

Here’s another line of C++ code.

// TODO: make sure this code doesn’t crash!

That’s a comment. It doesn’t do anything: those two slashes at the beginning tell the compiler (which converts code into actual computer instructions) to ignore the line. It’s there to explain things to people reading the source code, or in this case to remind the programmer to fix whatever lies immediately below. I’ve written cryptic bits of code that required more lines of comments than lines of actual code, simply to explain what on earth was going on.
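As a rough illustration of how much a raw line count lumps together, here is a minimal sketch of a tally that separates blank lines, comment-only lines, and brace-only lines from everything else. The `LineTally` struct and the `tally_lines` function are hypothetical names made up for this example. The sketch only recognizes `//` comments and ignores `/* ... */` blocks, so it is nowhere near a real code-metrics tool.

```cpp
#include <sstream>
#include <string>

// Hypothetical breakdown of what a raw "lines of code" figure contains.
struct LineTally {
    int total = 0;
    int blank = 0;
    int comment_only = 0;  // lines that start with "//"
    int brace_only = 0;    // lines that are just "{" or "}"
    int substantive = 0;   // everything else
};

// Strip leading and trailing spaces, tabs, and carriage returns.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Classify each physical line of `source` into one bucket.
LineTally tally_lines(const std::string& source) {
    LineTally t;
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        ++t.total;
        const std::string s = trim(line);
        if (s.empty()) {
            ++t.blank;
        } else if (s.rfind("//", 0) == 0) {  // starts with "//"
            ++t.comment_only;
        } else if (s == "{" || s == "}") {
            ++t.brace_only;
        } else {
            ++t.substantive;
        }
    }
    return t;
}
```

Fed a six-line function containing a blank line, a TODO comment, and two lone braces, a tally like this would report only two lines that actually do anything.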

At the other extreme, a single line of APL code can print out all the prime numbers from 1 to R. APL is a notoriously terse and nightmarish language. I have successfully avoided ever coding in it. A single line of APL code could contain half a dozen bugs.
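For contrast, here is a minimal C++ sketch of the same task, collecting the primes up to R. The function name `primes_up_to` is invented for this example, and the sieve of Eratosthenes is simply one common way to do it, not a translation of any APL one-liner. It runs to more than a dozen lines for work that APL squeezes into one.

```cpp
#include <vector>

// Return all primes p with 2 <= p <= r, using the sieve of Eratosthenes.
std::vector<int> primes_up_to(int r) {
    std::vector<int> primes;
    if (r < 2) return primes;  // no primes below 2

    // composite[n] becomes true once n is known to have a smaller factor.
    std::vector<bool> composite(r + 1, false);
    for (int i = 2; i <= r; ++i) {
        if (composite[i]) continue;
        primes.push_back(i);
        // Mark multiples of i, starting at i*i; use long long to avoid overflow.
        for (long long j = 1LL * i * i; j <= r; j += i) {
            composite[j] = true;
        }
    }
    return primes;
}
```

Neither version is "more code" in any sense that matters. The difference in line count mostly reflects the language.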

So not all lines of code are created equal. As a programmer, I had weeks where I produced 1,000 lines of code. I had weeks where I produced 20. Usually the latter weeks were more grueling, because any 20 lines requiring that much time and effort are going to be a) important, b) complicated, and c) bug-prone. The 1,000 lines were far more likely to be simple stuff that I could code by rote. I even had weeks where I removed 2,000 lines of code by removing redundancies between similar blocks of code. Those were the best weeks of all, because less code means fewer bugs.

Programmers who do user interface code—which is responsible for the visuals and input components of software—tend to produce far more code than other programmers, because user interface code requires a lot of boilerplate. I knew programmers who wrote 10,000 (good) lines of user interface code in a week. Many of them were copied and slightly modified from other projects or example code.

Consequently, it’s rather silly to say, as the Times article does, that “a large bank’s computer system is typically” 100 million lines of code. Investment banks have far more complex code than commercial banks—they need more in order to do all their clever, sneaky trading. Assuming the Times is referring to commercial banks, there is such variety among implementations and coding standards that speaking of an “average” amount of code is meaningless. Bank code written in FORTRAN will be far longer than bank code written in Python. Does it make a difference? Not really.

But while the numbers in the Times article don’t tell us much about the healthcare.gov codebase itself, they do tell us something about the “specialist” sources that inform the article. The sources are not programmers, because programmers would not speak in terms of lines of code with no further context. We hear that “disarray has distinguished the project” in part because government “officials modified hardware and software requirements for the exchange seven times.” The officials probably modified them 70 times—requirements for any software project are constantly in flux, and it’s expected that project managers and software engineers will adapt. Modifications alone do not signal a project in disarray.

We hear that the Centers for Medicare and Medicaid Services (CMS) lacked the expertise to link the individual pieces of healthcare.gov together. That does not explain why the “data hub”—the single component provided by Quality Software Services Inc.—proved to be “a particular source of trouble,” something I had surmised two weeks ago. If individual contractors were producing garbage, CMS’s expertise or lack thereof wouldn’t have made a difference to the final product.

The sources also say that CGI Federal, which won the $90 million contract to develop healthcare.gov’s back end, was asked to replace the data hub, though this approach was abandoned as “too risky.” That’s a hint that the article’s sources are eager to shift the blame to CMS, to the White House, and to QSSI, and away from CGI. The Times claims that CGI was not responsible for healthcare.gov’s “integration,” but the Washington Post’s Lydia DePillis reports that CGI Federal was in fact responsible for “knitting all the pieces together, making Quality Software Services’ data hub work seamlessly with Development Seed’s sleek user interface and Oracle’s identity management software.”

I have no idea who the Times’ sources were, but they sure sound like employees of CGI Federal. Because they almost certainly aren’t programmers, I’d guess they are probably mid- or high-level managers who are trying to salvage CGI Federal’s reputation. They may well be “specialists,” but their specialty is more likely the art of procuring government contracts.

This is to be expected. What’s less expected is that such anonymous sources would be treated with this degree of credulity by national reporters who lack technical understanding of their subject matter and are thus more likely to parrot whatever a “specialist” tells them. The Times has a great tech reporter, Natasha Singer, who has done well-informed work on consumer profiling, taking little for granted. They should put her on this story.