HealthCare.Gov Fiasco

There will probably be consequences for delving into such a charged topic. But this has never stopped me before and I am too old to change my ways. So here goes. Many of us have heard of the problems with the online HealthCare.gov site even though technical details are not yet available.

First let me say that I am not singling out this particular project, as severe problems are common in government IT projects, and I suspect the incidence rate may be no different in private industry. Other projects can hide failures behind schedule slips, or simply do not attract broad news coverage. This project involves the general public, is a politically charged topic and had a hard deadline, so there is no escaping public scrutiny.

Some special aspects of this project are that it involves interaction with third-party health care insurers (and providers?) and the fact that each state in the US has its own health care laws and licensing? It has been reported that the complete system was not tested as a whole until one week before the public launch deadline of Oct 1. And that despite flaws found in late testing, the agency in charge decided that it was necessary to launch on the prescribed date regardless of known and possible issues instead of announcing a delay.

There are serious issues with the overall manner in which the government handles IT projects that seem to make either serious deficiencies in meeting the objectives or outright failure the expected outcome. For now, I will comment on the patterns that I have seen in other government projects. A serious problem in all government projects I have been involved with is the lack of understanding or appreciation of the different grades of technical talent.

The government has no problems in spending (I cannot call this investment as that would imply a reasonable expectation of positive return) hundreds of millions or in the billions of dollars on projects. (Note: the cost of these projects is not the subject of contention.)
I assume (but I do not know) that the top managers are compensated accordingly for large project responsibility. It is in the high technical positions where it seems that the government is unwilling to offer a proper pay scale.

A large project needs a large staff of people, be it permanent employees or contractors.
(Per a comment by Ryan below, large IT projects have the budget for a large staff, be it necessary or not. I was at a company years ago that did have large projects. It was acknowledged that a project could have been done better with fewer people if there were technology to clone certain people. As it was, the work was done by a large team with a few seriously top people.)

The general expectation is that much of this staff should have several years of experience. It is not a requirement to have people with little experience, but without them there will be a shortage of experienced personnel for future projects. The key is in having a number of very talented and experienced people to provide technical leadership, to ensure that the work of the staff is built on a solid foundation.

Of course top technical talent costs top dollar. Every time I get called on a government project, I hear that they cannot pay at a rate above that suitable for a senior developer. It is possible that there are government rules on how many high salary slots are allowed in a project and that these have already been assigned. It might also be possible that the government imposed such rules as a guard against fraud.

But without top talent, it is very likely that a high cost project created by a large team of middle to senior developers will have serious problems due to being built on questionable foundations. So it seems that our government would rather stick to their project management rules even though it means critical projects end up as very public failures.

Addendum 23 Oct
Consider the example of the military procurement system. There are so many messed-up rules and other red tape, meant to prevent fraud? or to serve some forgotten purpose?
In the end, a project costs tens of billions of dollars, and by the time it is complete 10 to 20 years later, some combination of 1) it does not do something useful, 2) it is too expensive for what it does and 3) so much time has elapsed that the original requirements are no longer valid.

In the two major interventions of the last decade plus, when it was brought up that the troops needed equipment not previously in the system, and that could be developed, Congress had the good sense to set aside most of the rules, saying just do it. The necessary stuff got done, and done quickly. True, some money was wasted, but it is also important that the rules meant to prevent waste do not defeat the ultimate objective.

Competitive bidding is a good idea for known entities.
I regard it as nearly totally stupid for IT development projects.
It might be better to set an annual budget for development, which should determine the size of the team, and a timeline.
In new projects, a prototype with moderate functionality should be up within 12 months if not sooner. In any case, something of value needs to reach production within 2 years or there is a risk of a project out of control, lacking cohesion.

I do believe that our federal government has an alternative, as there should be several concurrent projects in development among the many agencies. These should be judged against each other, with successful contractors getting assigned more work.

Addendum 24 Oct
It is generally known that high executives can regurgitate IT catch phrases on demand. Some amusing quotes were reported by Fox News covering the congressional inquiry:

Senior Vice President Cheryl Campbell also said in her prepared remarks that "no amount of testing" could have prevented the site's problem-plagued start.

... a system "this complex with so many concurrent users, it is not unusual to discover problems that need to be addressed once the software goes into a live production environment."

"No amount of testing within reasonable time limits can adequately replicate a live environment of this nature,"

In fact, testing did find errors that were not fixed prior to release (of course, testing occurred just before release).
So the lesson for those aspiring to be executives: learn to make statements that are generally accepted to be true, preferably irrelevant to the actual root cause(s).

Optum/QSSI blamed in part a "late decision" to require customers to register before browsing for insurance, which could have helped overwhelm the registration system.

"This may have driven higher simultaneous usage of the registration system that wouldn't have occurred if consumers could window-shop anonymously," said Andy Slavitt.

This is true. Still, a modern server system can support thousands of connections.
It is important to run many web server instances/processes, so that each can be recycled without affecting too many users.
My own limited experience with web servers is that the session state system is the more likely source of problems requiring a process restart. If session state is not implemented on the server, restarts are far less likely to be needed in the first place.
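To illustrate the point about avoiding server-side session state, here is a minimal sketch (in Python; the secret key and session fields are hypothetical) of a signed-cookie approach: the session data travels with the client, so any server process can handle any request, and recycling a process loses nothing.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical secret; a real deployment would load this from secure config.
SECRET = b"demo-secret-key"

def make_cookie(data):
    """Serialize session data into a self-contained, signed cookie value."""
    payload = base64.urlsafe_b64encode(json.dumps(data).encode()).decode()
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return payload + "." + sig

def read_cookie(cookie):
    """Verify the signature and recover the session data; None if tampered."""
    payload, _, sig = cookie.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # reject forged or corrupted cookies
    return json.loads(base64.urlsafe_b64decode(payload))
```

Since no per-user state lives in server memory, there is no session store to corrupt and no reason a restart should log anyone out.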

Centers for Medicare and Medicaid Services (CMS), under the Department of Health and Human Services, is the government agency handling the HealthCare.gov web site/app. In this case, they acted as the project lead, as no contractor had project lead responsibility (and authority), so they are responsible for the top-level decisions. CGI Federal handled most of the work. QSSI handled the registration element with identity management? They owned up to their part of the project problems.
CGI cited inadequate testing, started only 1-2 weeks before the Oct 1 go-live, saying it should have been months of testing. (Of course, months of testing between code/feature complete and go-live is totally stupid. But it is correct to conduct months of testing during development.) And CGI did not mention this in September.

I do not find that the testimony given by the contractor executives at the congressional hearings provides meaningful insight or gets to the key elements. In practice, the only way to get the whole truth and nothing but is under a no-blame inquiry.

What I would like to know is what the original project plan schedule was. When was the date for code completion? What was the original testing period, and was there an allowance for bug fixes and retesting?

I think the test, bug-fix, re-test cycle should be about 4 weeks. Of course there will still be bugs remaining that are fixed after go-live, which is why the CGI testimony of months of pre-launch testing is full of sh!t.
It is perfectly reasonable to be fixing bugs found in testing as well as making minor performance fixes.
But if they had no idea what the performance characteristics were going to be until post-code-freeze testing, then the architects were incompetent, blindly building a project on stupid principles with no connection to performance; see my blog post load-test-manifesto.
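The kind of load testing that belongs in development, not after code freeze, can be sketched very simply: drive a handler with concurrent workers and measure throughput. Here the handler, concurrency level and request count are all hypothetical stand-ins for a real system; the point is that this measurement is cheap enough to run continuously during development.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(i):
    """Stand-in for a real request handler; simulates ~1 ms of work."""
    time.sleep(0.001)
    return i

def load_test(concurrency, total_requests):
    """Drive the handler with concurrent workers; report count and req/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(handle_request, range(total_requests)))
    elapsed = time.perf_counter() - start
    return len(results), len(results) / elapsed

done, rps = load_test(concurrency=20, total_requests=200)
print(f"{done} requests completed, {rps:.0f} req/sec")
```

Tracking the req/sec number as features land is what tells the architects, long before go-live, whether the design can handle the expected load.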

What almost always happens is that code/feature completion is late, and the project leads decide to compress the test-bug-fix cycle to preserve the launch date. This never works, because the test cycle was already tight to begin with, on the assumption that only one bug-fix/retest cycle was necessary. Two might be workable.

Addendum 31 Oct

CBS News cites the cost of the website development at $118M and $56M for IT support.
Various states have collectively spent more than $1B on their own health care/insurance websites?

Health and Human Services (HHS), which oversees CMS, secretary Kathleen Sebelius testified that the contractors never asked to delay the site's launch, while CGI testified that there should have been months of testing. Marilyn Tavenner, head of CMS, testified that she had no idea there were problems prior to Oct 1? even though internal testing prior to launch showed severe problems.
Lesson for those aspiring to be (government) executives: don't listen to the people who work for you, they can't have anything important to say.

Ms. Sebelius said to "Hold me accountable for the debacle". I am not a fan of firing someone every time a project fails. I have seen this before, and the next project still fails. Even Stalin eventually learned that shooting generals does not inspire the next general to victory.
Also HHS oversees several agencies, so I think the head of CMS should be accountable.

The Healthcare.gov people cited big numbers (4.7M) for the number of visitors in the first few days, but declined to state exactly how many people actually purchased health insurance. Ms Sebelius stated that the numbers were unreliable, and it will be mid-Nov before they are certain.
It now comes out that 6 people completed enrollment the first day and 248 by the end of the second day.

So I presume that HHS/CMS did in fact know exactly how many enrollments there were on each day but did not want this to become public knowledge, and lied about it.
Just yesterday I said that firing people for failure does not prevent the next failure.
However, lying pretty much ensures the next project will be a failure.

It is being reported the tech "big guns" are being brought in to help fix the problem.
There is a Google person for reliability, even though Google did not provide software for this project.
Oracle is also sending people, and presumably this means that there is an Oracle back-end.
Let me say that I do not have a technical issue with the core Oracle engine. It is indeed very sophisticated. There are also some extremely talented Oracle DBAs out there. Back about 12 years ago I noted that the best of the technical writings on Oracle were of greater depth than those for SQL Server. (Since then, there has been good quantitative technical analysis on SQL Server.)

On the SQL Server side, there are many "accidental" DBAs. This might be a person who foolishly inquired as to who was the DBA for a particular SQL Server back-end; the boss thinks for a moment, and then says: you are! That said, the accidental SQL Server DBAs are not afraid to admit they know very little about being a DBA and need to learn.
On the Oracle side, I have already said there is serious talent.
But there are also very many light weight DBAs.

Without making a generalization, some of these think that because they are using a very sophisticated product, they are sophisticated too. At the last Oracle World, six or seven years ago, one of the top Oracle performance experts was explaining to the audience that an application making excessive hard parses (SQL compiles) cannot scale; it is essential to have plan reuse.

On more than one occasion, I have seen an Oracle DBA who seems superficially versed in the latest Oracle big-gun features, but absolutely clueless on fundamentals.
I was asked to assist a major hotel management company in an Oracle & SQL Server comparison for their new reservation system (being ported from Informix).
When I got there, everything indicated that the project team wanted Oracle and only wanted to show that Oracle was superior.
So they were using advanced Oracle features in the belief that this would give a decisive advantage. However, their SQL coding was total crap. They used PL/SQL, which may actually implement a hidden temp table with inserts, when in fact the query could have been written with straight SQL.
I had been told that this was supposed to be a like-for-like comparison, but since it was obvious this was not the case, I felt no obligation to point out their crappy SQL, while implementing correct SQL on my side (I also tightened up the data types to reduce table size).
They were shocked when SQL Server obliterated Oracle in performance tests.
They spent 2-3 weeks checking the individual query results to be sure I was returning identical rows. Only then did they finally bother to look at the SQL. In the end, with the Oracle system using correct SQL, the results were about the same, and they recommended Oracle.
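The row-by-row vs. straight-SQL distinction can be shown with a toy example (the reservations table and data here are hypothetical, using SQLite rather than Oracle or SQL Server). Both approaches return the same answer, but the set-based query lets the engine do the aggregation in one pass instead of dragging every row out to procedural code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations (hotel TEXT, nights INTEGER)")
conn.executemany("INSERT INTO reservations VALUES (?, ?)",
                 [("A", 2), ("A", 3), ("B", 1)])

# Row-by-row (cursor-loop) style: fetch everything, aggregate in the client.
totals = {}
for hotel, nights in conn.execute("SELECT hotel, nights FROM reservations"):
    totals[hotel] = totals.get(hotel, 0) + nights

# Set-based style: one declarative query does the same work inside the engine.
set_based = dict(conn.execute(
    "SELECT hotel, SUM(nights) FROM reservations GROUP BY hotel"))

assert totals == set_based  # identical answers; only the mechanism differs
```

On three rows it makes no difference; on a reservation system's millions of rows, the cursor-loop version is exactly the kind of "crappy SQL" that sinks a benchmark.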

Apparently outside consultants were hired in early 2013 to assess the HealthCare.Gov project. It was assessed that there were serious issues? but none of this was reported to Congress. It might be accepted that the project team would put on a public face of all is well, but failing to disclose the truth to Congress should be a more serious matter?

Comments

"technical details are not yet available" - Just viewing source in your browser will show you the quality of code. The backend is (clearly?) just as bad.

"A large project needs a large staff of people" - That is not necessarily true.

"top technical talent costs top dollar" - True, and the government pays (at least to the contractors) top dollar for the "top talent" working on these projects. I'd be willing to bet CGI did not get less than $100/hr for their "engineers" who built this mess. I'd also double down on that bet that they hired the cheapest people to perform the work (most likely outside the US, in India).

At the end of the day – procurement is to blame. Work does not go to those best qualified but rather those who play the game the best.

Ryan: I don't know web-side code (is it JavaScript?) well enough to make an assessment. I understand that there are both quality and volume issues.

Let me re-phrase, large government IT projects have budget for a large staff.

I am inclined to think that most people hired for this currently reside in the US, and that the development was done right here in the DC area? to be close to the managing agency?

I am inclined to think that the typical software development pay scale is somewhat under $190/hr in the rate the contractor bills to the government, which allows for administrative overhead. I do think that the contractors pay decent rates for mid to senior level developers. Keep in mind that there is a difference in the rate paid to the developer depending on whether he/she is full-time with benefits or 1099. The government rate does seem to allow for a 1099 contractor to be paid $125/hr, perhaps $145.

The problem is in paying for the deep stuff that can guarantee project success. I am talking in the $200-300/hr range. I think the government's objective in this is to create many good jobs. But paying for top talent means fewer good jobs, so they would rather have the jobs than ensure success.

Jimmy: The fact is that a project that is the centerpiece of the president's agenda was clearly handled incompetently on the IT side. What should we think about the actual product? The NR article by YL makes good points, but there is plenty of talent here. SV has its own share of IT f#ckups, but they are more of the f#cked-up software/hardware vendor specialty. Given that this project was so important, why didn't he get good people in the first place?

ps - did you remember to include a link to my article as a reference in your PASS slide deck?

It is important to remember that this is just a web site. To people that have been rejected because of preexisting conditions the new law means much more than a web site ever could.

As a developer, I've seen a major commerce site without credit card encryption running for years that way. I've seen errors that threw away valid orders with no way to recover them. The growing pains with the site will be overcome (and they will likely eliminate the penalty for the coming year anyway - they won't say that now because it would be counter-productive). The impact of the law on people's lives will be profound and long-lasting.

If a well-funded large project had serious issues at launch up until about 5-6 years ago, well, that would just be an embarrassment. But given the capabilities of modern hardware, software development tools and the accumulation of knowledge over the years and from previous projects, to screw up a major project today is outright incompetence.

I make an allowance for the case that the more serious problems could have been fixed with a few months of additional work, but I would like to see a hard statement by one of the project leaders that the site launch should have been postponed. So far all I have gathered is that issues were raised prior to launch and that the project lead (CMS) decided to launch on schedule regardless. Did anyone go on record to postpone at that meeting prior to launch?

"Clay Johnson, the CEO of the Department of Better Technology, and Harper Reed, the former CTO of Obama for America, opined in the New York Times that not only are most large-scale IT projects failures but U.S. government ones are particularly so because they are bound to follow the Federal Acquisition Regulation. "More than 1,800 pages of legalese that all but ensure that the companies that win government contracts, like the ones put out to build HealthCare.gov, are those that can navigate the regulations best, but not necessarily do the best job."

I do agree that the government rules are onerous and contribute nothing to the quality of work. My understanding is that CGI got a $678M contract in a no-bid situation. I do accept that there are situations that do not allow a several-month bid-contest cycle. And of course bidding would not have contributed to success, just delays. Some have pointed out that CGI has connections to people in the White House.

The new fix-it man, Jeff Zients, has said the site problems will be mostly resolved by the end of November. I am assuming that his team has not made an in-depth assessment, so this must be based on discussions with the current team and on his confidence in some of the current people. QSSI will take point on the fix, not CGI, but I would not read much into press reports.