Rackspace CTO Engates Analyzes HealthCare.gov Meltdown

John Engates went to the White House Monday to get a closer look at what went wrong with HealthCare.gov. Unlike actor Jimmy Stewart's Mr. Smith, Engates came away from Washington saying this problem can be fixed.

Rackspace CTO John Engates went to the White House Monday to get a closer look at what went wrong with HealthCare.gov. What he saw convinced him that Big Government doesn't operate the way enterprises do -- and maybe it should.

Engates was one of the few technology industry spokespersons to step forward when the controversy over HealthCare.gov broke out. He was quoted, among other places, on the front page of USA Today, commenting that the site appeared not to have government repositories that could keep up with its needs.

Engates doesn't fit the stereotype of a business-oriented government critic. "I was very proud and excited to be invited," he said in an interview after returning to his home base in San Antonio, Texas. At the same time, he hasn't hesitated to describe the launch of the administration's public enrollment website as "one of the most spectacular public failures ever."

While Engates has plenty of criticism for HealthCare.gov's troubled start, the visit to Washington bolstered his confidence that the site is now on track. Indeed, the new HealthCare.gov project lead, Jeffrey Zients, told the press six days after Engates' visit that the site's error rate in loading pages is down from 6% to just 1%. It is also now able to serve 50,000 people at a time and takes one second (compared to eight) to load most pages.

With coverage under Obamacare scheduled to begin in 30 days, however, these fixes may not be enough. According to Engates, a more persistent problem than response time involves connecting new healthcare signees to their policies at specific providers. A fix is still in the works.

Only a handful of executives joined Engates on his visit, including representatives from Salesforce.com, ExactTarget (a recent Salesforce.com acquisition), and IBM. The group included at least one friend of the administration; when President Obama visited San Francisco on a fundraising tour, one of his stops was at the residence of Salesforce.com CEO Marc Benioff.

In addition to meeting with Zients, the group met with federal government CTO Todd Park, federal government CIO Steven VanRoekel, and White House chief of staff Denis McDonough. After an hour-long briefing in the White House situation room, the group was taken to the Maryland offices of QSSI, the contractor that was hired to pull together the site's disparate elements. There, Engates said, he saw "an exchange operations center where all reactions to the crisis can be orchestrated."

For one thing, he explained, there's a "quarterback" in charge who can make decisions over the subcontractors involved in the site, each of whom has a representative in the operations center 24 hours a day. "Before, you had people working nine to five, five days a week -- if you were lucky," Engates noted. "Now they're working 16 to 17 hours a day, with the center fully staffed seven days a week."

"They have standup meetings twice a day," Engates continued, in which participants report on anything that goes wrong. "In the government contracting system, people weren't in a single room [like the operations center]. No one was willing to ring the alarm bell."

Engates discussed with VanRoekel how private business keeps contractors honest. "The bidders with all the right characters get the contracts, but they're not the best ones for the job... He was interested in reforming that," said Engates.

Government officials were straightforward about what had gone wrong with the site, and Engates even made a suggestion or two of his own on how to improve it. "Don't log in and update servers one at a time," he advised. Rather, he recommended pushing updates into production using a blueprint that updates large groups of servers at once, the way Rackspace Cloud and other large-scale vendors do. He also suggested monitoring more aspects of the application's performance and taking action earlier when monitoring shows the rate of completed actions heading south.
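As a rough illustration of that advice, and not a description of how HealthCare.gov or Rackspace actually deploys, the Python sketch below applies one blueprint to servers in groups and stops the rollout as soon as monitoring reports too many errors. Every name in it (`rolling_update`, `apply`, `error_rate`, the server names, the `app_version` setting) is hypothetical, and the 1% threshold is only an illustrative default.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical blueprint: the desired settings for every server in the fleet.
Blueprint = Dict[str, str]


def batches(servers: List[str], size: int) -> Iterator[List[str]]:
    """Yield the server list in consecutive groups of at most `size`."""
    for start in range(0, len(servers), size):
        yield servers[start:start + size]


def rolling_update(
    servers: List[str],
    blueprint: Blueprint,
    apply: Callable[[str, Blueprint], None],
    error_rate: Callable[[List[str]], float],
    batch_size: int = 2,
    max_error_rate: float = 0.01,
) -> Tuple[List[str], bool]:
    """Apply one blueprint group by group instead of server by server.

    After each group, ask monitoring for that group's error rate and halt
    early if it exceeds the threshold. Returns the servers that were
    updated and a flag saying whether the rollout was halted.
    """
    updated: List[str] = []
    for group in batches(servers, batch_size):
        for server in group:
            apply(server, blueprint)
        updated.extend(group)
        if error_rate(group) > max_error_rate:
            return updated, True  # stop before touching the remaining groups
    return updated, False


if __name__ == "__main__":
    # In-memory stand-ins for real servers and a real monitoring system.
    fleet: Dict[str, Dict[str, str]] = {f"web{i}": {} for i in range(1, 7)}

    def apply_to_fleet(server: str, blueprint: Blueprint) -> None:
        fleet[server].update(blueprint)

    def measured_error_rate(group: List[str]) -> float:
        # Pretend that web3 misbehaves once it has been updated.
        return 0.05 if "web3" in group else 0.0

    updated, halted = rolling_update(
        list(fleet), {"app_version": "2.0"}, apply_to_fleet, measured_error_rate
    )
    print("updated:", updated, "halted:", halted)
```

In this toy run the second group trips the threshold, so the last two servers are never touched. That is the "take action earlier" half of the suggestion: a bad update is caught while it affects only part of the fleet.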

Unlike Jimmy Stewart's Mr. Smith, Engates believes not only that the problems can be corrected, but that the government is already halfway down that path. "It's also fixable. The accountability and intensity have changed from what they were before the launch," he said. Previously, contractors didn't seem to see how their work might appear in the public eye. Now they know they are under scrutiny and each piece of work is critical.

The fact that there were 400 issues needing immediate fixing is one clear indication that the site was not ready to be released, and that the management disciplines were clearly not where they needed to be.

There are a couple of big differences between this government IT failure and a big enterprise IT failure. Lack of accountability--which heads rolled? And the sheer magnitude of the failure: the website supporting the signature effort of the Obama administration, a site under development for more than three years, didn't work. Imagine if Amazon.com had spent three years building a site for the 2013 holiday season and it simply didn't work.

From a technical perspective, I don't think there is any concrete difference between the federal government and a big enterprise. The same kind of mistake can happen with a government project as well. Scalability is a tough issue since it mainly depends on your architecture - is it flexible enough to accommodate future growth?

The story I have coming tomorrow, from the perspective of a couple of web scalability experts, suggests that the planning failures of HealthCare.gov are not so different from those businesses often make when they run into scalability snafus.

It's hard not to be impressed with a White House showing, especially with quality players like Jeff Zients, Todd Park, Steven VanRoekel, and Denis McDonough. But what the federal government still does poorly is invest in a team of top-level, highly qualified program management leaders at the outset of programs like this, not after a project needs fixing.

Rackspace CTO John Engates made his comments on Wednesday, before Thanksgiving. HealthCare.gov "quarterback" Jeffrey Zients confirmed his analysis of what's wrong and seemed to have taken some of Engates' advice when he spoke to the press on Sunday, after Thanksgiving, citing "hundreds" of bug fixes.