Thursday, September 15, 2016

I told this story to a colleague this week, and it seemed worthy of a blog post.

Several years ago I worked for a large healthcare system that supported more than 40 hospitals in seven states. I was on the integration team, and we developed and managed the interfaces between all of its different IT systems.

They used an older, mainframe-based Patient Management System and a newer EHR. We picked up patient administration (ADT) messages and delivered them to the EHR and various other ancillary systems. An audit report flagged any message that took more than one minute to flow from the Patient Management System to the EHR, and we had to explain each one. Every such message counted as an SLA failure. One month the report showed a particularly high number of SLA failures. As a junior developer, I drew the short straw and had to investigate them.

I could see that we were receiving the ADTs and sending them on to the EHR within a few seconds, well under the sixty seconds we were permitted. I scratched my head, and at the next status meeting I asked the innocent question, "How are these times established?" It was explained to me that they were the "system times" in the Patient Management System, the Interface Engine, and the EHR. So I asked the follow-up question, "How do we know that the times are in sync?" What followed is what literature would call a "pregnant pause." Since I had asked the question, I was tasked with finding the answer.

The Interface Engine and the EHR ran on Unix boxes. These machines synced their clocks with a time server at the US Naval Observatory every weekend. When I sat down with the EHR tech, we verified that the times were in sync. But when I sat down with the mainframe tech for the Patient Management System and we looked at the time on his system, it was a full 40 seconds off from the other two. Since that timestamp "started the clock" for SLA purposes, this was a problem.

So I asked him, "How is the time set on your system?" The answer that I got was shocking.

"Well, when we do the quarterly Initial Program Load, whatever tech is performing the reboot looks at his watch and sets the system time to that."

This was a multi-billion dollar organization, and that is how they managed time. I was dumbfounded.

So, I asked, "How should we fix this?"

"Fix what?"

"How do we get the time in the mainframe to be set correctly?"

"The mainframe time is correct."

"Not according to the time servers at the US Naval Observatory."

"Yeah, but..."

The mainframe was the center of the universe. The mainframe folks considered its clock more correct than the US Naval Observatory's.

Eventually we agreed on a procedural fix: whichever tech set the time during the quarterly reboot would bring up the US Naval Observatory's web page and set the mainframe's clock from it.