Tips and tricks for modern system administrator

history

I’m a big fan of automation. Probably all good DevOps engineers are, and maybe a bunch of good old sysadmins too. But everything in the world has limits. A couple of days ago I discussed with a colleague whether total automation of system actions is really a good idea.

While I agree that automation is a good thing for repetitive, non-critical work, I’m more conservative about important or critical actions. Our discussion was about whether to use automatic failover for the database (which is the core of the company and therefore critical) or to keep the failover manual, even if that means being woken up in the middle of the night.

Today I want to share a not very well-known story, which happened in 1983 but explains quite well why some actions must be manual. Let’s start with the Petrov story.

It was September 26, 1983. The world was quite nervous in those days. The Soviet Union had recently shot down a Korean civilian airliner for flying through Soviet airspace, killing 269 people. NATO had started military maneuvers near the Soviet border.

NORAD as seen in the film WarGames

In this climate, Lieutenant Colonel Stanislav Yevgrafovich Petrov (Станисла́в Евгра́фович Петро́в) started his shift that night. He was the officer in charge of monitoring the Soviet early-warning system Oko, the automatic system responsible for detecting nuclear missile launches from American bases and, if a launch was detected, for setting the counter-attack in motion (which in those days meant starting a nuclear war).

At some point during the night, the main panels on the wall started blinking, the alarm started to sound and the system confirmed a launch from an American base. Now Petrov had to make a decision: either wait to be sure that the launch was real, or launch the counter-attack. But he did not have much time. If an attack was on its way, every second counted for organizing the defense and the response. If it was not, he would probably be starting the last war in the history of humanity.

A couple of minutes later, the system started to buzz again. Another launch had been detected, and within a few minutes three more launches were detected. In total, five nuclear missiles appeared to have been launched from the US against the Soviet Union. The system was considered very reliable: it used cameras on a constellation of satellites that had been orbiting without problems for years, and the data was analyzed in three separate data centers with no element in common.

From a technical point of view, the threat was real. The satellites (a very sophisticated piece of technology for the time) had detected five missile launches (not just one), and three isolated data centers had come to the same conclusion. Nothing appeared to be wrong with the system… so… what was the logical reaction?

At that point, human common sense entered the equation. Petrov thought (and keep in mind that this thought probably prevented a nuclear war): “no one starts a war with only five missiles”. So he decided not to launch the counter-attack and not to inform the high command, but simply to wait until a radar on the Soviet border could confirm or disprove what the system said.

Fortunately for all of us, Petrov was right. No one starts a war with five missiles. The border radars confirmed that there were no missiles at all, and that everything had been a terrible mistake by the system. Specifically, an unusual alignment of the Sun, the satellite and the Earth caused a reflection off some low clouds to look to the system exactly like a launch, and this reflection happened to appear over a military base which did, in fact, house nuclear missile silos.

Stanislav Yevgrafovich Petrov in uniform. Credit: unknown author.

The point here is: what would have happened if this decision had been made by an automatic process? Probably neither you nor I would be here now.

By the way, in 2004 Petrov received an award for his contribution to keeping the peace, along with a prize of (sic) 1000 US dollars. His family died without knowing what really happened that day, because the incident remained classified as top secret for years.