Posted by CmdrTaco on Wednesday May 25, 2005 @12:36PM from the go-ahead-and-replicate-the-meme dept.

infofarmer writes "Today at about 11:30 MSD (GMT+4), a major electricity outage in Moscow, Russia brought new meanings to words like "uninterruptible", "redundant" and "uptime" for network administrators, who haven't experienced such harsh and unexpected power failures since the USSR got its Internet connection. Half of the city is totally without electricity, including the subway and the most important traffic exchange point. Half of the top Russian sites went down, including www.mail.ru, www.rambler.ru and www.lenta.ru, and some of them haven't been brought back up yet. IP packets going from ADSL users in Moscow to some local sites got rerouted to somewhere in London and then back to Scandinavia, where they hit a "No route to host" dead end. Other routers found themselves in a loopback, which caused many packets to be dropped with TTL expired. The point is that most popular servers have two or three mainline Internet connections, but a lack of BGP/RIP2/whatever configuration resulted in packets losing their way to hosts."
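To make the "TTL expired" and "No route to host" symptoms concrete, here is a minimal sketch, assuming a toy forwarding table in which each router (the names `london` and `stockholm` are hypothetical) has exactly one next hop for the destination. A packet caught in a loop bounces until its TTL runs out; a router with no entry is a dead end. The `trace` function and the `LOCAL` marker are invented for illustration, and real routers also send an ICMP error back when they drop a packet, which this ignores.

```python
from typing import Dict, List, Tuple

LOCAL = "local"  # hypothetical marker: the router is directly attached to the destination


def trace(table: Dict[str, str], first_hop: str, ttl: int) -> Tuple[str, List[str]]:
    """Follow next hops for one destination through a toy forwarding table.

    table maps a router name to its next hop (or LOCAL); a missing entry
    means that router has no route at all.
    """
    path: List[str] = []
    router = first_hop
    while True:
        path.append(router)
        next_hop = table.get(router)
        if next_hop is None:
            return "no route to host", path  # the dead end described above
        if next_hop == LOCAL:
            return "delivered", path
        ttl -= 1  # every forwarding step uses up one unit of TTL
        if ttl <= 0:
            return "TTL expired", path  # the packet is dropped at this router
        router = next_hop


# Two routers pointing at each other form a loop; the packet never arrives.
loop = {"london": "stockholm", "stockholm": "london"}
print(trace(loop, "london", ttl=8))
```

With a default TTL of 64, a looping packet makes 64 pointless hops before being discarded, which is why loops show up as a flood of TTL-expired drops rather than as a clean error.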

I too delete mail from my spam folder at every opportunity. My reason is to catch false positives, as I've had some (Russian and Finnish mails). Removing spam regularly is the only way to keep track of what I've already seen. (You could 'select all' and 'mark read', but with the same effort one could just delete them, which I do.)

Well, the wiki on my site was continually being probed with vandalism attempts by various machines around the world for the past couple of days, and it stopped dead right around the time of the power failure.

So... no prizes for guessing where the control machines for the botnet were.

And I'll throw in the other unPC question: how does the past infrastructure, designed/built/maintained by commie thugs, compare to the present one, designed/built/maintained by the Russian mafia? Just curious.

Well, I've never noticed my DSL going down at the same time as a power outage (although it does go down at other times more often than I'd like). So yeah, I would guess that the ISP's routers do have some variety of backup power.

Phone companies usually have everything on redundant, battery-backed power supplies. Much of this practice comes from the phone being the primary method of communicating in emergencies. If the phone system goes down, people may die. As far as I know, DSL subsystems tend to be integrated into the same emergency systems. (This may have to do with the growing popularity of packet-switched networking for POTS.)

OK, so you could put DSL hardware on non-backed-up power, but doing so would add a lot of wiring complexity (two separate power systems would mean a lot of installation work), might draw complaints from users who do have backup power on their end, and could cause major problems if someone hooked important kit up to the grid-only power system.

Knowing how things are done in Russia, you should be a lot more concerned with things OTHER than the Internet. Everything is such a fucking mess over there that I really hope no serious injuries happen. I already read the news that sewer water is being dumped into the Moscow river because of a plant failure. In times like these, who gives a shit about the Internet?

If that's true, whoever designed those systems deserves to be punished severely. Why would you put all your eggs in a basket you don't own?

The same can be said of the electrical grid. And the cellular network. And the water network. And the sewage system. Or the public road infrastructure. Or the food distribution chain. Face it - virtually every aspect of modern life requires you to rely almost completely on infrastructures that you do not own.

I already read the news that sewer water is being dumped into the Moscow river because of a plant failure.

This is what is supposed to happen. All (nearly all?) sewage treatment plants have a bypass to send the input straight to the output, which is usually a river or lake.

They do it because when a treatment plant cannot accept any more sewage, whether due to excessive water input by rain, or by power loss, the customers are better served by *NOT* letting the sewage back up into their houses. The stuff has to go *somewhere* when all their holding tanks are full. This is the last-resort method of dealing with problems at such plants.

Yes, but Slashdot is concerned with the internet, so this is an appropriate forum to discuss how an event like this affects the internet. I don't think someone who runs an ISP in Russia should be trying to figure out how to get the sewer working; they should be figuring out how to get the internet up.

Nobody cares that this was in russia, that people can't get their email, or that it was because of a power outage.

The reason this is on the front page is that the internet is supposed to be able to handle things like this. People will be watching how the routers automatically dealt with the outage (there's one response like that already) and what manual intervention was needed. Hopefully this information will be used for training the next generation of router admins.

Why, did cars stop working? Did elevators break and drop to the ground? Is that sewage somehow worse than everything else that gets dumped into the rivers in Russia? Biological waste can (and will) be broken down very quickly by the life in the river, but all of the heavy metals and other pollutants that get dumped aren't the same...

I've used it before with a VISA card, using a special service to generate temporary numbers with custom credit limits. A good free service my internet bank provides, and useful to trial "shady" sites. But I'm happy to say all transactions were just fine over a period of a few months of usage at least.

It was interesting that news.google.com, cnn.com, msnbc.com, etc. do not have this story on their front pages. I guess the outage isn't as severe as the one in New York a few years ago.

Well, first off, you're factually incorrect. The outage two years ago affected a large area of the Eastern United States as well as some areas of Canada, not just NYC. Furthermore, the sources you cite are all American sources. It's no surprise that they tend to report American events more than world events.

Well, I heard about this on libération.fr, a French paper. The local paper (for me, here in Rio de Janeiro) had nothing on this either, but it did announce a much smaller (than the NY and other places) outage that happened in London.

Some countries are more inclined to get the meaning of "international" than others. Here in Brazil, international mostly means the USA and Europe; we don't even get to know what happens in Argentina or Chile if it does not involve Brazil in some manner.

I live in Russia, about 1000 km from Moscow. We were hit by the network outage; nothing worked (not even Slashdot :( ) for about 30 minutes. The number of routes announced by both of our peers was about 700 instead of the normal 150,000.

But then routes began to appear again! I was amazed: the Internet routed itself around the damaged segments, and packets were routed through Japan (!), Finland and Holland instead of Moscow. The funniest part was when I traced the route to a computer in the next building and it went through Saint Petersburg :)
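The "routed itself around damaged segments" behaviour can be sketched roughly as follows, assuming each route is just a list of hypothetical network names (its AS path) and that the shortest path wins. Real BGP compares local preference and several other attributes before AS-path length, so this is a deliberate simplification, not how any particular router decides.

```python
from typing import List, Optional

Route = List[str]  # an AS path written as hypothetical network names


def best_path(routes: List[Route]) -> Optional[Route]:
    """Pick the route with the shortest AS path (a single, simplified BGP tie-breaker)."""
    if not routes:
        return None
    return min(routes, key=len)


def withdraw_via(routes: List[Route], failed: str) -> List[Route]:
    """Remove every route whose path crosses the failed network."""
    return [route for route in routes if failed not in route]


routes = [
    ["moscow", "neighbour"],
    ["spb", "finland", "neighbour"],
    ["japan", "holland", "finland", "neighbour"],
]
print(best_path(routes))                          # the short path via Moscow
print(best_path(withdraw_via(routes, "moscow")))  # falls back to Saint Petersburg
```

Once every path through Moscow is withdrawn, a longer but still valid path takes over, which matches the odd traceroute to the next building.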

TCP/IP and the Internet anticipate cooperation among sites. You and your neighbors should all happily route each other's packets.

The trouble is that in many places it doesn't work that way. There are rural "leaf" nodes, of course, but there are many more sites which have only one connection because of what I consider to be petty business decisions.

Two competing ISPs in the same area should share a direct link to each other. If they have different upstream providers, then when one provider goes down the other picks up the slack. In any case local traffic should stay local.

The fear, of course, is that one ISP will choose a bad provider and take advantage of the other. That has an easy fix: if the other one starts to abuse you, pull the plug.
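A rough sketch of the exit choice this arrangement implies, assuming ISP A ranks its links with a preference number in the spirit of BGP's LOCAL_PREF (the link names and the values 200 and 100 are hypothetical). Traffic for ISP B's customers uses the direct link while it is up, and "pulling the plug" simply marks that link down.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    name: str
    local_pref: int  # higher is preferred, in the spirit of BGP LOCAL_PREF
    up: bool = True


def choose_exit(links: List[Link]) -> Optional[str]:
    """Return the preferred working link, or None if every link is down."""
    usable = [link for link in links if link.up]
    if not usable:
        return None
    return max(usable, key=lambda link: link.local_pref).name


# Traffic from ISP A to ISP B's customers: the direct link wins while it is up.
peer = Link("direct link to ISP B", local_pref=200)
upstream = Link("ISP A's own upstream", local_pref=100)
print(choose_exit([peer, upstream]))

# "Pull the plug" on an abusive peer and traffic falls back to the upstream.
peer.up = False
print(choose_exit([peer, upstream]))
```

Keeping the decision to a single preference number is what makes the "pull the plug" remedy cheap: the fallback path is already there and needs no reconfiguration.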

Good ones have UPS battery backup that'll keep things up for several minutes, typically. That's enough time for the generators to start up. Fuel supply then must last the full outage or refueling trucks will have to start rolling.
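The back-of-the-envelope check behind "enough time for the generators to start up" can be written out as below. Every number in it is hypothetical, and the linear runtime model with a fixed 0.9 inverter efficiency is an assumption; real batteries deliver noticeably less under heavy load.

```python
def ups_runtime_minutes(battery_wh: float, load_w: float, efficiency: float = 0.9) -> float:
    """Rough UPS runtime: usable battery energy divided by the load.

    The linear model and the 0.9 inverter efficiency are simplifying
    assumptions; real batteries deliver less under heavy load.
    """
    return battery_wh * efficiency * 60 / load_w


def bridges_to_generator(battery_wh: float, load_w: float, generator_start_min: float) -> bool:
    """True if the UPS can carry the load until the generator takes over."""
    return ups_runtime_minutes(battery_wh, load_w) >= generator_start_min


def fuel_hours(tank_litres: float, burn_litres_per_hour: float) -> float:
    """How long the generator runs before a refuelling truck is needed."""
    return tank_litres / burn_litres_per_hour


# Hypothetical site: 10 kWh of batteries, 20 kW of load, generator ready in 2 minutes.
print(ups_runtime_minutes(10_000, 20_000))  # about 27 minutes
print(bridges_to_generator(10_000, 20_000, 2))
print(fuel_hours(1_000, 50))                # 20 hours on a 1,000 litre tank
```

The battery only has to bridge minutes; it is the fuel figure that decides whether a multi-hour outage like this one turns into a second outage.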

Also, remember that even with those precautions, a failure in any UPS or generator anywhere between the user and the servers can result in an outage. That includes telco equipment in other buildings as well.
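Because the path only works if every UPS and generator along it works, their availabilities multiply. A minimal sketch, assuming five hypothetical components at 99.9% each and independent failures. A city-wide blackout is exactly the kind of common-mode event that breaks the independence assumption, so real numbers on a day like this are worse.

```python
from math import prod
from typing import Sequence


def series_availability(parts: Sequence[float]) -> float:
    """Availability of a chain that fails if any single part fails.

    Assumes failures are independent, which a city-wide blackout is not.
    """
    return prod(parts)


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours per year that the chain is unavailable."""
    return (1 - availability) * 365 * 24


# Hypothetical chain: user UPS, telco UPS, telco generator, datacenter UPS, datacenter generator.
chain = [0.999] * 5
a = series_availability(chain)
print(round(a, 4), round(downtime_hours_per_year(a), 1))
```

Five "three nines" parts in a row already add up to roughly two days of expected downtime per year.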

There have been several high profile datacenter power outages in big U

There's a Russian politician of the Yeltsin era, Anatoly Chubais, who is in charge of RAO UES Russia (an uber-organization controlling production and distribution of energy in Russia).

While the guy is not as powerful as he was a few years ago, he still poses a significant threat to Putin's third (and fourth, and so on) term presidency, and further concentration of power in Putin's hands.

So within a few hours of the outage, Putin blamed Chubais directly for it, and the Russian justice department opened a criminal case against him. If you know anything about Russia, you know that the Russian DOJ (Prokuratura) doesn't start criminal cases against wealthy and powerful businessmen and politicians unless instructed to do so by Putin.

So I'd bet dollars against donuts that this outage was caused by folks from Lubyanka (FSB aka KGB) purely to remove Chubais, and if cards play well maybe even give him a lengthy prison term.

I'm sure Putin will exploit the power outage to weaken and possibly get rid of Chubais.

Whether the FSB caused the outage directly, to prompt an attack on Chubais, is another matter. Maybe they were working on a plan but it wasn't ready yet. They have a lot to do :) Even Putin sometimes just exploits opportunities.

One thing is what he "states", and another is what he really does. There's very little in common between Putin's speeches and the stuff he does after giving speeches.

Consider the "dictature of law" doctrine that he so vehemently supported when he came into office. For some reason, the dictature of law applies only to his (or his buddy oligarchs') political opponents. Khodorkovsky was an opponent, Chubais is an opponent, Berezovsky, Gusinsky, you can continue this list yourself. Abramovich, Deripaska, Mamut, et

I doubt it was the lack of RIP2 configuration that caused this. You don't use RIP in the core; you use BGP as the exterior protocol and most likely OSPF or IS-IS as the interior protocol.

UPS: at least one place in MSK-IX did have proper UPS backup; you can tell from the routing tables that some BGP sessions have an uptime of four weeks plus. They did bounce one of their core routers (or it had a power failure), as all those peering connections only have an uptime of 8.5 hours. I'd rather not provide a link to this, as the last thing they need is their core routers slashdotted with BGP table summary requests.

Connectivity: it appears MSK-IX is peered with at least 12 other sites that are also peered with another major IX. For example, they are connected to three other sites that are also connected to AMS-IX and four other sites that are also peered with LINX, among a few others with only one connection to another Internet Exchange. Many of these went through Informtelecom XXI, so if they also had power problems, everything was running at 50% of normal capacity. There should have been enough connections to keep things running (i.e. no single point of failure), but that assumes everything is working and powered, and that these guys in the middle could and would handle all the traffic (unlikely).
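The "no single point of failure" claim can be checked mechanically: power off each intermediate node in turn and see whether the exchange can still reach its destination. The sketch below uses a hypothetical topology (`site-a`, `site-b` and `shared-transit` are invented; only the IX names come from the comment) and checks topology only, so it says nothing about whether the surviving links could carry the load.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def undirected(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an adjacency map where every link works in both directions."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph: Graph, src: str, dst: str, removed: Optional[str] = None) -> bool:
    """Breadth-first search from src to dst, pretending `removed` is powered off."""
    if removed in (src, dst):
        return False
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbour in graph.get(node, set()):
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def single_points_of_failure(graph: Graph, src: str, dst: str) -> List[str]:
    """Every node whose loss alone cuts src off from dst."""
    return sorted(
        node for node in graph
        if node not in (src, dst) and not reachable(graph, src, dst, removed=node)
    )


# Hypothetical topology: two intermediate sites, but both sit behind one shared transit link.
edges = [
    ("MSK-IX", "shared-transit"),
    ("shared-transit", "site-a"), ("shared-transit", "site-b"),
    ("site-a", "AMS-IX"), ("site-b", "LINX"), ("AMS-IX", "LINX"),
]
print(single_points_of_failure(undirected(edges), "MSK-IX", "LINX"))
```

Redundant-looking peerings can still hide a single shared dependency, which is the kind of thing a blackout exposes.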

BTW, packets don't lose their way; routers lose their routes to destinations. When all the crap started, the routes began to "flap", i.e. go up and down as routers were reset, power came back on, routers went back down under the heavy load, admins manually tried to route around the problem, etc. When your peer sees your routes flapping, they usually put a holddown on them for a period of time, meaning they won't readvertise your route updates to other routers on the internet (such flaps propagate all over the world, putting undue stress on other routers). So even once you get everything working again, the internet waits a little while before accepting your routes. Well, some routers do and some don't, and some wait longer. That's why you see routers still forwarding packets to London: apparently London thinks it can still get to Moscow, so it's still advertising routes. You don't get the count-to-infinity problem with BGP, but loops are still possible, especially during major outages and route flapping. And routers get "routing loops"; they don't "find themselves in a loopback."
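The holddown described above is usually implemented as route flap damping (RFC 2439): each flap adds a penalty, the penalty decays exponentially, and the route is suppressed while the penalty is high. Below is a minimal sketch, assuming the commonly cited vendor defaults (15-minute half-life, 1000 per flap, suppress above 2000, reuse below 750) and omitting the maximum suppress time and penalty ceiling that real implementations add. `DampedRoute` is a hypothetical name, not any router's API.

```python
from dataclasses import dataclass

# Commonly cited vendor defaults, treated here as assumptions.
HALF_LIFE_MIN = 15.0
FLAP_PENALTY = 1000.0
SUPPRESS_LIMIT = 2000.0
REUSE_LIMIT = 750.0


@dataclass
class DampedRoute:
    penalty: float = 0.0
    suppressed: bool = False
    last_update_min: float = 0.0

    def _decay(self, now_min: float) -> None:
        """Halve the penalty every HALF_LIFE_MIN minutes and lift suppression when it is low."""
        elapsed = now_min - self.last_update_min
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE_MIN)
        self.last_update_min = now_min
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False

    def flap(self, now_min: float) -> None:
        """Record one withdraw/re-announce cycle of the route."""
        self._decay(now_min)
        self.penalty += FLAP_PENALTY
        if self.penalty > SUPPRESS_LIMIT:
            self.suppressed = True

    def usable(self, now_min: float) -> bool:
        """Whether the route would be used and passed on at this moment."""
        self._decay(now_min)
        return not self.suppressed


# A route that flaps three times in two minutes gets suppressed for roughly half an hour.
route = DampedRoute()
for minute in (0, 1, 2):
    route.flap(minute)
print(route.usable(2), route.usable(20), route.usable(40))
```

This is why a network can be fully back up and still be ignored by some peers for a while: the penalty has to decay below the reuse limit before the route is accepted again.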

I provided as much detail as I could; it's lacking in a few places because I can't follow Russian websites.