Archive for the ‘Networking’ Category

One of our hosting providers, Linode, co-locates its servers in Hurricane Electric’s (HE) Fremont data center. Early today HE got hit by a massive (there is really no other word for it) DDoS. It started at around 1:30AM GMT+8 and ended around 10AM (at least for us, with a couple of servers affected). Linode posted this update from HE:

On October 3rd we experienced a large attack against multiple core routers on a scale and in ways not previously done against us. We had various forms of attack mitigation already in place, we have added more. It was all fixable in the end, just the size and number of routers getting attacked and the figuring out what attacks were doing what to what took some time. The attack mitigation techniques we’ve added will be left in place. We are continuing to add additional layers of security to increase the resiliency of the network.

Because the attackers were changing their methods and watching how their attacks were responded to, we are not at liberty to elaborate on the nature of the security precautions taken.

This attack is interesting for a couple of reasons:

The core routers were the target. A typical DDoS targets a specific domain or service; by targeting HE’s routers, the attackers broadened the impact, i.e. the attack affected all the customers of the data center. When Amazon was attacked, users hardly felt any degradation in performance. That’s because the attack was against a domain, and we already know that Amazon has thousands of load-balanced servers which regularly take on the load of last-minute shopping. This one was different: instead of attacking the servers, they attacked the core routers and router switches which act as ‘gateways’ to the load balancers, firewalls and servers. A core or edge router connects dozens of other routers and possibly thousands of servers to the rest of the Internet; shut that router down and you’ve effectively made those thousands of servers inaccessible. The attack targeted “multiple” core routers at HE.

The attack was successful. New-generation routers usually have built-in anti-DoS features already; the fact that those were all overwhelmed means that a) the volume was simply too massive (it’s not really difficult to congest a pipe) and/or b) a protocol exploit that used up a lot of CPU was used (BGP, for example, is frequently a target).

The attack was dynamic. HE mentioned that the attackers were changing their methods in real time and watching how their attacks were being responded to. Obviously, they’re not dealing with script kiddies here.
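To make the "built-in anti-DoS features" point a bit more concrete, here is a minimal sketch of a token-bucket policer, the kind of rate limiter commonly used to protect a router's control plane. This is purely illustrative: the class name, the `simulate` helper and every rate below are assumptions made up for the example, and nothing here reflects what HE actually runs.

```python
class TokenBucket:
    """Hypothetical policer: allows `rate` packets/second with a burst of `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate        # tokens added per second
        self.burst = burst      # maximum tokens the bucket can hold
        self.tokens = burst     # start with a full bucket
        self.last = 0.0         # timestamp of the previous packet

    def allow(self, now):
        # Refill according to elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True         # packet is passed up to the router CPU
        return False            # packet is dropped before it costs CPU time


def simulate(packets_per_second, seconds, rate, burst):
    """Send evenly spaced packets and count how many are accepted vs dropped."""
    bucket = TokenBucket(rate, burst)
    total = packets_per_second * seconds
    accepted = 0
    for i in range(total):
        if bucket.allow(i / packets_per_second):
            accepted += 1
    return accepted, total - accepted


if __name__ == "__main__":
    print("normal load:", simulate(50, 1, rate=100, burst=10))
    print("flood:      ", simulate(1000, 1, rate=100, burst=10))
```

A policer like this keeps the CPU alive under a flood, but it does nothing for scenario (a): dropped packets have already crossed the congested pipe. It also cannot tell an attacker's protocol packets from a legitimate neighbor's, which may be part of why protocol-level attacks are so awkward to mitigate.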

I can think of several scenarios for why somebody would do this (conspiracy hat firmly in place):

it’s a red herring: there really was a target hosted by HE or one of its customers, but the perpetrators wish to hide that fact; or

somebody has an ax to grind with HE (could be a disgruntled network engineer, it can happen); or

it’s a proof-of-concept test. Now this is a real concern. Obviously, the attackers have figured out a way to execute the attack dynamically and massively, and the fact that it took a jaded and arguably one of the most experienced data center operators almost 12 hours to stop the attack means that something new was done. One could argue that the attack stopped not because HE was able to adapt to the attack patterns (remember, HE said that it was evolving) but because the attackers simply decided to stop, i.e. they could have continued if they wanted and HE would have faced yet another attack pattern to write rules against.

Whoever it is, and we’ll probably never know who he/she/they are, this is a very real and major concern, especially if you’re in the business of hosting and online service provisioning. Unfortunately, if this happened to HE, it can happen to anybody.

Update 2:48PM GMT+8: well, it looks like the attacker(s) simply went out to grab dinner. Linode is reporting that they’re experiencing more ‘stability issues’ with their Fremont (i.e. HE) ‘upstream’.

Update 3:31PM GMT+8: the network has stabilized ‘again’ according to Linode. Just a quick clarification: apparently it’s not only HE’s Fremont facility that was affected but their NY DC as well. A take-no-prisoners approach, I see.

Update 6:43PM GMT+8: apparently a similar attack, albeit a limited one, happened to HE a week ago. Just a probe then; today was D-Day.

On September 28, 2011 10:20pm PDT and September 29, 2011 11:45am PDT, the Fremont 1 datacenter was subject to a DDOS targeting a core router. The attack caused OSPF and BGP reloads resulting in elevated CPU utilization and performance degradation of the router.

The incident on September 28, 2011 10:20pm PDT was identified and mitigated at 10:40pm PDT. The incident on September 29, 2011 11:45am was identified and partial mitigation was realized shortly thereafter with full containment at approximately 12:45pm PDT. All systems are fully operational at this time. We have already been in contact with the router vendor, and have obtained a new software image that addresses this type of infrastructure attack. We will be deploying the new image shortly. A maintenance notification will be sent out separately regarding this emergency maintenance.
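As a quick sanity check on the timeline in that notice, the sketch below works out how long each of the two earlier incidents took to contain and converts the start times to GMT+8, the timezone used in the updates above. The fixed UTC offsets and the helper names are my own assumptions for illustration, and the second end time is only "approximately" 12:45pm according to the notice.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets (assumption: PDT = UTC-7, local time = UTC+8).
PDT = timezone(timedelta(hours=-7), "PDT")
GMT8 = timezone(timedelta(hours=8), "GMT+8")

# (start, contained) pairs taken from the notice quoted above.
INCIDENTS = [
    (datetime(2011, 9, 28, 22, 20, tzinfo=PDT),
     datetime(2011, 9, 28, 22, 40, tzinfo=PDT)),
    (datetime(2011, 9, 29, 11, 45, tzinfo=PDT),
     datetime(2011, 9, 29, 12, 45, tzinfo=PDT)),  # "approximately" 12:45pm
]


def minutes_to_contain(start, end):
    """Whole minutes between the start of an incident and its containment."""
    return int((end - start).total_seconds() // 60)


def in_local_time(moment):
    """Convert a timezone-aware timestamp to GMT+8."""
    return moment.astimezone(GMT8)


if __name__ == "__main__":
    for start, end in INCIDENTS:
        print(in_local_time(start).strftime("%b %d %H:%M GMT+8"),
              "contained in", minutes_to_contain(start, end), "minutes")
```

Both probes were contained within an hour, which makes the hours-long outage on October 3rd look like a very different beast.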

Amazon just launched its tablet, the Kindle Fire. Aside from the price (it’s only $199, less than half the price of an iPad 2), one of its most interesting features is the browser, called Amazon Silk. The browser basically off-loads the heavy lifting of rendering and image optimization to Amazon’s huge proxy/rendering farm (courtesy of AWS). The result is snappier pages and happier users.

At least that’s the idea. There’s no doubt that infrastructure-wise this will work, as ISPs have done this at some point to save bandwidth and improve user experience (Squid being the most popular open source cache/proxy).

However, it seems like Amazon’s engineers have pushed caching to the next level by rendering CPU-hogging JavaScript and optimizing content (image resizing mainly) prior to delivery to the Kindle’s Silk browser. So far so good.

Now for the privacy questions: how can Amazon guarantee a) protection and b) anonymity of the session information and, most importantly, the data (e.g. usernames and passwords) that will be “proxied” by the servers?

How will the browser deal with HTTPS traffic? Will that be optimized too (i.e. go through their servers)? I hope not!

That being said, I’m looking forward to getting my hands on them fondleslabs =)

Since about 1AM PDT, AWS US-East’s EBS service has been down. It’s been 24 hours now and many people are getting mighty antsy about this disaster, which their status site explains this way: a “networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes.” The irony is that the automated backup activity is what is bringing down the entire EBS infrastructure, along with EC2 (those instances that depend on EBS anyway, which most probably do), in the availability zone.

Finally, Oracle has finalized the purchase of Sun. Frankly, despite the assurances of Oracle, this whole thing is making me nervous about the future of Java and MySQL. OK, so they’ve open sourced Java; however, Sun (or now Oracle) still has a say on what goes into the final versions. Some people say that it’s in the interest of Oracle to continue the development of the Java programming language and its APIs because a big chunk of its middleware is written in Java. What worries me though (and believe me, I’m not the worrying type) is the licensing part and whether Oracle will continue down the path of open sourcing Java. Oracle is not exactly the poster boy of open source.

As for MySQL: while Oracle will continue to support it for the foreseeable future, and has even agreed not to release a new version of MySQL unless it also releases (in parallel, I suppose) the community version, it has committed to doing so only..

“..until the fifth anniversary of the closing of the transaction“.

That closing happened on the 28th of January 2010; therefore it is reasonable to expect that after January 28, 2015, Oracle can do whatever it wants with the licensing of MySQL. My guess is that the strategy is to sow some doubt about the future of MySQL, making people think twice when choosing between open source (MySQL is the de facto choice, after all) and a commercial alternative. Companies do not like uncertainty; chances are CFOs and CTOs would find this ‘uncertainty factor’ worrying enough to buy a commercial DB rather than use OSS. So nobody can really claim they’re out to kill MySQL (they promised the EU Commission, after all), but in the same breath they say we can’t really say for sure what happens after 2015.

Running an ISP teaches you a thing or two about networks, and one of those things is that we don’t want to be running one forever!

With the (independent) ISP business dead, and as we transition from dialup (and some broadband) to mainly VAS (value-added services), we’ve seen our users rely less on our connectivity (getting somebody else’s instead) while maintaining their hosting and VAS with us. Obviously we needed to trim down the infrastructure required to support dialup and broadband, so we began looking for a more cost-effective way to provide VAS.

In Jan 2009 we began to seriously search for alternatives. We considered traditional hosting as well as the ‘cloud’ players Amazon AWS and GoGrid.

Here’s how they compare:

COMPARISON

| Monthly Parameters | Change Values | Notes |
|---|---|---|
| No of CPU’s/Instances | 9.00 | based on required cores (offset: 2) |
| Memory Per CPU | 2.00 | GB (2GB minimum) |
| Aggregate Bandwidth In | 120.00 | GB (high end estimated) |
| Aggregate Bandwidth Out | 120.00 | GB (high end estimated) |
| Aggregate Persistence Storage Needed | 357.00 | GB |

| PROVIDER COMPARISON | AMAZON | GOGRID | MOSSO | IWEB | XLHOST |
|---|---|---|---|---|---|
| Category | Cloud computing | Cloud computing | Cloud computing | Traditional dedicated hosting | Traditional dedicated hosting |
| A. Instance(s) Monthly Fees | | | | | |
| Effective Charge For CPU’s/Instance (PAYGO) | $669.60 | $2,544.48 | $803.52 | $693.00 | $658.00 |
| Effective Charge For CPU’s/Instance (CHEAPEST OPTION) | $325.88 | $1,116.00 | $803.52 | $693.00 | $658.00 |
| B. Bandwidth Utilization Fees | | | | | |
| Inbound (per GB) | $12.00 | $0.00 | $9.60 | $9.60 | $9.60 |
| Outbound (per GB) | $20.40 | $20.40 | $26.40 | $26.40 | $26.40 |
| C. Persistence Storage | | | | | |
| Disk space Cost | $35.70 | $53.55 | $53.55 | $0.75 | $0.84 |
| D. Service Features | | | | | |
| Root Access | Yes | Yes | Yes | Yes | Yes |
| Online Server Deployment | Yes | Yes | Yes | No | No |
| Free intra-cloud traffic | Yes | Yes | Yes | No | No |
| Free Public IP’s | Yes | Yes | Yes | Yes | Yes |
| Online Firewall Management | Yes | No | ? | No | No |
| Load Balancing | TBA | Free | ? | No | No |
| DNS Management | None | Free | ? | Yes | Yes |
| Management API’s (for future automation) | Yes | Yes | ? | None | None |
| E. Expected Total Monthly Expense | | | | | |
| PAYGO | $737.70 | $2,618.43 | $893.07 | $729.75 | $694.84 |
| CHEAPEST OPTION (requires advanced payments) | $393.98 | $1,189.95 | $893.07 | $729.75 | $694.84 |
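For anyone who wants to check or re-run the totals in section E, here is a small sketch that adds up sections A to C per provider. The dictionary layout and function names are just my own illustration; the dollar figures are copied from the comparison above. One assumption worth flagging: the totals only add up if the bandwidth figures are treated as monthly amounts for the 120 GB estimate, not as literal per-GB rates.

```python
from decimal import Decimal

# Monthly fees in USD, copied from sections A to C of the comparison.
# Each entry: (instances PAYGO, instances cheapest option, inbound, outbound, storage)
FEES = {
    "AMAZON": ("669.60", "325.88", "12.00", "20.40", "35.70"),
    "GOGRID": ("2544.48", "1116.00", "0.00", "20.40", "53.55"),
    "MOSSO": ("803.52", "803.52", "9.60", "26.40", "53.55"),
    "IWEB": ("693.00", "693.00", "9.60", "26.40", "0.75"),
    "XLHOST": ("658.00", "658.00", "9.60", "26.40", "0.84"),
}


def monthly_total(provider, plan="paygo"):
    """Instances + bandwidth in + bandwidth out + storage for one provider."""
    paygo, cheapest, inbound, outbound, storage = map(Decimal, FEES[provider])
    instances = paygo if plan == "paygo" else cheapest
    return instances + inbound + outbound + storage


def cheapest_provider(plan="paygo"):
    """Provider with the lowest expected monthly total under the given plan."""
    return min(FEES, key=lambda name: monthly_total(name, plan))


if __name__ == "__main__":
    for plan in ("paygo", "cheapest"):
        for name in FEES:
            print(f"{plan:9} {name:7} ${monthly_total(name, plan)}")
        print(f"lowest under {plan}: {cheapest_provider(plan)}")
```

Run as is, this reproduces the section E figures. It also shows that the ranking depends on the plan: XLHOST has the lowest total on pure pay-as-you-go, while AMAZON only comes out on top with the advance-payment option.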

Amazon’s AWS, aside from being the cheapest (once you take the advance-payment option), also seems to be the most mature, with servers (aka instances) being activated on the fly (as opposed to waiting a day or so with the regular hosting providers) and a good set of integration APIs.

Btw, I did try GoGrid: I activated an instance and deactivated it (or so I thought) and got a bill that basically told me that my cost comparison is correct: contrary to their claims, they are more expensive. That ended my short experience with GoGrid.