I know that in DNS, each of the listed DNS servers will be tried in turn to see whether it responds.

I know that in email, if delivery to one server fails, the sender will go to the next one in the list, or it will hold the mail for a period of time.

As far as I know, with webservers, the browser will get one of the webserver IP addresses and try it, and if that fails it will give up. Is this correct? If so, then the only way to direct traffic away from a failed IP address would be through the DNS servers, and even that would not take effect immediately.

5 Answers
If you want no single points of failure at all, you need to do global server load balancing -- you obviously can't rely on a single datacentre, and even with a redundant BGP configuration, your BGP tables constitute a single point of failure that can be messed up if someone pushes a bad config.

What you do is configure DNS to advertise multiple IP addresses for the A record for your domain name, pointing at copies of your site that are in different datacentres (preferably in different cities). The browser will pick one (usually at random, but watch out for Windows Vista, which implements the stupid bits of RFC 3484 and is thus not random) and will store the others. Depending on the browser, it will generally use one of the other addresses if the one it's using becomes unavailable. Your DNS servers have to continually monitor all of the sites and stop advertising any that go down. They also need very short TTLs, so that clients re-query soon after a change. There are hardware solutions for this -- e.g. F5's BIG-IP devices.
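To make the "monitor and stop advertising" part concrete, here is a minimal Python sketch of the decision a monitoring DNS server makes on each check. The function names, the plain TCP probe, and the fall-back to advertising everything when every probe fails are assumptions for illustration, not how BIG-IP or any particular GSLB product works.

```python
import socket


def tcp_probe(ip, port=80, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def addresses_to_advertise(site_ips, probe=tcp_probe):
    """Return the subset of site_ips that should go in the A record answer."""
    healthy = [ip for ip in site_ips if probe(ip)]
    # Assumption: if every probe fails, the monitor itself is probably
    # broken, so advertise everything rather than an empty answer.
    return healthy or list(site_ips)
```

A real monitor would run this on a timer and would normally check more than a TCP handshake (for example, fetch a known page), but the filtering step is the same.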

You'll also need ways to replicate your database, your files, and your users' session states between the datacentres in realtime.

You'll obviously also need to get network diagrams and supplier lists from all of your ISPs to make sure that all your network routes are fully geographically diverse and the ISPs don't rely on the same upstream supplier. It's probably worth making sure they're not on the same power grid, as well.

Your failover won't be quite as fast as BGP failover, but you won't be able to bring your site entirely down with a single bad BGP config. You may mess up the configuration of a single DNS server or datacentre, but that won't bring you down completely (unless you push your DNS updates automatically to all of your DNS servers).

The GSLB link you provided is very helpful.
– George Bailey, Jan 10 '11 at 15:33

RFC3484 is very long. Could you point me to the part you are referring to when you say watch out for Vista?
– George Bailey, Jan 10 '11 at 15:43

Sure. It specifies that when picking from multiple IP addresses on an A record, the client should pick the address that shares the most prefix bits with its own IP address, rather than either taking them in the order presented or picking one at random. This sort of made sense for IPv6, at least as originally conceived, but it makes no sense at all for IPv4. And most Vista clients are behind NAT with addresses in the 192.168.x.x range, so they will always pick whichever of your IP addresses shares the most prefix bits with that subnet, skewing your load onto one of your sites.
– Mike Scott, Jan 10 '11 at 16:38
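The skew described above can be seen with a small Python sketch of longest-prefix selection. This is a simplification that treats the client's private address as the source address and ignores the RFC's other rules; the function names and the example addresses (documentation ranges) are made up for illustration.

```python
import ipaddress


def common_prefix_len(a, b):
    """Number of leading bits two IPv4 addresses have in common (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def pick_by_longest_prefix(client_ip, candidates):
    """Pick the candidate sharing the most leading bits with client_ip.

    On a tie, the first candidate in the list wins.
    """
    return max(candidates, key=lambda ip: common_prefix_len(client_ip, ip))


site_ips = ["203.0.113.10", "198.51.100.10"]
for client in ["192.168.0.5", "192.168.200.77"]:
    # Both clients end up on the same site, whatever order DNS returned.
    print(client, "->", pick_by_longest_prefix(client, site_ips))
```

Because every client in 192.168.x.x shares the same leading bits, all of them land on the same site, which is the opposite of the random spread you wanted from multiple A records.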

Thanks for pointing that out. When doing an nslookup on google.com, I notice that all five results are the same except for the last few bits, so they found a way to overcome that (at least for my region).
– George Bailey, Jan 12 '11 at 14:53

On large websites, you might return multiple IP addresses for each hostname and have them ultimately point to load balancers doing MAC forwarding to a cluster of webservers. Usually the load balancers themselves have a notion of takeover/failover as well.

Sure, but if one goes down while the client is using it, then the client would have to get a DNS update before trying another. Is that correct?
– George Bailey, Jan 9 '11 at 19:54

No, incorrect. The load balancers take over any failed IP immediately. The multiple IP addresses merely make sure that half your equipment isn't sitting idle. Some LBs may support TCP session resuming; some will kill the connection (requiring a transparent reconnect).
– Joris, Jan 9 '11 at 21:13

@Joris I was referring to the case where one of the load balancers itself goes down, in which case I thought the client would need a DNS update. But it turns out that having multiple IP addresses in the DNS record will cause the client to fail over automatically.
– George Bailey, Jan 10 '11 at 16:56
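For reference, "fail over automatically" on the client side amounts to something like the following Python sketch. It is an illustration of the idea only; real browsers differ in how many addresses they try and how long they wait, and the function name and the injectable `connect` parameter are assumptions made so the logic can be exercised without a network.

```python
import socket


def connect_first_available(addresses, port=80, timeout=3.0,
                            connect=socket.create_connection):
    """Try each address in order; return (ip, connection) for the first success."""
    last_error = None
    for ip in addresses:
        try:
            return ip, connect((ip, port), timeout=timeout)
        except OSError as exc:
            # This address is unreachable; remember why and try the next one.
            last_error = exc
    raise ConnectionError(f"all addresses failed: {addresses}") from last_error
```

The addresses would normally come from a single DNS lookup that returned several A records, so no second DNS query is needed to reach the surviving site.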

On the contrary, this is exactly what load balancers are there for. We use a hardware load balancer at work (from F5 Networks). We have one IP address, and the load balancer forwards the connections to any of a number of webservers behind it.
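As a rough picture of what "forwards the connections to any of a number of webservers" means, here is a toy round-robin selector in Python that skips servers marked as down. It is a generic sketch, not F5's algorithm; the class and method names are invented for the example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in rotation, skipping any that are marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.down.add(backend)

    def mark_up(self, backend):
        self.down.discard(backend)

    def next_backend(self):
        # Look at each backend at most once per call.
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            if backend not in self.down:
                return backend
        raise RuntimeError("no healthy backends")
```

Clients only ever see the one public IP address; which webserver answers is decided per connection behind it.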

Ours are pretty nice since they share connection state between them. So if the primary one dies, the secondary can pick up right where the primary left off, and any existing connections stay up.

Application session state is something that needs to be handled by the web app, though. Once a person connects to the service, do they stay connected to the same actual server (i.e. session state is server-specific), or can they connect to any of the servers (i.e. session state is available to all servers, through a database or something)? If state is not preserved across nodes, then when a server bounces, the users attached to that server will have to re-establish state.
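The difference between the two session models can be shown with a toy Python example. The `Server` class is hypothetical, and a plain dict stands in for the shared database or cache; the point is only where the state lives.

```python
class Server:
    """A toy webserver that keeps a shopping cart per session ID."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store  # dict standing in for a database

    def handle(self, session_id, item):
        cart = self.sessions.setdefault(session_id, [])
        cart.append(item)
        return list(cart)


# Shared store: any server can continue a session started on another.
shared = {}
a, b = Server("a", shared), Server("b", shared)
a.handle("s1", "book")
print(b.handle("s1", "pen"))    # ['book', 'pen']

# Server-local stores: moving to another server loses the earlier state.
c, d = Server("c", {}), Server("d", {})
c.handle("s1", "book")
print(d.handle("s1", "pen"))    # ['pen']
```

With server-local state, the load balancer has to keep each user pinned to one server, and a bounce of that server empties the cart.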

There's something called Network Load Balancing (NLB), which uses a VIP (virtual IP). You create one VIP for, say, 3 webservers, and traffic goes to whichever ones are working. This of course depends on many factors, but in Windows and IIS it's quite easy to enable NLB on multiple IIS servers, so if one goes down the others keep serving content.