China, Amsterdam, San Jose and global load balancing

The algorithm first checks for a match in the static maps and then falls back to proximity metrics. If there’s no proximity data either, it’ll round-robin through all the GSLB sites. That’s effectively three (San Jose, Amsterdam and China), though technically it’s six, and it’ll only round-robin through sites that have that service/site defined.
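To make the order of checks concrete, here is a minimal Python sketch of that fallback chain. It is not how the Netscaler implements it; the function name, the dictionaries and the example networks (reserved documentation ranges) are all made up for illustration.

```python
from itertools import cycle
from typing import Callable, Dict, List


def make_selector(static_map: Dict[str, str],
                  proximity: Dict[str, str],
                  sites: List[str]) -> Callable[[str], str]:
    """Return a function mapping a resolver's network to a GSLB site.

    All names here are hypothetical; this only illustrates the order
    static map -> proximity data -> round-robin.
    """
    fallback = cycle(sites)  # shared, so successive misses rotate through sites

    def select(network: str) -> str:
        if network in static_map:   # 1. a static map entry wins
            return static_map[network]
        if network in proximity:    # 2. then learned proximity data
            return proximity[network]
        return next(fallback)       # 3. otherwise plain round-robin

    return select


# Example data: documentation-only address ranges, not real assignments.
select = make_selector(
    static_map={"192.0.2.0/24": "china"},
    proximity={"198.51.100.0/24": "amsterdam"},
    sites=["sanjose", "amsterdam", "china"],
)
```

A network that matches neither table simply gets whichever site is next in the rotation, which is the case the rest of this post is about.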

In practice, this round-robin fallback hasn’t had much user impact. You’d only ever hit it if no one in your network (a /16 in this case, based on the source address of whatever name server you used) had ever accessed any of Mozilla’s web properties. And while that’s possible, the performance hit is really minimal. But then, up until now we’ve only had two GSLB sites to send users to, and San Jose and Amsterdam are well connected.
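As a small illustration of what "your network" means here, the sketch below collapses a name server's IPv4 address to its /16. The function name is hypothetical and the addresses are documentation ranges; the point is only that two resolvers in the same /16 share one entry.

```python
import ipaddress


def proximity_key(resolver_ip: str, prefix_len: int = 16) -> str:
    """Collapse a resolver's IPv4 address to the network used as the lookup key.

    prefix_len=16 mirrors the /16 granularity mentioned above; the function
    itself is only an illustration.
    """
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{resolver_ip}/{prefix_len}", strict=False)
    return str(network)


# Two different resolvers in the same /16 map to the same key.
key_a = proximity_key("203.0.113.57")
key_b = proximity_key("203.0.7.1")
```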

China changes things. Connectivity to mainland China can often be congested and can induce a lot of latency (which matters less for downloads than it does for interactive sites). If you’re in New York and fall through to round-robin, you very likely don’t want to end up taking 80ms to the west coast and another 300ms-400ms to China. This is a bad user experience.

If you don’t believe me, look at my trip times from San Jose to some Mozilla gear in China:

Since December I’ve been working on ways to exclude China from the fallback round-robin method while making sure our Chinese users don’t go to San Jose or Amsterdam. The latter was easier: using MaxMind’s GeoLite database, I’ve built a static proximity location map assigning all IP addresses in China to the China datacenter. The former, excluding China from round-robin, took longer to figure out.
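The idea behind the static map can be sketched in a few lines of Python. This is not the Netscaler configuration itself: the block list below uses reserved documentation ranges as stand-ins, where the real entries would be the address blocks that GeoLite lists for China, and the names are hypothetical.

```python
import ipaddress
from typing import Optional

# Placeholder blocks (documentation ranges). Real entries would come from
# the GeoLite country data for China; these are NOT actual Chinese ranges.
CHINA_BLOCKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def static_site_for(ip: str) -> Optional[str]:
    """Return "china" if the address falls in a mapped block, else None.

    None means "no static entry", so the caller falls through to
    proximity metrics and then round-robin.
    """
    address = ipaddress.ip_address(ip)
    for block in CHINA_BLOCKS:
        if address in block:
            return "china"
    return None
```

A linear scan is fine for a sketch; a full country's worth of blocks would be better served by a sorted list or a prefix tree.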

One of the GSLB methods the Netscalers have is a “weighted round-robin” and while it’s not documented, it turns out that weights apply even to the fallback round-robin method. Weights range from 1 to 100. So now I can do something like:

which will send 200 requests to either San Jose or Amsterdam before sending -1- request to China.
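To see why a 200-to-1 split falls out of weights capped at 100, here is a small Python simulation. It assumes weights of 100, 100 and 1 and a simple weighted round-robin in which a site is served at each "level" up to its weight; the Netscaler's actual scheduling isn't documented here, so treat both the weights and the ordering as assumptions.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterator


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Yield sites forever, each one `weight` times per full cycle.

    Levels count down from the highest weight, so a site with weight 1
    is only served in the final round of each cycle.
    """
    top = max(weights.values())
    while True:
        for level in range(top, 0, -1):
            for site, weight in weights.items():
                if weight >= level:
                    yield site


# Assumed weights: the maximum (100) for the two well-connected sites,
# the minimum (1) for China.
weights = {"sanjose": 100, "amsterdam": 100, "china": 1}
one_cycle = list(islice(weighted_round_robin(weights), 201))
counts = Counter(one_cycle)
```

Over one full cycle of 201 answers, San Jose and Amsterdam each get 100 and China gets the last one, so China still appears in the rotation but only about 0.5% of the time.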

It’ll be interesting to see what the actual user impact of this is and if anyone not in mainland China ends up hitting China. We’ll be testing this for a week, starting next Tuesday, with www.mozilla.com and looking at the web server logs to see where users are coming from.

I’d also be interested in feedback from you if you find yourself in China and you should be elsewhere, or if you find yourself in China and www.mozilla.com is quicker for you!

[1] GSLB works by looking up the IP address of the host that made a DNS request to somehost.glb.mozilla.com and determining which data center is closest. This is a fairly common way to do global load balancing.