I believe it's a common requirement of MMOs that processing for a single shard or realm can be done over several servers to ease the load. I'm curious as to how this can be done whilst maintaining a unified consistent world where all the players, and all the NPCs can interact.

My question is how is load balancing achieved in MMOs?

Any links, books or general information on how to improve my knowledge on this subject is also appreciated.

4 Answers

Try to keep this as simple as possible, with interfaces that are well defined and documented. Maintaining and debugging a complex system in production easily turns into hell, so if there is a simple and a complex approach, think twice before you go with the complex one.

Then, for each of these services, decide whether the client may talk to it directly. For example, it is pretty easy to let the client talk directly to the servers responsible for global chat channels; the world servers don't have to be involved in chat messages at all. Regional chat can be implemented in the same way, but the world servers have to tell the chat servers when players change regions. Again, they don't have to care about the messages themselves.

The third step is to think about load balancing within a service. For example, global and regional chat channels can be split across multiple servers based on their name. It is probably a good idea not to hard-code this split into the client, but to provide a lookup service instead.
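A minimal sketch of such a lookup service, assuming channels are spread over chat servers by a stable hash of the channel name. `ChatLookupService` and the server addresses are hypothetical. Because the client always asks the service instead of computing the split itself, the mapping can change without a client update.

```python
import hashlib


class ChatLookupService:
    """Hypothetical lookup service: maps a chat channel name to the chat
    server responsible for it, so the split is not hard-coded in the client."""

    def __init__(self, servers):
        self.servers = list(servers)

    def server_for(self, channel: str) -> str:
        # Use a stable hash; Python's built-in hash() is salted per process for str.
        digest = hashlib.sha1(channel.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]


lookup = ChatLookupService(["chat-1.example", "chat-2.example", "chat-3.example"])
print(lookup.server_for("trade"))
```

Plain modulo hashing reshuffles most channels when the server count changes; consistent or rendezvous hashing would limit that, and the lookup service is the one place that would need to change.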

World Servers

The most difficult part is usually the world servers, so I will start with a simple approach. It is probably a good idea to let the client talk directly to the server responsible for the region the player is in. So on login, or when crossing into another region, the client has to be told which server to connect to.
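One way to picture this is a directory that maps regions to world servers and answers "where do I connect?" on login and on region crossing. This is a sketch under assumptions: `RegionDirectory`, `Redirect` and the server names are hypothetical, and a real directory would also hand out a permission token.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Redirect:
    region: str
    server: str  # address the client should connect to


class RegionDirectory:
    """Hypothetical directory: region name -> world server address."""

    def __init__(self, assignments: dict):
        self.assignments = dict(assignments)

    def on_login(self, region: str) -> Redirect:
        return Redirect(region, self.assignments[region])

    def on_region_crossing(self, old_region: str, new_region: str) -> Optional[Redirect]:
        # If both regions live on the same server, the client keeps its connection.
        if self.assignments[old_region] == self.assignments[new_region]:
            return None
        return Redirect(new_region, self.assignments[new_region])


directory = RegionDirectory({"town": "world-1", "forest": "world-1", "dungeon": "world-2"})
print(directory.on_region_crossing("town", "dungeon"))
```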

The simple approach is to split the world into independent regions. By independent regions I mean that a player cannot look from one region into another and monsters cannot cross between them. These regions are different from the regions players see based on the landscape and story of the game world. Usually most monsters are in dungeons, and players tend to accept that they have to walk through a gateway to enter a dungeon, especially if those dungeons are instanced per player group. Other examples in the outside world are separate continents and valleys enclosed by high mountains.

A continuous world approach gets complex really quickly, so it makes sense to plan it well: What information does the client need? Which information do the servers have to share? The player will mostly interact only with objects (including monsters and NPCs) in the same region, and you can cheat by placing objects out of click range of the zone border. This means that the client mostly needs read-only information about neighboring zones. For these cases the zone servers don't have to coordinate anything except for the permission check that the player is close enough to connect to a neighboring zone.
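The permission check can be as small as a distance test. This sketch assumes zones are axis-aligned rectangles and that a hypothetical `VIEW_RANGE` decides how far a player may see into a neighboring zone. The neighboring server would run `may_observe` before accepting the read-only connection.

```python
import math
from dataclasses import dataclass

VIEW_RANGE = 50.0  # hypothetical: how far a player can see, in world units


@dataclass(frozen=True)
class Zone:
    name: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def distance_to(self, x: float, y: float) -> float:
        # Distance from a point to this rectangle (0 if the point is inside).
        dx = max(self.min_x - x, 0.0, x - self.max_x)
        dy = max(self.min_y - y, 0.0, y - self.max_y)
        return math.hypot(dx, dy)


def may_observe(zone: Zone, x: float, y: float, view_range: float = VIEW_RANGE) -> bool:
    """Permission check for a read-only connection to a neighboring zone."""
    return zone.distance_to(x, y) <= view_range


east = Zone("east", 100, 0, 200, 100)
print(may_observe(east, 60, 50))
```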

This leaves only a very small number of difficult cases in which objects or actions have to cross a server border. That is a good thing, because those cases, such as arrows and spells, are performance critical. It may be a good idea to split combat into attacking and defending: the spell-caster's server defines the attack parameters, including the position of the caster. The defender's server receives the attack message and calculates the impact. The attacker's server does not need to know about the impact; the client learns about it through its read-only connection.
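A sketch of the attack/defense split, with hypothetical message fields and a made-up damage formula (base damage minus armor, plus a range check). The point is only where each half runs: `build_attack` on the caster's server, `resolve_attack` on the server that owns the defender's authoritative state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackMessage:
    attacker_id: str
    target_id: str
    caster_x: float
    caster_y: float
    base_damage: int
    max_range: float


@dataclass
class Defender:
    entity_id: str
    x: float
    y: float
    hp: int
    armor: int


def build_attack(attacker_id, target_id, x, y, spell_power):
    # Runs on the attacker's server: it only fixes the attack parameters.
    return AttackMessage(attacker_id, target_id, x, y,
                         base_damage=spell_power, max_range=30.0)


def resolve_attack(msg: AttackMessage, defender: Defender) -> int:
    # Runs on the defender's server, which owns the defender's state.
    if defender.hp <= 0:
        return 0  # already dead: the attack has no effect
    if math.hypot(defender.x - msg.caster_x, defender.y - msg.caster_y) > msg.max_range:
        return 0  # out of range by the time the message arrived
    damage = max(msg.base_damage - defender.armor, 0)
    defender.hp = max(defender.hp - damage, 0)
    return damage
```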

Depending on how complex your player model is, it may take a couple of seconds to transfer it to another server (Second Life has a huge problem with this). The issue can be mitigated by preparing the transfer in advance when the player gets close to a virtual border, so that most of the player data is already cached on the destination server when the actual handover happens.
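A minimal sketch of preparing the handover, assuming a hypothetical `PREFETCH_DISTANCE` and a destination server that caches a snapshot. At the actual handover only the fields that changed since the snapshot need to be sent.

```python
PREFETCH_DISTANCE = 20.0  # hypothetical distance to the border that triggers a prefetch


class DestinationServer:
    def __init__(self):
        self.cache = {}  # player id -> prefetched snapshot

    def prefetch(self, player_id, snapshot):
        self.cache[player_id] = dict(snapshot)

    def accept_handover(self, player_id, delta):
        state = self.cache.pop(player_id, None)
        if state is None:
            return None  # nothing cached: the caller must send the full state
        state.update(delta)  # apply changes made since the snapshot
        return state


def maybe_prefetch(dest, player_id, state, distance_to_border):
    """Send a snapshot once, when the player first comes close to the border."""
    if distance_to_border <= PREFETCH_DISTANCE and player_id not in dest.cache:
        dest.prefetch(player_id, state)
        return True
    return False
```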

Summary

Divide the problem by defining different services that can be split across servers with few dependencies. As a next step, look at how to balance load within the critical services. Delegate balancing work to the client by instructing it to connect directly to the relevant servers (obviously the servers have to check permissions). Keep it as simple as possible, document the responsibilities of the various services and servers well, and provide an option to enable debug output.

PS: Some of these techniques can also be used to improve reliability, and you should keep reliability in mind, because using many servers implies a much higher risk of things breaking, not only in the software but also at the hardware level.

Generally the world is divided up into a number of smaller regions. Each of these regions is usually an independent server process (WoW's world servers or Eve's Sol nodes) and can run on any of a number of machines. In some games there are explicit doors between maps (Eve, STO, Guild Wars), while others try to mask this more (WAR, Free Realms).

Those that opt for the more seamless approach will generally detect when you are nearing the border between two servers, and the two processes negotiate a handoff. Probably the best place to look for a description of this is how cell towers hand off moving handsets.

If the load of a single map (Jita, Ironforge, Earth Space Dock) gets really big, you can sometimes offload individual functions to other servers (AI, certain parts of player management), but this either has to be built in from the start or would take some serious retrofitting. It is almost always more cost-effective to just buy better hardware and dedicate it to those few maps.
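One idea borrowed from cell handoffs is hysteresis: ownership only moves once an entity is clearly past the border, so someone walking along the line isn't bounced between servers on every step. This is a sketch assuming two hypothetical servers, "west" and "east", split at `border_x`, with a made-up `HYSTERESIS` margin.

```python
HYSTERESIS = 5.0  # hypothetical margin in world units


def owner_after_move(current_owner: str, x: float, border_x: float,
                     hysteresis: float = HYSTERESIS) -> str:
    """'west' owns x < border_x and 'east' the rest, but ownership only
    changes once the entity is more than `hysteresis` past the border."""
    if current_owner == "west" and x > border_x + hysteresis:
        return "east"
    if current_owner == "east" and x < border_x - hysteresis:
        return "west"
    return current_owner
```

When the function returns a new owner, the two processes would start the actual handoff negotiation.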

> I believe it's a common requirement of MMOs that processing for a single shard or realm can be done over several servers to ease the load. I'm curious as to how this can be done whilst maintaining a unified consistent world where all the players, and all the NPCs can interact.

It's probably not as common as you think; at least, not if you're thinking that one seamless world is managed by several servers simultaneously.

Not counting totally separate shards, there are 2 directions in which you can split up an online game, which could be considered "horizontal" and "vertical":

Divide the game up into many separate geographical areas. All the functionality for any given geographical area is handled by one server and there is no real interaction between them. (Note that there isn't necessarily just 1 zone per server - a server might handle several zones simultaneously, and zones can perhaps be transferred between servers to handle changing load.)

Divide the game up into several types of service, e.g. login/authorisation, gameplay rules and physics, chat+auctions, persistence, etc. Each of these services can be handled by a different server. nhnb's answer has enumerated other potential services that a developer can partition their game into.
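For the geographical option, moving zones between servers to follow changing load is essentially a bin-packing problem. A simple heuristic (my choice, not something the games above are known to use) is greedy "largest zone first onto the least-loaded server". The load numbers and names are hypothetical.

```python
import heapq


def assign_zones(zone_load: dict, servers: list) -> dict:
    """Greedy assignment: heaviest zone first, always onto the server
    with the smallest total load so far."""
    heap = [(0, server) for server in servers]
    heapq.heapify(heap)
    assignment = {}
    for zone, load in sorted(zone_load.items(), key=lambda kv: kv[1], reverse=True):
        total, server = heapq.heappop(heap)
        assignment[zone] = server
        heapq.heappush(heap, (total + load, server))
    return assignment


loads = {"capital": 500, "forest": 200, "desert": 150, "swamp": 100}
print(assign_zones(loads, ["s1", "s2"]))
```

Re-running this periodically and migrating only zones whose assignment changed is one way to react to shifting populations.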

Obviously these approaches are orthogonal and you can combine the two. In fact it's almost mandatory to have a separate database server, very common to push off login/auth onto a separate machine from the gameplay, and increasingly common to farm off chat and other non-critical communications as well, no matter how your game world is divided up.

But on the whole, when there is geographical partitioning, most games avoid letting you interact across those boundaries, because it's difficult to do well. Instead, they resort to other ways of making it seem like you're all still in the same shard and on the same server, when actually you're not, e.g.:
- loading screens or other animations that cover up a server change when transitioning between zones, or from one continent to another.
- separate dungeon or raid instances that are isolated from everybody else. These are like a shard within a shard and can easily be run on a separate server, helping the load balancing.

I can't speak with authority on WoW but I would guess they're doing almost all of the above: instancing, separate geographical areas that can't interact joined by portals of some sort, separate back end and auth servers. I've heard that WoW realms have something between 1000 and 10000 players online in a given realm at once, which is easily manageable with the above schemes.

But let's assume you have a single massive world and that you do need to allow players on one server to interact with players on an adjacent server. This is easy to do in theory: first, servers must cooperate to share details of objects along the borders (so an object on one server may have a proxy representation on another), and then you simply change all your logic to message-passing, with messages being routed from a proxy back to the authoritative source where necessary. Messages can be passed between servers or within a server fairly transparently, so one approach fits all systems.
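A minimal in-process sketch of proxy routing, assuming a hypothetical `Server` class where a plain dict stands in for the network. Each server either owns an entity or knows which server does, and a message sent to a proxy is forwarded to the authoritative owner.

```python
class Server:
    def __init__(self, name, network):
        self.name = name
        self.network = network   # server name -> Server (stand-in for real transport)
        self.entities = {}       # entity id -> authoritative state owned here
        self.proxies = {}        # entity id -> name of the owning server
        network[name] = self

    def send(self, entity_id, message):
        # Same call whether the entity is local or remote.
        if entity_id in self.entities:
            return self.handle(entity_id, message)
        owner = self.proxies[entity_id]
        return self.network[owner].send(entity_id, message)

    def handle(self, entity_id, message):
        entity = self.entities[entity_id]
        if message["type"] == "damage":
            entity["hp"] -= message["amount"]
        return entity["hp"]


network = {}
a, b = Server("a", network), Server("b", network)
b.entities["goblin"] = {"hp": 30}
a.proxies["goblin"] = "b"  # server a only has a proxy for the goblin
print(a.send("goblin", {"type": "damage", "amount": 10}))
```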

The problem here is that previously simple logic can become very complex when translated to messages. E.g. a two-player trade, which can happen safely and atomically when both players are on one server, becomes a longer process when messages have to be sent back and forth, reverified on each send, and safeguards put in place to ensure that one player can't exploit the other by changing the trade while a message is travelling. You can't even assume the other player will still exist by the time the message arrives (as they might die, log out, etc.), so the code becomes very complex. And this will apply to almost any system where two or more entities can interact or cooperate: trade, combat, grouping, auctions, loot-sharing, training, etc.
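To make the trade safeguard concrete, here is one possible check (a version counter, my own illustration with hypothetical names). Every change to an offer bumps its version, and an "accept" message only counts if it carries the version the accepting player actually saw and the offering player still exists.

```python
from dataclasses import dataclass, field


@dataclass
class TradeSide:
    player_id: str
    items: list = field(default_factory=list)
    version: int = 0

    def change_offer(self, items):
        self.items = list(items)
        self.version += 1  # any change invalidates acceptances still in flight


def confirm_accept(side: TradeSide, seen_version: int, online_players: set) -> str:
    """Runs on the server that owns `side` when the other player's accept
    message arrives, possibly long after it was sent."""
    if side.player_id not in online_players:
        return "aborted: player gone"
    if side.version != seen_version:
        return "rejected: offer changed"
    return "accepted"
```

Even this ignores the second half of the problem, atomically swapping items owned by two different servers, which is where most of the complexity lives.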

These problems aren't insurmountable but for most games they're too difficult to be worth attempting when you can share the load via the other means and keep all your game logic on one server. So almost all current games go down that route instead.

There are many methods of load balancing an MMO server, since there is quite a wide range of data to be processed. I prefer the process bin tree method.

A global server passes user connections to a process bin that can handle several users at once. The process bins do all the complex processing and only respond to the global server with data that is globally relevant, such as global chat and positioning. This method balances much better than region servers, since regions can vary greatly in population, while overall user processing is varied enough that it should naturally balance itself for the most part.

Just do some basic load balancing via the global server, so that when a process bin reaches a certain memory/CPU usage, you start up a new process bin server.
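A sketch of that dispatch rule. For simplicity it uses the number of connected users as a stand-in for the memory/CPU measurement described above; `GlobalServer`, `capacity_per_bin` and `threshold` are hypothetical names.

```python
class GlobalServer:
    def __init__(self, capacity_per_bin=10_000, threshold=0.8):
        self.capacity = capacity_per_bin
        self.threshold = threshold  # fraction of capacity that counts as "busy"
        self.bins = []              # each bin is a set of user ids
        self.user_bin = {}          # user id -> bin index

    def _spawn_bin(self):
        # Stand-in for starting a new process bin server.
        self.bins.append(set())
        return len(self.bins) - 1

    def connect(self, user_id):
        limit = self.capacity * self.threshold
        candidates = [i for i, b in enumerate(self.bins) if len(b) < limit]
        if candidates:
            index = min(candidates, key=lambda i: len(self.bins[i]))
        else:
            index = self._spawn_bin()  # every bin is busy: add a new one
        self.bins[index].add(user_id)
        self.user_bin[user_id] = index
        return index
```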

How do you share data between process bins, for example in a fight between two users on different process bins? How do you ensure the order of events, so that a killed player cannot attack anymore, even if the bin that kills him is slower than the bin doing the attack? Is there a risk of the dispatching overhead getting too high on the global server? The proxy model for the user connections may run into operating system limits on the network stack.
– Hendrik Brummermann, Nov 30 '10 at 16:47

This model works pretty well in information systems where most transactions are isolated. I mean they usually don't work on the same data, and in the rare cases where they do, locking or rollback is used. But in games where fights include multiple players and/or creatures, and the impact of attacks is influenced by attributes of both the attacker and the defender, this approach may be difficult.
– Hendrik Brummermann, Nov 30 '10 at 17:14

You can either go the easy way and consider multi-user interactions to be globally relevant, or you can create a simple method for process bins to be aware of each other and communicate. Each process bin should be able to handle about ten thousand users at once, so state communication between users shouldn't be too much of an issue. The region method is a bit easier, but not as balanced, and can easily be crashed if too many users enter the same region. In an MMORPG with a large user-base balancing the load evenly is very important.
– Stephen Belanger, Nov 30 '10 at 17:25

I'd love to know "a simple method" for 2 process bins to be able to negotiate a system like combat. The problem isn't the low level comms but that typical gameplay algorithms get very complicated with distributed participants.
– Kylotan, Nov 30 '10 at 21:07

In my implementation, my global server maintains a list of all connected clients and keeps track of which process bin they are connected to. If a process bin needs to access another user, it first checks its own user list. If that fails, it checks the global list and identifies which process bin the other user is connected to. The process bins then connect directly to share user states while they do the unified processing.
– Stephen Belanger, Nov 30 '10 at 21:44
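The lookup order from the last comment (own list first, then the global list, then a direct connection to the other bin) could be sketched like this. `GlobalRegistry`, `ProcessBin` and the `bins` dict standing in for direct bin-to-bin connections are all hypothetical, and the sketch deliberately leaves out the ordering and atomicity questions raised in the earlier comments.

```python
class GlobalRegistry:
    def __init__(self):
        self.location = {}  # user id -> name of the process bin holding the user


class ProcessBin:
    def __init__(self, name, registry, bins):
        self.name = name
        self.registry = registry
        self.bins = bins    # bin name -> ProcessBin (stand-in for direct connections)
        self.users = {}     # user id -> state for users handled by this bin
        bins[name] = self

    def add_user(self, user_id, state):
        self.users[user_id] = state
        self.registry.location[user_id] = self.name

    def get_user_state(self, user_id):
        if user_id in self.users:              # 1. own user list
            return self.users[user_id]
        bin_name = self.registry.location.get(user_id)  # 2. global list
        if bin_name is None:
            return None                        # user is not online anywhere
        return self.bins[bin_name].users.get(user_id)   # 3. ask that bin directly
```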