Classical Game Deployment Architecture

Fig VI.4 shows a classical game deployment diagram.

In this deployment architecture, clients are connected to Game Servers directly, and Game Servers are connected to a single DB Server, which hosts system-wide persistent state. Each of the Game Servers MIGHT (or might not) have its own database (or other persistent storage), depending on the needs of your specific game; usually, however, Game Servers store only in-memory state, with all the persistent storage going into a single DB residing on the DB Server.

Game Servers

Game Servers are traditionally divided according to their functionality, and while you can combine different types of functionality on the same box, there are often good reasons to avoid combining too many different things together.

Different types of Game Servers (more strictly – different types of functionality hosted on Game Servers) should be mapped to the entities on your Entities&Relationships Diagram described in Chapter II. You should do this mapping for your specific game yourself. However, as an example, let’s take a look at a few typical Game Servers (while, as always, YMMV, these are likely to be present in quite a few games):

Game World Servers. Your game worlds are running on Game World Servers, plain and simple. Note that “Game World” here doesn’t necessarily mean a “3D game world with simulated physics etc.”. Taking a page from the casino-games book, a “Game World” can be a casino table; going even further into the realm of stock exchanges, a “Game World” may be a stock exchange floor. Surprisingly, from an architecture point of view, all these seemingly different things are very similar. All of them represent a certain state (we usually name it “game world”) which is affected by players’ actions in real time, and changes to this state are shown to all the players.1

Matchmaking Servers. Usually, when a player launches her client app, the client by default connects to one of the Matchmaking Servers. In general, Matchmaking Servers are responsible for redirecting players to one of your multiple game worlds. In practice, they can be pretty much anything: from lobbies where players can join teams or select game worlds, to completely automated matchmaking. Usually it is Matchmaking Servers that are responsible for creating new game worlds and placing them on the servers (and sometimes even creating new servers in cloud environments).

Tournament Servers. Not always, but quite often your game will include certain types of “tournaments”, which can be defined as game-related entities that have their own life span and may create multiple Game World instances during this life span. Technically, these are usually reminiscent of Matchmaking Servers (they need to communicate with players, they need to create Game Worlds, and they tend to use about the same generic protocol synchronization mechanics; see Chapter [[TODO]] for details), but of course, Tournament Servers also need to implement the tournament rules of the specific tournament, etc.

Payment Server and Social Gateway Server. These are necessary to provide interaction of your game with the real world. While these servers might look like an “optional thing nobody should care about”, they usually play an all-important role in increasing the popularity and monetization of your game, so you’d better account for them from the very beginning.

The very nature of Payment Servers and Social Gateway Servers is to be “gateways to the real world”, so they’re usually exactly what it says on the tin: gateways. It means that their primary function is usually to get some kind of input from the player and/or other Game Servers, write something to the DB (via the DB Server), and make a request according to some external protocol (defined by the payment provider or by the social network). On the other hand, implementing them when you need to support multiple payment/social providers (each with their own peculiarities, you can count on it) is a challenge; they also tend to change a lot due to requirements coming from business and marketing, changes in providers’ APIs, the need to support new providers, etc. And of course, at least for Payment Servers, there are questions of distributed transactions between your DB and the payment provider’s DB, with all the associated issues of recovery from “unknown-state” transactions and semi-manual reconciliation of reports at the end of the month. As a result, these two seemingly irrelevant-to-gameplay servers tend to have their own teams after deployment; more details on Payment Servers will be discussed in Chapter [[TODO]].

One of the things these servers should do is isolate Game World Servers (and preferably Matchmaking Servers) from the intimate details of specific payment providers and social networks. In other words, Game World Servers shouldn’t generally know about such things as “a guy has made a post on Facebook, so we need to give him a bonus of 25% extra experience for 2 days”. Instead, this functionality should be split in two: the Social Gateway Server should say “this guy has earned bonus X” (with an explanation in the DB of why he’s got the bonus, for audit purposes), and the Game World Server should take the “this guy has bonus X” statement and translate it into 25% extra experience.
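This split can be sketched as follows – the Game World side merely maps abstract bonus IDs to gameplay effects, and knows nothing about Facebook or payment providers. All names below are illustrative assumptions, not the book’s API:

```cpp
#include <cassert>
#include <map>
#include <string>

// Social-Gateway side: an abstract grant record, written to the DB with an
// audit note explaining why the bonus was given.
struct BonusGrant {
    std::string player_id;
    std::string bonus_id;   // e.g. "SOCIAL_SHARE_BONUS"
    std::string audit_note; // why the bonus was granted, for audit purposes
};

// Game-World side: translate abstract bonus IDs into concrete gameplay
// effects; unknown bonus IDs simply have no effect.
double experience_multiplier(const std::string& bonus_id) {
    static const std::map<std::string, double> effects = {
        {"SOCIAL_SHARE_BONUS", 1.25},   // +25% experience
        {"FIRST_PURCHASE_BONUS", 1.10}, // +10% experience
    };
    auto it = effects.find(bonus_id);
    return it != effects.end() ? it->second : 1.0;
}
```

With this split, adding a new social provider touches only the Social Gateway Server; the Game World Server keeps seeing nothing but bonus IDs.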

1 Restrictions may apply as to which parts of the state are shown to which players. One such example is server-side fog-of-war, which we’ll discuss in Chapter [[TODO]]

Implementing Game Servers under QnFSM architecture

In theory, Game Servers can be implemented in whatever way you prefer. In practice, however, I strongly suggest implementing them under the Queues-and-FSMs (QnFSM) model described in Chapter V. Among other things, QnFSM provides very clean separation between different modules, enables replay-based debugging and production post-mortem, allows for different deployment scenarios without changing the FSM code (this one becomes quite important for the server side), and completely avoids all those pesky inter-thread synchronization problems at the logical level; see Chapter V for further discussion of QnFSM benefits.

Fig VI.5 shows a diagram with an implementation of a generic Game Server under QnFSM:

If it looks complicated at first glance – well, it should. First of all, the diagram represents quite a generic case, and for your specific game (at least at the first stages) you may not need all of that stuff; we’ll discuss this below. Second, but certainly not unimportant, writing an anywhere-close-to-scalable server is not easy.

Now let’s take a closer look at the diagram on Fig VI.5, going in an unusual direction from right to left.

Game Logic and Game Logic Factory. On the rightmost side of the diagram, there is the most interesting part – the things closely related to your game logic. The specifics of those Game Logic FSMs differ between the different Game Servers you have, and can vary from a “Game World FSM” to a “Payment Processing FSM”, with anything else you need in between. It is worth noting that while for most Game Logic FSMs you won’t need any communications with the outside world except for sending/receiving messages (as shown on the diagram), for gateway-style FSMs (such as a Payment FSM or a Social Gateway FSM) you will need some kind of external API (most of the time they go over outgoing HTTP, though I’ve seen quite strange things, such as X.25); this doesn’t change the nature of those gateway-style FSMs, so you still have all the FSM goodies (as long as you “intercept” all the calls to that external API, see Chapter V for details). [[TODO! – discussion on blocking-vs-non-blocking APIs for gateway-style FSMs]]

Game Logic Factory is necessary to create new FSMs (and, if necessary, new threads) by external request. For example, when a Matchmaking Server needs to create a new game world on server X, it sends a request to the Game Logic Factory which resides on server X, and the Game Logic Factory creates a game world with the requested parameters. Deployment-wise, there is usually only one instance of the Game Logic Factory per server, though technically there is no such strict requirement.
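A minimal sketch of such a factory (class names and the request shape are my own assumptions, not the book’s API):

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Stand-in for a Game World FSM; a real one would have process_event() etc.
struct GameWorldFSM {
    std::string params;
    explicit GameWorldFSM(std::string p) : params(std::move(p)) {}
};

// One instance per server; handles "create game world" requests coming
// from Matchmaking Servers (or Tournament Servers).
class GameLogicFactory {
    std::map<int, std::unique_ptr<GameWorldFSM>> worlds_;
    int next_id_ = 1;
public:
    // Creates a new game-world FSM with the requested parameters and
    // returns its id, so the requester can redirect clients to it.
    int create_game_world(const std::string& params) {
        int id = next_id_++;
        worlds_[id] = std::make_unique<GameWorldFSM>(params);
        return id;
    }
    size_t world_count() const { return worlds_.size(); }
};
```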

TCP Sockets and TCP Accept. Going to the left of Game Logic on Fig VI.5, we can see TCP-related stuff. Here things are relatively simple: we have a classical accept() thread that passes the accepted sockets to Socket Threads (creating Socket Threads when it becomes necessary).

The only really important thing to be noted here is that each Socket Thread2 should normally handle more than one TCP socket; usually, the number of TCP sockets per thread for a game server should be somewhere between 16 and 128 (or “somewhere between 10 and 100” if you prefer decimal notation to hex). On Windows, if you’re using WaitForMultipleObjects()3, you’re likely to hit the wall at around 64 sockets per thread (see further discussion in Chapter [[TODO]]), and this has been observed to work perfectly fine. Having one thread per socket on the server side (even worse – two, one for recv() and another one for send()) is generally not advisable, as threads have substantial associated overhead (both in terms of resources and in terms of context switches). In theory, multiple sockets per thread may cause additional latencies and jitter, but in practice, for reasonably well-written code running on a non-overloaded server, I wouldn’t expect additional latencies and jitter of more than single-digit microseconds, which should be non-observable even for the most fast-paced games.

3 which IMHO provides the best balance between performance and implementation complexity (that is, if you need to run your servers on Windows), see Chapter [[TODO]] for further details
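As a side note, the 16–128-sockets-per-thread guideline translates into a trivial sizing calculation for the number of Socket Threads; the helper below is merely illustrative (the function name is mine):

```cpp
#include <cassert>
#include <cstddef>

// How many Socket Threads are needed to serve total_sockets connections,
// given a per-thread budget (e.g. 64 when WaitForMultipleObjects() is used)?
// Plain ceiling division.
size_t socket_threads_needed(size_t total_sockets, size_t sockets_per_thread) {
    if (sockets_per_thread == 0) return 0; // guard against misconfiguration
    return (total_sockets + sockets_per_thread - 1) / sockets_per_thread;
}
```

For example, 1000 concurrent TCP connections at 64 sockets per thread come out to 16 Socket Threads, which is a perfectly reasonable thread count for one server box.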

UDP-related FSMs. UDP (shown on the left side of Fig VI.5) is quite a weird beast; in some cases, you can use really simple things to get UDP working, but in other cases (especially when high performance is involved), you may need to resort to quite heavy solutions to achieve scalability. The solution on Fig VI.5 is on the simpler side, so you MIGHT need to get into more complicated things to achieve performance/scalability (see below).

Let’s start explaining things here. One problem which you [almost?] universally will have when using UDP is that you need to know whether your player is connected or not. And as soon as you have a concept of a “UDP connection” (for example, provided by your “reliable UDP” library), you have some kind of connection state/context that needs to be stored somewhere. This is where those “Connected UDP Threads” come in.

So, as soon as we have the concept of a “player connected to our server” (and we need this concept at least because players need to be subscribed to the updates from our server), we need those “Connected UDP Threads”. Not exactly the best start from the KISS (“Keep it simple, stupid”) point of view, but at least we know what we need them for. As for the number of those threads – we should limit the number of UDP connections per Connected UDP Thread; as a starting point, we can use the same ballpark numbers of UDP connections per thread as we were using for TCP sockets per thread: that is, between 16 and 128 UDP connections per thread.

The UDP Handler Thread and FSM is a very simple thing – it merely gets whatever comes in from recvfrom() and passes it to an appropriate Connected UDP Thread (as the UDP Handler FSM also creates those Connected UDP Threads, it is not a problem for it to maintain a map of incoming-packet IP/port pairs to threads).
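A sketch of that dispatch logic follows; types and names are illustrative stand-ins (a real handler would enqueue into an inter-thread queue and spin up a Connected UDP Thread, rather than append to a vector):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Endpoint = std::pair<uint32_t, uint16_t>; // (IPv4 address, UDP port)
using Inbox    = std::vector<std::string>;      // stand-in for an FSM inbox

// The UDP Handler FSM: owns the endpoint-to-connection map, so routing a
// packet to the right Connected UDP Thread is a simple lookup.
class UdpHandler {
    std::map<Endpoint, Inbox> connections_;
public:
    // Called for each datagram coming out of recvfrom().
    void dispatch(const Endpoint& from, std::string packet) {
        // operator[] creates the entry for a first-seen endpoint; this is
        // the point where a real handler would also create the thread
        connections_[from].push_back(std::move(packet));
    }
    size_t connection_count() const { return connections_.size(); }
    size_t pending(const Endpoint& ep) const {
        auto it = connections_.find(ep);
        return it == connections_.end() ? 0 : it->second.size();
    }
};
```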

However, you MAY find that this simpler approach doesn’t work for you (i.e., your UDP Handler Thread becomes a bottleneck, causing incoming packets to drop while your server is not overloaded yet); in this case, you’ll need to use platform-specific stuff such as recvmmsg(),4 or to use multiple recvfrom()/sendto() threads. The latter multi-threaded approach will in turn raise the question of where to store the mapping of incoming-packet IP/port pairs to threads. This can be addressed either using shared state (which is a deviation from the pure FSM model, but in this particular case it won’t cause too much trouble in practice), or via a separate UDP Factory Thread/FSM (with the UDP Factory FSM storing the mapping, and notifying recvfrom() threads about the mapping on request, in a manner somewhat similar to the one used for the Routing Factory FSM described in the [[TODO]] section below).

4 see further discussion on recvmmsg() in Chapter [[TODO]]

Websocket-related FSMs and HTTP-related FSMs (not shown). If you need to support Websocket clients (or, Stevens forbid, HTTP clients) in addition to, or instead of, TCP or UDP, this can be implemented quite easily. The basic Websocket protocol is very simple (with basic HTTP being even simpler), so you can use pretty much the same FSMs as for TCP, implementing the additional header parsing and frame logic within your Websocket FSMs. If you think you need to support the HTTP protocol for a synchronous game – think again, as implementing interactive communications over request-response HTTP is difficult (and tends to cause too much server load); Websockets are generally preferable over HTTP for synchronous games, providing about-the-same (though not identical) benefits in terms of browser support and firewall friendliness; see further discussion on these protocols in Chapter [[TODO]]. For asynchronous games, HTTP (with simple polling) MAY be a reasonable choice.

CUDA/OpenCL/Phi FSM (not shown). If your Game Worlds require simulation which is very computationally heavy, you may want to use your Game World servers with CUDA (or OpenCL/Phi) hardware, and to add another FSM (not shown on Fig VI.5) to communicate with CUDA/OpenCL/Phi GPGPU. A few things to note in this regard:

We won’t discuss how to apply CUDA/OpenCL/Phi to your simulation; this is your game and a question “how to use massively parallel computations for your specific simulation” is utterly out of scope of the present book.

Obtaining strict determinism for CUDA/OpenCL FSMs is not trivial due to potential inter-thread interactions, which may, for example, change the order of floating-point additions, which in turn may lead to rounding-related differences in the last digit (with both results practically the same, but technically different). However, for most gaming purposes (except for replaying server-side simulation forever-and-ever on all the clients), even this “almost-strict determinism” may be sufficient. For example, for the “recovery via replay” feature discussed in the “Complete Recovery from Game World server failures: DIY Fault-Tolerance in QnFSM World” section below, the results during replay-since-last-state-snapshot, while not guaranteed to be exactly the same, are not too likely to result in macroscopic changes visible to players.

Normally, you’re not going to ship your own game servers to your datacenter. Well, if the life of your game depends on it, you might, but this is a huuuge headache (see below, as well as Chapter [[TODO]], for further discussion)

As soon as you agree that these are not your own servers, but leased or cloud ones (see also Chapter [[TODO]]), it means that you’re completely dependent on your server ISP/CSP (Cloud Service Provider) to support whatever hardware you need.

Most likely, with a 3rd-party ISP/CSP it will be a Tesla or GRID GPU (both by NVidia), so in this case you should be ok with CUDA rather than OpenCL.

The choice of ISPs which can lease you GPUs is limited, and they tend to be on the expensive side :-(. As of the end of 2015, the best I was able to find was a Tesla K80 GPU (the one with 4992 cores) rented at $500/month (up to two K80’s per server, with the server itself going at $750/month). With cloud-based GPUs, things weren’t any better, starting from around $350/month for a GRID K340 (the one with 4×384=1536 total cores). Ouch!

If you are going to co-locate your servers instead of leasing them from an ISP5, you should still realize that server-oriented NVidia Tesla GPUs (as well as the AMD FirePro S line designated for servers) are damn expensive. For example, as of the end of 2015, a Tesla K80 costs around $4000(!); at this price, you get 2x GK210 cores, 24GB RAM@5GHz, a clock of 562/875MHz, and 4992 CUDA cores. At the same time, the desktop-class GeForce Titan X is available for about $1100, has 2 of the newer GM200 cores, 12GB RAM@7GHz, a clock of 1002/1089MHz, and 3072 CUDA cores. In short – Titan X gets you more or less comparable performance parameters (except for RAM size and double-precision calculations) at less than 30% of the price of a Tesla K80. It might look like a no-brainer to use desktop-class GPUs, but there are several significant things to keep in mind:

the numbers above are not directly comparable; make sure to test your specific simulation with different cards before making a decision. In particular, differences due to RAM size and double-precision math can be very nasty depending on the specifics of your code

even if you’re assembling your servers yourself, you are still going to place your servers into a 3rd-party datacenter; hosting stuff within your office is not an option (see Chapter [[TODO]])

space in datacenters costs, and costs a lot. It means that tower servers, even if allowed, are damn expensive to host. In turn, it usually means that you need a “rack” server.

Usually, you cannot just push a desktop-class GPU card (especially a card such as the Titan X) into your usual 1U/2U “rack” server; even if it fits physically, in most cases it won’t be able to run properly because of overheating. Feel free to try, and maybe you will find a card which runs ok, but don’t expect it to be the latest-greatest one; thermal conditions within “rack” servers are extremely tight, and air flows are traditionally very different from those in desktop towers, so throwing an additional 250W or so with a desktop-oriented air flow into a non-GPU-optimized server isn’t likely to work for more than a few minutes.

IMHO, your best bet would be to buy rack servers which are specifically designated as “GPU-optimized”, and ideally – explicitly supporting those GPUs that you’re going to use. Examples of rack-servers-supporting-desktop-class-GPUs range from6 a 1U server by Supermicro with up to 4x Titan X cards,7 to 4U boxes with up to 8x Titan X cards, to monsters such as a 12U multi-node “cluster” which includes a total of 10×6-core Xeons and 16x GTX 980, the whole thing going at a humble $40K total, by ExxactCorp. In any case, before investing a lot to buy dozens of specific servers, make sure to load-test them, and load-test a lot, to make sure that they won’t overheat under many hours of heavy load and datacenter-class thermal conditions (where you have 42 such 1U servers lying right on top of one another, ouch!; see Chapter [[TODO]] for further details).

To summarize: if your game cannot survive without server-side GPGPU simulations – it can be done, but be prepared to pay a lot more than you would expect based on desktop GPU prices, and keep in mind that deploying CUDA/OpenCL/Phi on servers will take much more effort than simply making your software run on your local Titan X 🙁 . Also – make sure to start testing on real rack-based server hardware as early as possible; you need to know ASAP whether the hardware of your choice has any pitfalls.

5 this potentially includes even assembling them yourself, but I generally don’t recommend it

6 I didn’t use any of these, so I cannot really vouch for them, but at least, IMHO, you have reasonably good chances if you try; also make sure to double-check that your colocation provider is ready to host these not-so-mainstream boxes

7 officially Supermicro doesn’t support Titans, but their 1U boxes can be bought from 3rd-party VARs such as Thinkmate with 4x Titan X for a total of $10K, Titans included; whether it really works with Titans in a datacenter environment 24×7 under your type of load – you’ll need to see for yourself

Simplifications. Of course, if your server doesn’t need to support UDP, you won’t need the corresponding threads and FSMs. However, keep in mind that usually your connection to the DB Server SHOULD be TCP (see the “On Inter-Server Communications” section below), so if your client-to-server communication is UDP, you’ll usually need to implement both. On the other hand, our QnFSM architecture provides a very good separation between protocols and logic, so usually you can safely start with a TCP-only server (this will almost certainly be enough to test your game intra-LAN, where packet losses and latencies are negligible), and implement UDP support later (without the need to change your FSMs). Appropriate APIs which allow this kind of clean separation will be discussed in Chapter [[TODO]].

On Inter-Server Communications

One of the questions you will face when designing your server side is about the protocol used for inter-server communications. My take on it is simple:

even if you’re using UDP for client-to-server communications, seriously consider using TCP for server-to-server communications

A detailed discussion on TCP’s (lack of) interactivity is due in Chapter [[TODO]], but for now, let’s just say that the poor interactivity of TCP (when you have the Nagle algorithm disabled) becomes observable only when you have packet loss, and if you have non-zero packet loss within your server LAN – you need to fire your admins.8

On the positive side, TCP has two significant benefits. First, if you can get acceptable latencies without disabling the Nagle algorithm, TCP is likely to produce many fewer hardware interrupts (and overall context switches) on the receiving server’s side, which in turn is likely to reduce the overall load of your Game Servers and, even more importantly, your DB Server. Second, TCP is usually much easier to deal with than UDP (on the other hand, this advantage may be offset if you have already implemented UDP support to handle client-to-server communications).
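If you do decide to disable the Nagle algorithm on your inter-server TCP connections, it is a one-line standard POSIX setsockopt() call; a minimal sketch (wrapped into a self-contained function for illustration):

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Disables Nagle's algorithm (TCP_NODELAY) on a freshly created TCP
// socket, reporting whether the setsockopt() call succeeded. In a real
// server you'd call setsockopt() on each accepted/connected socket.
bool nagle_can_be_disabled() {
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) return false;
    int flag = 1; // 1 = send segments immediately, no coalescing delay
    bool ok = setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                         &flag, sizeof(flag)) == 0;
    close(sock);
    return ok;
}
```

Keep in mind the trade-off discussed above: leaving Nagle enabled (the default) means fewer interrupts and context switches on the receiving side, so disable it only after you’ve measured that it actually hurts your latencies.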

8 To those asking “if it is zero packet loss, why would we need to use TCP at all?” – I’ll note that when I’m speaking about “zero packet loss”, I can’t rule out two packets lost in a day, which can happen even if your system is really, really well-built. And while a few-dozen-microsecond additional delay twice a day won’t be noticeable, crashing twice a day is not too good

When it comes to the available deployment options, QnFSM is an extremely flexible architecture. Let’s discuss your deployment and run-time options provided by QnFSM in more detail.

Threads and Processes

First of all, you can have your FSMs deployed in different configurations depending on your needs. In particular, FSMs can be deployed in multiple-FSMs-per-thread, one-FSM-per-thread-multiple-threads-per-process, or one-FSM-per-process configurations (all this without changing your FSM code at all).9

In one real-world system with hundreds of thousands of simultaneous players, but lightweight processing on the server side and rather high acceptable latencies, they decided to have some of the game worlds (those for novice players) deployed as multiple-FSMs-per-thread; another bunch of game worlds (intended for mature players) deployed as single-FSM-per-thread (improving latencies a bit, and providing an option to raise thread priority for these FSMs); and the game worlds for pro players deployed as single-FSM-per-process (additionally improving memory isolation in case of problems, and marginally improving memory locality and therefore performance). All these FSMs were using the very same FSM code, but it was compiled into different executables to provide slightly different performance properties.
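The multiple-FSMs-per-thread configuration can be sketched as a single loop draining several FSM inboxes in turn; since each FSM touches only its own state, the same FSM code runs unchanged whether it shares a thread or owns one. All names here are illustrative:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a deterministic FSM: private state plus an event handler.
struct Fsm {
    std::vector<std::string> inbox; // events queued for this FSM
    int events_processed = 0;
    void process_event(const std::string& /*ev*/) { ++events_processed; }
};

// One pass of the hosting thread's loop: drain every FSM's inbox.
// Deploying one-FSM-per-thread just means the vector has a single element;
// the FSM code itself doesn't change.
void run_one_pass(std::vector<Fsm>& fsms) {
    for (auto& fsm : fsms) {
        for (const auto& ev : fsm.inbox) fsm.process_event(ev);
        fsm.inbox.clear();
    }
}
```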

Moreover, in really extreme cases (like “we’re running the Tournament of the Year with live players”), you may even pin a single-FSM-per-thread to a single core (preferably the same one where interrupts from your NIC arrive on this server) and pin other processes to other cores, keeping your latencies to the absolute minimum.10

9 Restrictions apply, batteries not included. If you have blocking calls from within your FSM, which is common for DB-style FSMs and some of gateway-style FSMs, you shouldn’t deploy multiple-FSMs-per-thread

10 yes, this will further reduce latencies in addition to any benefits obtained by a simple increase of thread priority, because the per-core caches stay intact

Communication as an Implementation Detail

With QnFSM, communication becomes an implementation detail. For example, you can have the same Game Logic FSM serve both TCP and UDP. Not only can this come in handy for testing purposes, but it may also enable some of your players (those who cannot access your servers via UDP due to firewalls/weird routers, etc.) to play over TCP while the rest are playing over UDP. Whether you want this capability (and whether you want to match TCP players only with TCP players to make sure nobody has an unfair advantage) is up to you, but at least QnFSM does provide you with such an option at a very limited cost.

Moving Game Worlds Around (at the cost of client reconnect)

Yet another flexibility option which QnFSM can provide (though with some additional headache, and a bit of additional latency) is to allow moving your game worlds (or, more generally, FSMs) from one server to another. To do it, you just need to serialize your FSM on server A (see Chapter V for details on serialization), transfer the serialized state to server B’s Game Logic Factory, and deserialize it there. Bingo! Your FSM runs on server B right from the same moment where it stopped running on server A. In practice, however, moving FSMs around is not that easy, as you’ll also need to notify your clients about the changed address where this moved FSM can be reached; but despite being an additional chunk of work, this is also perfectly doable if you really want it.
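A toy sketch of the serialize-transfer-deserialize round-trip (the serialization format below is a trivial stand-in for whatever format Chapter V prescribes; the FSM and its fields are illustrative):

```cpp
#include <cassert>
#include <string>

// A miniature game-world FSM with just enough state to demonstrate the
// round-trip: serialize on server A, ship the blob, deserialize on B.
struct GameWorldFsm {
    int tick = 0;          // current simulation tick
    std::string map_name;  // which map this world runs

    // Toy wire format: "<map_name>|<tick>"
    std::string serialize() const {
        return map_name + "|" + std::to_string(tick);
    }
    static GameWorldFsm deserialize(const std::string& blob) {
        GameWorldFsm fsm;
        auto sep = blob.find('|');
        fsm.map_name = blob.substr(0, sep);
        fsm.tick = std::stoi(blob.substr(sep + 1));
        return fsm;
    }
};
```

The moved FSM resumes on server B at exactly the tick where server A stopped it; the remaining (and harder) part is telling the clients where to reconnect.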

Online Upgrades

Yet another two options provided by QnFSM enable server-side software upgrades while your system is running, without stopping the server.

The first of these options is simply to start creating new game worlds using new Game Logic FSMs (while existing FSMs are still running the old code). This works as long as changes within the FSMs are minor enough that all external inter-FSM interfaces remain 100% backward compatible, and the lifetime of each FSM is naturally limited (so that at some point you’re able to say that migration from the old code is complete).

The second of these online-upgrade options allows upgrading FSMs while the game world is still running (via serialization – replacing the code – deserialization). This second option, however, is much more demanding than the first one, and migration problems may be difficult to identify. Therefore, thorough automated testing using the “replay” technique (also provided by QnFSM, see Chapter V for details) is strongly advised. Such testing should use big chunks of real-world data, and should simulate online upgrades at random moments of the replay.

On Importance of Flexibility

Quite often we don’t realize how important flexibility is; actually, we rarely realize how important it is until we run into the wall because of a lack of it. Deterministic FSMs provide a lot of flexibility (as well as other goodies such as post-mortem analysis) at a relatively low development cost. That’s one of the reasons why I am positively in love with them.

Comments

You have mentioned the TCP and UDP protocols, however websockets weren’t described. I think comparing websockets with TCP would be useful. Also, I hope there will be some chapter with frameworks description where you could mention actor-based architecture like Akka.

THANKS! I’ve added Websockets above (they can be handled pretty much like TCP), and mentioned Akka’s Actors in Chapter V(d) (right near Erlang, they’re actually very similar to each other and to QnFSM).

I have an FSM-related question. May be it’s again a bit too “techy” and will be discussed in vol.II only, but I’d like to ask anyway if you don’t mind.

On your diagram, it’s obvious that network-related FSMs are using “wait for event” triggering. Whether it’s good old select() or something like WSAWaitForMultipleEvents() – doesn’t really matter as it’s implementation details. At the same time, I’d like to ask about your thoughts on scheduling strategy of logic FSMs.

Basically, I know two usual approaches there – “wait for event” and “timed polls”.
* First one is basically the same as in the network FSM, with inbox queue having an event object. Again, whether it’s std::condition_variable::wait() or something like WaitForSingleEvent() – implementation details;
* Second approach can be expressed with a tight-loop including std::this_thread::sleep_for() and something like while (queue::pop_event()…);

While the first one looks conceptually better, I still think the second one has its own merits, especially in cases where there are no “true real-time” constraints on event processing. Basically, my observation is that it’s sometimes better to “accumulate” such events in the inbox for, say, 100ms (or 500ms) and then process all of them in one batch, effectively decreasing the number of active concurrent threads and reducing the contention. What I saw is that such an approach helps with contention in the case of “trivial” event handlers (i.e. when the amount of time needed for each event’s processing is negligible compared to an OS tick, which I suspect is true for a lot of MMO logic processing).

Of course, I suspect that such a “scheduled poll” approach might not work that nicely in MMO architectures, with an accepted poll period around 10ms-ish (*wildguess* poll value). I don’t think you can reliably make it smaller on usual OSes – definitely not on Windows, and I’m not totally sure about Linuxes.

All in all, I’d love to hear your experienced thoughts on this matter. Of course, if it’s something from much later part of the book, I totally don’t want you to distract from your plan 🙂

PS: I’m asking because I don’t have any experience with MMO realms, but I worked on distributed calculations (think “Big Data Crunching” and multi-thread/multi-server simulation of physical processes). And, based on what I saw in your “64 Do’s/Dont’s” articles and this book, the back-end part of the work, i.e. “world simulation”, is actually pretty similar. Although, I never had any problems with those pesky “cheaters” 🙂

So, I’m curious to see the differences in architectural decisions due to different business requirements.

> Basically, my observations that it’s sometimes better to “accumulate” such events in inbox for, say, 100ms (or 500ms) and then process all of those in one batch, effectively decreasing the amount of active concurrent threads and reducing the contention.

Wait, which contention you’re speaking about? If you have a massive shared state protected by mutex – then there would be contention (on this mutex) and reducing number of threads would be a good thing (though it is better to be done in a different manner). With FSMs/Actors, however, it is shared-nothing, so there is nothing to compete for, no mutex, and no contention.

Overall, as a Really Big Fat rule of thumb: stay away from tight loops and polling on the server side. While on the client side they’re just nuisances (though I’m avoiding them on clients too), on the server side they SHOULD be avoided at all costs (well, there are exceptions, but they’re mostly of an exotic nature, like “it may be OK to use polling while you’re shutting down your daemon”).

The reason behind it is trivial: it is damn too expensive – OR it brings too much latency. Each time you wake up your thread (only to find that nothing has arrived), you get a context switch, and that costs something like 10,000 CPU clocks (EDIT: more like 100K–1M; see, for instance, http://www.cs.rochester.edu/u/cli/research/switch.pdf ). Way Too Expensive (especially when you find out you did it for no reason). In addition, it puts you into a kind of predicament: reducing the poll interval is bad because of the context switches, and increasing it hurts game responsiveness.

One additional interesting thing about these select()/WaitFor*() functions: with them in use, as the load on the system grows (and unlike data crunching, games do not operate under 100% load, so there should be a reserve at all times), “batching” of multiple requests starts to occur naturally, reducing the number of context switches exactly as needed. In other words, a select()-based system will automagically adapt to higher loads, increasing latencies only to the extent necessary to handle the current load. It is graceful degradation in action.

Overall, there is a Good Reason for all those WaitFor*() and select() functions (and for the consensus against tight loops) – and this is avoiding context switches (and context switches can kill server performance instantly; been there, seen that).
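To illustrate the point, here is a minimal sketch of a blocking reader built on a condition variable instead of a polling loop: the thread sleeps until an event actually arrives, so there are no wasted wakeups. The `Inbox` class and its method names are made up for this example; a real event queue would carry richer event types.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical single-reader inbox: the reader blocks until an event
// arrives, instead of waking up on a timer only to find nothing there.
class Inbox {
    std::mutex mtx;
    std::condition_variable cv;
    std::deque<int> events;
public:
    void push(int ev) {
        {
            std::lock_guard<std::mutex> lk(mtx);
            events.push_back(ev);
        }
        cv.notify_one();  // wake the reader only when there is real work
    }
    int waitForPop() {
        std::unique_lock<std::mutex> lk(mtx);
        // Sleeps (no CPU burned, no periodic context switches) until
        // the predicate holds; also immune to spurious wakeups.
        cv.wait(lk, [this] { return !events.empty(); });
        int ev = events.front();
        events.pop_front();
        return ev;
    }
};
```

Under light load this reacts with minimal latency; under heavy load several events pile up between reader wakeups, giving exactly the natural batching described above.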

> With FSMs/Actors, however, it is shared-nothing, so there is nothing to compete for, no mutex, and no contention.

Yes, except for the queue itself. That can be implemented in a lock-free manner – is that what you mean? Without lock-free techniques, the queue is a shared resource, so there is some contention. And with multiple writers/single reader, as I understand it, you still need some mutex-like or spinlock-like technique. The simple pure lock-free ring buffer for single-producer/single-consumer doesn’t work here.

> …OR it brings too much latency

I think that’s the main difference between real-time MMOs and something like data processing/simulation. In the latter case, it’s sometimes OK to sync not very often (e.g. once a second). And the amount of data passing through the queue is often non-trivial too (which also differs from an MMO).

OK. Thanks for providing these insights! I think I now better understand these differences and the MMO context.

Please disregard my last comment. I just suddenly figured out that I can take any queue, with any number of events and any amount of data, from the “shared” queue into a private FSM queue with just a single swap of pimpl’s. Looks like this idea is 7 years late, but it makes “taking the current queue” a trivial task, with a single mutex locked for 2 pointer assignments (or some other kind of lightweight sync).

What you’re suggesting would probably work for a Single-Writer-Single-Reader queue, but IIRC, for Multiple-Writer-Single-Reader queues (and that’s what we generally need for FSMs) it is not as simple as a two-pointer swap. However: (a) even if using a mutex, the critical section is still small (and the smaller the code under the lock, the less contention you have); (b) it can be implemented in a completely lockless manner, based on a circular buffer plus a CAS primitive (a.k.a. LOCK CMPXCHG for x86, a.k.a. std::atomic_compare_exchange for C++, a.k.a. InterlockedCompareExchange() for Windows). Implementing (b) properly is a Big Headache, but it needs to be done only once, and it has been done, for example, in boost::lockfree::queue (though in practice you’ll additionally need some kind of waitForPop() function, which doesn’t seem to be provided by boost::lockfree::queue 🙁 ).
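For what it’s worth, option (a) can be sketched in a few lines: many writers push under the mutex, and the single reader takes the whole accumulated batch by swapping the container out, so the lock is held only for a pointer-level swap (this is close in spirit to the pimpl-swap idea above, just done on the container itself). A minimal sketch, not a production queue – in particular, it lacks the waitForPop() blocking behavior a real FSM inbox would need:

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Multiple-Writer-Single-Reader queue: writers contend briefly on push,
// and the reader's critical section is just a swap of internal pointers.
template <class Event>
class MWSRQueue {
    std::mutex mtx;
    std::vector<Event> inbox;
public:
    void push(Event ev) {
        std::lock_guard<std::mutex> lk(mtx);
        inbox.push_back(std::move(ev));
    }
    // Reader side: grab everything accumulated so far in one batch.
    std::vector<Event> popAll() {
        std::vector<Event> batch;
        {
            std::lock_guard<std::mutex> lk(mtx);
            batch.swap(inbox);  // O(1): swaps the vectors' internal pointers
        }
        return batch;
    }
};
```

The reader then processes the returned batch entirely outside the lock, which is what keeps contention low even with many writers.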

Perhaps you could elaborate a little on how to scale simulation of a large world – both in terms of using several single-threaded FSMs on a single server and in terms of distributing the world across several servers.
In particular – how to handle scaling of large contiguous zones if a single FSM – or even a single server – won’t cut it. I suppose the “shared-nothing” rule must be worked around in this case?

Good question. Yes, I didn’t answer it in the “beta” chapters, but I will include it in the “final” version of the book (Chapter III, protocols). Very shortly – the typical way of doing it is to split your game world into “zones” (with zones often overlapping to account for objects moving near the border). It was described in “Seamless Servers: the case for and against” by Jason Beardsley (part of the “Massively Multiplayer Game Development” book published in 2003) and is still in use (it was recently mentioned, for example, in a WoT presentation at GDC 2016, which should be on the GDC Vault soon).
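The overlap idea can be sketched in one dimension: an object within the overlap margin of a zone border is registered in both neighboring zones, so a handover can happen without the object ever being “unknown” to either side. The zone width and margin values here are arbitrary numbers chosen purely for illustration:

```cpp
#include <vector>

// Returns the indices of all zones an object at coordinate x belongs to.
// Zones are [0, w), [w, 2w), ...; an object closer than `overlap` to a
// border is also registered in the adjacent zone.
std::vector<int> zonesFor(double x, double zoneWidth, double overlap) {
    std::vector<int> zones;
    int primary = static_cast<int>(x / zoneWidth);
    zones.push_back(primary);
    double offset = x - primary * zoneWidth;  // position within the zone
    if (offset < overlap && primary > 0)
        zones.push_back(primary - 1);  // near the left border
    if (zoneWidth - offset < overlap)
        zones.push_back(primary + 1);  // near the right border
    return zones;
}
```

A real seamless-server implementation does this in 2D (or 3D) and adds handover logic for deciding which zone owns the authoritative copy, but the membership test is essentially the same.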

The DB FSMs I’ve seen were essentially stateless (except for app-level caching as their state – usually the cache is read-only, but write caches are also possible).

One simple example: a game world sends a request to the DB FSM asking to move artefact X from player Y to player Z (as artefact X was lost during a fight, or whatever else). On the DB FSM side, most of the checks (like “does player Y have artefact X”, etc. etc.) can be done from the read-only app-level cache, but the transaction itself can be committed to the DB (or it can be write-cached, if the artefact is not THAT important, or the transaction commit can be postponed for a few milliseconds to save on commits – with the reply back to the game world delayed until the transaction is committed, to make sure that ACID properties stand despite the postponed commit, or…).

So, a ‘DB FSM’ (at least as I’ve seen it) is pretty much a “thing which processes DB-related requests”, with its state usually being something along the lines above.
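A toy sketch of the artefact-transfer example above, just to show the shape of such a request handler. Everything here is made up for illustration: the cache is a plain in-memory map, and the actual DB commit is reduced to a comment, since a real DB FSM would talk to an actual database (possibly with a postponed commit as described):

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Toy DB FSM fragment: validates a "move artefact X from Y to Z" request
// against the app-level read-only cache, then (conceptually) commits.
struct DbFsm {
    // app-level cache: player id -> set of artefacts the player owns
    std::unordered_map<std::string,
                       std::unordered_set<std::string>> cache;

    bool moveArtefact(const std::string& artefact,
                      const std::string& from, const std::string& to) {
        auto it = cache.find(from);
        if (it == cache.end() || it->second.count(artefact) == 0)
            return false;  // validation from the cache, no DB round-trip
        // commitToDb(artefact, from, to);  // real (possibly postponed) commit
        it->second.erase(artefact);        // keep the cache coherent
        cache[to].insert(artefact);
        return true;
    }
};
```

Note that the expensive DB hit happens (at most) once per request, while all the sanity checks are served from memory.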

Hope it helps a bit (if not – feel free to ask further questions :-)). Also, some discussion of DBs and DB FSMs is planned for the upcoming Chapter XVII.