Background

I am working on improving the backend design for a video game. The game is live and working fine, but we want to improve various things. As a video game studio, we often run events that sometimes need tinkering with whilst the event is live; we would love to be able to make these changes seamlessly without having to force an update on players.

Problem

Our current backend uses various hardcoded values in its endpoints. This means that if we ever want to change an endpoint's behaviour, we have to redeploy. For instance, if we decide that a certain online-store item needs to point to something else, we need to change the ID referenced in the endpoint. This is cumbersome; we want a more dynamic way of changing configs.

The general idea we had is that we store these properties in a file on S3 / a CDN and the client and the backend fetch these periodically. There are issues with this approach when it comes to ensuring that both the backend and the client use the same configuration.

Edited solutions for clarity

Possible Solutions

Solution A)

Create an API that lets the source of the "truthiest" file talk to the server, and push the new file into the server's cache whenever it changes. Then, whenever the client talks to the server, compare the client's file version/hash with the server's. If they match, proceed; otherwise we either 1) force the client to update, 2) tell the client to update manually (denying it access until then), or 3) allow a version leeway that, in theory, gives the client enough time to update.

Pros

avoids round trips between the server and the file provider

Cons

not sure which option is best for synchronising (1, 2 or 3)

more work required to build the API, etc.

Solution B)
Both the client and the server periodically check for a new config. When the client talks to the server and the versions do not match, the same three options apply, only this time the server has to call the storage (the truthiest source) to check which version is the correct one.

Pros

less work than Solution A

Cons

less reliable: too many points of failure

too many round trips between server and storage, and between client and storage

Note on storage
There are problems with both S3 and a CDN. S3 isn't always consistent: it can sometimes return older data, so it is not a reliable single source of truth. A CDN is useful because clients will have to download this data for their specific language, but redistributing each new version can take time, which is why we may need a leeway.

This is a brief overview of our ideas. There may be caveats we have not outlined, but we are mainly looking to find out how this kind of thing is typically handled.

2 Answers

The general idea we had is that we store these properties in a file on S3 / a CDN and the client and the backend fetch these periodically.

No, this is the wrong design, due to the concerns that you cite.

You might have a hallway conversation about some client code "that manipulates the config file"; that's fine. But what is crucial for your setup is that the URI of "the" config file must change each time you rev it. This induces cache misses and keeps clients exactly in sync with the server, as you desire. The cache hierarchy has many levels, some of which you cannot see or control, and inventing a new URI has the nice effect of causing a cache miss at every level.

It's a simple matter of tacking an ISO 8601 timestamp and a serial number or hash onto the config file's URI.
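
A minimal sketch of that naming scheme in Python. The base URL, the config-<timestamp>-<hash>.json pattern and the function name are all hypothetical; the only point illustrated is that any change in content produces a new name, and therefore a cache miss at every level.

```python
import hashlib
from datetime import datetime, timezone


def versioned_config_uri(base_url: str, content: bytes, published_at: datetime) -> str:
    """Build a URI that changes whenever the config content changes (hypothetical scheme)."""
    # ISO 8601 basic format (no colons), normalised to UTC so names sort by time.
    stamp = published_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # A short content hash guarantees a new name whenever the content changes.
    digest = hashlib.sha256(content).hexdigest()[:12]
    return f"{base_url.rstrip('/')}/config-{stamp}-{digest}.json"
```

Because the name is derived from the content, republishing identical bytes at the same timestamp yields the same URI, while any edit yields a new one.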

Depending on how frequently your config churns, you may want to age out ancient versions from your CDN.

The server will have to honor requests corresponding to the K most recent config files, where K is at least 2. Client requests should identify the config file the client is using, so the server can verify that the request makes sense.
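
A sketch of that server-side check, assuming each client sends the name or version of the config file it is using (how it sends it, for example in a header, is up to you). AcceptedConfigs and its methods are hypothetical names; a deque with maxlen=K keeps exactly the K most recent versions.

```python
from collections import deque


class AcceptedConfigs:
    """The K most recent config versions this server will honor (hypothetical helper)."""

    def __init__(self, k: int = 2):
        if k < 2:
            raise ValueError("K must be at least 2 so clients mid-update are not rejected")
        # deque(maxlen=k) silently drops the oldest entry once k versions are stored.
        self._recent = deque(maxlen=k)

    def publish(self, version: str) -> None:
        """Record a newly deployed config version."""
        self._recent.append(version)

    def is_acceptable(self, client_version: str) -> bool:
        """True if the client's config is one of the K most recent versions."""
        return client_version in self._recent

    def latest(self) -> str:
        """Newest version, e.g. to tell a stale client which file to fetch."""
        return self._recent[-1]
```

If is_acceptable returns False, the server can reply with latest() so the client knows which file to fetch next.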

EDIT: You have recently added diagrams A & B. In (A) you propose that the client refresh the "same file" from the servers. I am suggesting that intermediate "transparent" caches can confound that arrangement, and that it is appropriate to occasionally tell the client a new config file name so that it suffers a cache miss and retrieves the file from the server. In (B) you have even less control over caching effects, so inventing a new config file name is even more important for dealing with the "older data from S3" issue you mention. The synopsis is simply that you publish the content under each filename exactly once, and rev the filename when you rev the content. That way you can avoid heavier-weight approaches like Paxos/ZooKeeper. You just need to push a new rev at time t0, wait N seconds until all your servers have a copy of it, and after t0 + N the servers are free to send the new name to clients so the clients will use it. This offers consistency with very little effort. Alternatively, if the load balancer hashes a given client to a single server, then during the brief interval it takes for all servers to get a fresh config, each client communicates with only one server and is therefore trivially in sync with it.
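
A sketch of the publish-then-announce timing from a single server's point of view, assuming N (propagation_seconds) is chosen generously enough that every server has fetched the new file by t0 + N. The Rollout class and its fields are hypothetical; the injectable clock exists only to make the sketch testable.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rollout:
    """Publish-then-announce for one server (hypothetical helper).

    A new config name is published at t0, but it is only handed to clients
    once t0 + N has passed, by which time every server should have the file.
    """
    propagation_seconds: float                     # N, chosen by you
    clock: Callable[[], float] = time.monotonic    # injectable for testing
    current: Optional[str] = None                  # name clients are told to use
    pending: Optional[str] = None                  # published, not yet announced
    published_at: float = 0.0                      # t0 of the pending name

    def publish(self, name: str) -> None:
        # Publishing again before promotion replaces the pending name and restarts the wait.
        self.pending = name
        self.published_at = self.clock()

    def name_for_clients(self) -> Optional[str]:
        # Promote the pending name only once the propagation window has elapsed.
        if self.pending is not None and self.clock() >= self.published_at + self.propagation_seconds:
            self.current, self.pending = self.pending, None
        return self.current
```

Until t0 + N, clients keep receiving the previous name, which every server still honors.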

First, the easy stuff. I presume you are using HTTP for your endpoints. There are standardized headers (Last-Modified and ETag, paired with the If-Modified-Since and If-None-Match request headers) for determining whether the client has the most current version of a resource and for retrieving a resource conditionally, only when it has been updated. I would recommend that you not reinvent the wheel and instead use one of these existing mechanisms to determine when the config (a resource) has changed. If you wish to be able to serve multiple versions of the config, as suggested in J_H's answer, you can create a response that lists the available versions with URLs to those config resources. In this scheme you would (in general) not modify configs, but instead add new ones as needed and simply update the directory. You can still use the headers to determine whether the directory has changed.
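
A sketch of those two pieces as plain functions, so it doesn't depend on any particular web framework: a content-derived ETag with If-None-Match handling, and a JSON directory of available versions. The function names and the directory's JSON shape are assumptions. If-None-Match uses weak comparison, so a W/ prefix is ignored when matching.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple


def etag_for(body: bytes) -> str:
    """Strong ETag derived from the content: same bytes, same tag."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so a W/ prefix is ignored.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET honouring If-None-Match."""
    tag = etag_for(body)
    # no-cache: caches may store the response but must revalidate it with the origin.
    headers = {"ETag": tag, "Cache-Control": "no-cache"}
    if if_none_match is not None:
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in candidates or tag in candidates:
            return 304, headers, b""  # client already has the current version
    return 200, headers, body


def config_directory(versions: Dict[str, str]) -> bytes:
    """JSON directory mapping each available config version to its URL."""
    listing = [{"version": v, "url": u} for v, u in sorted(versions.items())]
    return json.dumps({"configs": listing}).encode("utf-8")
```

The individual config files never change once published, so only the small directory needs revalidating.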

Now for the hard part. You should familiarize yourself with the CAP theorem. Here's the abridged version: Consistency, Availability, Partition tolerance: you can have at most two. Much of what you are asking about here relates to that.

How do we handle outgoing requests in the moment we receive an update?

This question, I think, is at the heart of what you need to answer. Say a client retrieves the config a minute before you update it. What about a second before? A millisecond? If you are OK with a client getting a resource a few nanoseconds before it's updated, would it matter if it got the old version while the update was happening? I would guess not, so just return the version that was current at the time you started fulfilling the request.
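
One way to get that "version current when the request started" behaviour inside a server process, sketched in Python: keep the config as an immutable snapshot and swap the whole reference on update, so each request pins the snapshot it read first. ConfigHolder and handle_request are hypothetical names, and the atomicity remark applies to CPython.

```python
from types import MappingProxyType
from typing import Any, Mapping


class ConfigHolder:
    """Holds the live config as a read-only snapshot (hypothetical helper).

    The snapshot is shallow: nested values should themselves be immutable.
    """

    def __init__(self, initial: Mapping[str, Any]):
        self._current = MappingProxyType(dict(initial))

    def snapshot(self) -> Mapping[str, Any]:
        return self._current

    def update(self, new_config: Mapping[str, Any]) -> None:
        # Build the new snapshot fully, then swap the reference in one assignment.
        # In CPython rebinding an attribute is atomic, so a reader sees either the
        # old snapshot or the new one, never a half-applied mix.
        self._current = MappingProxyType(dict(new_config))


def handle_request(holder: ConfigHolder, key: str) -> Any:
    cfg = holder.snapshot()  # pin the version current when the request started
    # ... the rest of the request uses cfg, unaffected by concurrent updates ...
    return cfg[key]
```

Because a request only ever sees one snapshot, it never mixes old and new values, even if an update lands halfway through.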

You really have two options here: polling, where the client checks the resource on a schedule, or events, where you push messages to the client to tell it there is an update. In my experience the former is much easier and more robust than the latter. And even when you push, there will be some lag you can't really control between when the resource is updated and when clients retrieve it. The upshot is that if your design cannot tolerate the client being behind the server for any amount of time, it's impossible to implement.

Ultimately, you need to determine how far behind you are willing to let the client fall. Then halve that time and use it as your polling interval as a starting point. If that's too frequent, you might want to look at something like WebSockets.
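
A sketch of the polling approach combined with conditional requests. The fetch callable is an assumption standing in for your HTTP client: it takes the last ETag and returns None when the server answers 304 Not Modified, or a new (etag, body) pair otherwise. ConfigPoller and polling_interval are hypothetical names.

```python
import threading
from typing import Callable, Optional, Tuple

# Hypothetical fetch signature: given the last ETag (or None), return None when the
# server replies 304 Not Modified, otherwise the new (etag, body) pair.
Fetcher = Callable[[Optional[str]], Optional[Tuple[str, bytes]]]


def polling_interval(max_staleness_s: float) -> float:
    """Starting point suggested above: poll at half the tolerated lag."""
    return max_staleness_s / 2


class ConfigPoller:
    def __init__(self, fetch: Fetcher, on_change: Callable[[bytes], None]):
        self._fetch = fetch
        self._on_change = on_change
        self._etag: Optional[str] = None

    def poll_once(self) -> bool:
        """Poll once; return True if a new config was applied."""
        result = self._fetch(self._etag)
        if result is None:
            return False  # unchanged
        etag, body = result
        self._on_change(body)
        self._etag = etag  # only remember the tag once the config is applied
        return True

    def run(self, interval_s: float, stop: threading.Event) -> None:
        """Poll until `stop` is set, keeping the current config on network errors."""
        while not stop.is_set():
            try:
                self.poll_once()
            except OSError:
                pass  # e.g. connection failure: keep the old config, retry next tick
            stop.wait(interval_s)
```

For example, if clients may lag by at most 60 seconds, polling_interval(60) gives 30 seconds as a first guess, to be tuned from there.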

In terms of distributing the configurations to the servers and the clients via the same means, I think that will create the kind of timing challenges you touch on. Instead, I would probably use a different process for distributing the configuration to the servers than to the clients. I would expect the servers to need different information anyway, but I don't have enough detail to know. One way to tackle this is to distribute the config to all the servers via a distributed coordination store such as ZooKeeper, with each server recording in that store which version of the config it has. Once they are all up to date, you then publish it to the CDN and let the clients retrieve it. This ensures your clients never get ahead of the servers.
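
A sketch of the gating logic. The shared store (ZooKeeper in the text) is replaced here by an in-memory dict of server id to loaded version, and publish_to_cdn is a hypothetical callback; the only idea illustrated is that the CDN publish fires once, when every server reports the target version.

```python
from typing import Callable, Dict, Iterable, Optional


class RolloutCoordinator:
    """Publish to the CDN only once every server has loaded the target version.

    `_acks` stands in for the shared store: server id -> config version that
    server reports having loaded. Hypothetical helper.
    """

    def __init__(self, servers: Iterable[str], publish_to_cdn: Callable[[str], None]):
        self._acks: Dict[str, Optional[str]] = {s: None for s in servers}
        self._publish_to_cdn = publish_to_cdn
        self._target: Optional[str] = None
        self._published = False

    def start(self, version: str) -> None:
        """Begin rolling out `version` to the servers."""
        self._target = version
        self._published = False

    def report(self, server: str, version: str) -> None:
        """Called when a server confirms it has loaded `version`."""
        if server not in self._acks:
            raise KeyError(f"unknown server {server!r}")
        self._acks[server] = version
        all_ready = all(v == self._target for v in self._acks.values())
        if self._target is not None and all_ready and not self._published:
            self._published = True
            self._publish_to_cdn(self._target)  # clients can now fetch it
```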

If you are really sure you want to distribute to clients and servers the same way, another approach is to have the clients and servers negotiate the version of the config they will use. This is similar to how Minecraft manages its versions of the client software and which version(s) a server can support. Essentially, the server presents the versions it's capable of supporting and the client chooses one from that set. In your case it sounds like you always want the latest version they both support, but that's not essential.
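
A minimal sketch of the negotiation step, assuming config versions are monotonically increasing integers (for example the serial number in the file name). The function name is hypothetical.

```python
from typing import Iterable, Optional


def negotiate_config_version(server_supported: Iterable[int],
                             client_available: Iterable[int]) -> Optional[int]:
    """Pick the newest config version both sides support, or None if there is no overlap."""
    common = set(server_supported) & set(client_available)
    return max(common) if common else None
```

A None result means there is no overlap, at which point the client has to fetch one of the versions the server lists before continuing.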

If we are fetching the resources from a CDN (or some other source) for both the server and the client, how should we handle the case where the client is more up-to-date than the server?
– turnip, Nov 28 '17 at 12:28

I'm a little unclear on why you would want the server to depend on the CDN. Can you elaborate a bit? Would it be possible to add a simple diagram to your question?
– JimmyJames, Nov 28 '17 at 15:01

I'm still unsure what advantage you get from distributing this data to the servers via the CDN. I can see why you want to do that with the clients, but you have control over the servers, right? I'll add to my answer.
– JimmyJames, Nov 28 '17 at 16:19

Simply put, with our current backend infrastructure we have no easy way of updating the hardcoded values the backend uses on the fly, without manually editing them and redeploying the backend.
– turnip, Nov 28 '17 at 16:25