
Sunday, February 17, 2013

Optimization Magic

One of the hardest things for a small team like ours to fit into the product cycle on a regular basis is optimization.

Optimization is a little bit like making sure we eat our vegetables every week. Every time we add or change a feature in the game, in theory we should also refactor all the code around what just changed. Since that's usually not practical, large-scale optimization tends to happen in chunks, when the team is finally given some time to focus on the task.

Now that we are about to release newly optimized code, I wanted to take some time to lift the hood on the work being done, and to share some insight into issues we have to contend with - from the software engine to server hardware.

We are really proud that APB Reloaded has consistently remained in the top 5 (out of 100+) of Steam's Free2Play category since its December 2011 launch. (As an aside, the game actually gets the vast majority of its traffic directly through http://www.gamersfirst.com/ rather than through Steam, but Steam provides a convenient benchmark for comparison with other games.) So we are planning for this game to provide many more years of entertainment for all its fans. But this tension between short-term and long-term goals (survive today, but plan for a 5+ year lifespan) means that every day we struggle with what to focus on next: features, maps (Asylum, anyone?), game content, security or optimization? All of these are pressing needs for one reason or another.

Given that turning off the game and entering an optimization-only cycle is not possible, we instead attempt the next best thing: we optimize while running at full speed. It's a little like changing the oil in your car while travelling at 65 MPH down the highway. What could possibly go wrong?

APB - a server resource hog?

Few Unreal Engine based games have attempted to throw 100 players (how many FPS games ever give you 50 v 50 on a single map?) into a single area in a twitch FPS/TPS setting, where each player can have a fully customized character, car and skinned weapon. That means at least 200 fully custom player items (cars and characters), plus up to 850 autonomous NPC pedestrians and 350 autonomous NPC cars driving around a district, in addition to everything that's movable and destructible (traffic lights, dumpsters, billboards etc.).

This adds up to roughly 18,000 dynamic actors for the server to track in a single shard, and in fact on a single server core (more on the 'single-core' issue in the hardware section below).

Granted, newer games using other engines, like Planetside 2 on ForgeLight, have used a very different 'continent', 'distance' and 'mission' optimization system to allow much larger factions on a continent (though not necessarily in the same firefight), and even Fallen Earth uses a system of dynamic shards to allow 10,000 players in an area. But neither is an Unreal game.

Technically speaking, customization has some impact on server-side performance (mostly due to large amounts of asset streaming), but it has a larger negative impact on the game client: it tends to push client-side frame rates below what someone's gaming rig would deliver if customization were not such a central part of the game.

While there are clearly other FPS/TPS games that perform amazing graphical feats on older hardware (the CoD-MW series, Crysis series, Gears of War series, Far Cry etc.), they rarely allow this many complex human and AI actors in a single battle area, or when they do, the participants are streamlined and uniform and do not permit anything near APB's level of insane customization (or city-wide destruction). Others behave more like RPG or RTS games and generally have much lower requirements for hit registration, server tick rate and movement prediction. The amazing feat in APB is that the game still works on a lot of pre-2009-era hardware, given its extreme computational complexity.

Server FPS vs Client FPS

As a general rule, we want the server to perform a full pass of computations for all 100 players and 18,000+ district actors 30 times per second (giving each CPU core at most 33ms to complete all computations for one frame). If we achieve 30 FPS on the server, then connected game clients can easily run at 2X-3X the server tick rate (60FPS - 90FPS) without any noticeable loss in accuracy. At a 1:2 or 1:3 server-to-client ratio, movement prediction and frame interpolation provide a very smooth game experience.
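To make the 1:2 / 1:3 ratio concrete, here is a minimal sketch of snapshot interpolation, a common technique for this (assumed here for illustration; the post does not describe APB's actual client code). The client renders between the two most recent 30 Hz server snapshots, so at 90 FPS it draws three frames per server tick. `Snapshot`, `interpolate_position` and `client_frames_per_tick` are hypothetical names, and positions are 1-D for brevity.

```python
from dataclasses import dataclass

SERVER_TICK_HZ = 30
TICK_MS = 1000.0 / SERVER_TICK_HZ  # ~33.3 ms between server snapshots


@dataclass
class Snapshot:
    time_ms: float  # server time at which this state was produced
    x: float        # actor position (1-D for brevity)


def interpolate_position(older: Snapshot, newer: Snapshot, render_ms: float) -> float:
    """Linearly interpolate an actor's position at client render time render_ms."""
    span = newer.time_ms - older.time_ms
    if span <= 0:
        return newer.x
    # Clamp so the client never extrapolates past the newest known state.
    t = min(max((render_ms - older.time_ms) / span, 0.0), 1.0)
    return older.x + t * (newer.x - older.x)


def client_frames_per_tick(client_fps: float) -> float:
    """How many client frames are drawn for each server tick."""
    return client_fps / SERVER_TICK_HZ


if __name__ == "__main__":
    a = Snapshot(time_ms=0.0, x=0.0)
    b = Snapshot(time_ms=TICK_MS, x=3.0)
    frame_ms = 1000.0 / 90  # a 90 FPS client renders 3 frames per server tick
    for i in range(3):
        print(round(interpolate_position(a, b, i * frame_ms), 2))
```

Rendering slightly behind the newest snapshot trades a few milliseconds of visual delay for smooth motion between server updates.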

Unfortunately during the last few updates we have had to temporarily lower the server tick rate to 25 FPS and reduce the max CCU per core, so it's high time to perform another full optimization pass.

Software Optimizations and Server-Side Computation Times

Below is a graph of what version 1.10.1 server-side computations look like under ideal test circumstances AND using our new test hardware (more details on this new hardware at the bottom of this post).

In the current 1.10.1 build, the server completes 1 full frame (moving those thousands of actors around) for 1 full district on 1 core of the new hardware in 19.2ms. In 'theory' this means the server on the new hardware is capable of a 52FPS tick rate (!).

Compare this with the 'current gen' hardware, where we have only been able to run a 'safe' server tick rate of 25FPS in the current 1.10.1 build.

The lower part of the graph shows version 1.10.2 with the new software optimizations.

From the synthetic test it appears the team has been able to squeeze out a 16% performance improvement in software alone (about a 10 FPS improvement on the server). This improvement drops the per-frame processing time to 16.1ms, which means a theoretical 62FPS server tick rate (again on the new hardware).

This 'should' mean that software optimization alone (the 16% improvement) will let us go from 25FPS back to the original 30FPS server-side tick rate on the current hardware as part of the 1.10.2 update (to be confirmed once the update is live).
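The arithmetic behind these figures is easy to check: a frame that takes T ms of CPU time caps the tick rate at 1000/T. The sketch below reproduces the numbers above. The last line assumes, as the paragraph does, that the 16% saving carries over proportionally to the current-generation hardware; that assumption is exactly what the live test has to confirm. The function names are illustrative.

```python
def max_tick_rate(frame_ms: float) -> float:
    """Theoretical server tick rate if one frame takes frame_ms of CPU time."""
    return 1000.0 / frame_ms


def time_reduction(before_ms: float, after_ms: float) -> float:
    """Fractional reduction in per-frame processing time."""
    return (before_ms - after_ms) / before_ms


def projected_tick_rate(current_hz: float, before_ms: float, after_ms: float) -> float:
    """Scale a known-safe tick rate by the same speedup (assumes proportional scaling)."""
    return current_hz * before_ms / after_ms


old_ms, new_ms = 19.2, 16.1  # 1.10.1 vs 1.10.2 frame times on the new test hardware
print(f"1.10.1: {max_tick_rate(old_ms):.1f} FPS")                         # ~52.1
print(f"1.10.2: {max_tick_rate(new_ms):.1f} FPS")                         # ~62.1
print(f"reduction: {time_reduction(old_ms, new_ms):.1%}")                 # ~16.1%
print(f"current hw: {projected_tick_rate(25, old_ms, new_ms):.1f} FPS")   # ~29.8
```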

You can read these graphs from the bottom up, starting with receiving network packets from all connected actors, updating game elements and physics, updating cameras and streaming, and ending with sending data back to all clients. What's rather surprising is that almost 50% of the entire server processing time consists of receiving/parsing and serializing/sending network traffic. The actual game updates (players, objects, physics etc.) take only about 50% of available CPU time.
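The bottom-up reading of the graph corresponds to the order of phases within one server frame. A minimal timing harness in that spirit might look like the sketch below. The phase names mirror the description above, but they and the `busy` workloads are hypothetical placeholders, not real engine code; the team's actual profiler is not described in this post.

```python
import time
from typing import Callable, Dict


def profile_frame(phases: Dict[str, Callable[[], None]]) -> Dict[str, float]:
    """Run each phase of one server frame in order; return each phase's share of the total."""
    elapsed: Dict[str, float] = {}
    for name, phase in phases.items():  # dicts preserve insertion order
        start = time.perf_counter()
        phase()
        elapsed[name] = time.perf_counter() - start
    total = sum(elapsed.values()) or 1.0
    return {name: secs / total for name, secs in elapsed.items()}


def busy(n: int) -> Callable[[], None]:
    """Placeholder workload standing in for a real engine phase."""
    def work() -> None:
        sum(i * i for i in range(n))
    return work


# Hypothetical phase names, listed bottom-up as in the graph.
FRAME_PHASES = {
    "receive_network": busy(50_000),
    "update_game_objects_and_physics": busy(60_000),
    "update_cameras_and_streaming": busy(20_000),
    "send_network": busy(50_000),
}

if __name__ == "__main__":
    shares = profile_frame(FRAME_PHASES)
    network = shares["receive_network"] + shares["send_network"]
    for name, share in shares.items():
        print(f"{name:34s} {share:6.1%}")
    print(f"network I/O share: {network:.1%}")
```

Summing the receive and send phases is what surfaces the "half the frame is network traffic" observation.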

From the graphs above you can see that the team has been able to really squeeze and optimize the "Receive Network Traffic" and "Update Game Objects" steps. We expect to continue optimizing all the steps in the system, and, presuming QA signs off on the upcoming patch, we will measure the real-world impact of these improvements in the coming week.

The Single-Core Engine Conundrum

First, a disclaimer. The Unreal Engine has served us (and thousands of other games and companies) incredibly well. It's a great engine and a fantastic rendering system. That said, the engine makes certain design choices that create hard-to-overcome limits (as all engines do).

The biggest one for large-scale games is Unreal's monolithic and (almost) single-threaded server-client-response system. The philosophy behind Epic's design choice back in the era of Unreal Tournament / Gears of War makes perfect sense, given the engine's focus on small-scale, lobby-based FPS/TPS games or even single-player or co-op games. Some Unreal-based RPGs (for example Blade and Soul) have clearly adopted the engine as a renderer and then created an entirely proprietary server system to handle RPG-style updates and connection loads (RPGs usually need 2000-3000 players per shard, but only a server tick rate of 10FPS or less).

APB Reloaded uses a hybrid of standard Unreal server code (originally we used the 2008 version of Unreal, so the engine is getting a little aged at this point) and its own proprietary TCP message stack coordinating the communications between worlds and districts, as well as a highly proprietary customization system. But general actor-to-actor interaction relies on a system very close to the original Unreal one, mostly handled in a single game update loop.

This means all the processing in a single district happens on a single core and in a single thread.

One way to think of this is that the engine fundamentally works like a turn-based game in which each turn lasts 33ms. Within the scope of a single server core/thread, the process gives each actor one chance to make a move (or combination of moves). When all actors have signaled their moves, everyone is told of everyone else's updated moves, and the game proceeds to the next turn (though from the chart above you can see that we actually spend only about 7.5ms moving stuff around; the rest of the time is spent sharing that information).
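The "turn-based" model can be sketched as a fixed-timestep loop on a single thread. This is a simplified illustration of the general pattern, not Unreal's or APB's actual code; `Actor`, `Client`, `run_tick` and `serve` are hypothetical names, and the snapshot is a plain dict standing in for real replication.

```python
import time
from dataclasses import dataclass, field
from typing import List

TICK_HZ = 30
TICK_SECONDS = 1.0 / TICK_HZ  # ~33 ms budget for the whole frame


@dataclass
class Actor:
    actor_id: int
    x: float = 0.0
    vx: float = 1.0

    def tick(self, dt: float) -> None:
        # Each actor gets exactly one move per turn.
        self.x += self.vx * dt


@dataclass
class Client:
    received: List[dict] = field(default_factory=list)


def run_tick(actors: List[Actor], clients: List[Client], dt: float) -> None:
    # 1. Every actor makes its move, one after another, on this single thread.
    for actor in actors:
        actor.tick(dt)
    # 2. Only once all moves are in is everyone told about everyone else's moves.
    snapshot = {a.actor_id: a.x for a in actors}
    for client in clients:
        client.received.append(snapshot)


def serve(actors: List[Actor], clients: List[Client], ticks: int) -> None:
    next_deadline = time.perf_counter()
    for _ in range(ticks):
        run_tick(actors, clients, TICK_SECONDS)
        next_deadline += TICK_SECONDS
        # Sleep off whatever is left of the budget; if the frame overran,
        # the next tick simply starts late and the effective tick rate drops.
        remaining = next_deadline - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)
```

Because both steps run back to back on one thread, only a faster core shortens the frame; additional cores do not.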

Human reaction time (the subject of what is called mental chronometry) is around 160ms, so processing everything in 33ms on the server, plus the packet round-trip time (ideally 40-80ms), for a total processing delay of under about 120ms, should give us sufficient headroom to provide a good player experience.
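The latency budget above is simple addition: one server frame plus the network round trip, compared against the quoted reaction time. The sketch just restates those numbers; `processing_delay_ms` is an illustrative name.

```python
REACTION_MS = 160.0            # human reaction time figure quoted above
SERVER_FRAME_MS = 1000.0 / 30  # one 30 Hz server tick (~33 ms)


def processing_delay_ms(round_trip_ms: float, frame_ms: float = SERVER_FRAME_MS) -> float:
    """Server frame time plus network round trip, as in the estimate above."""
    return frame_ms + round_trip_ms


for rtt in (40, 80):
    delay = processing_delay_ms(rtt)
    print(f"RTT {rtt} ms -> {delay:.0f} ms delay, headroom {REACTION_MS - delay:.0f} ms")
```

Even at the 80ms end of the ideal round-trip range, the total stays under 120ms.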

However, even a slight improvement in server-side processing will enhance the fluidity of the game. We humans are very good at processing sequential frames of information and can easily spot the visual difference between film at 24fps and video at 30fps (or, as the case may be, "The Hobbit" at 48fps, for those of you who now hate Peter Jackson). This means we notice visual processing hiccups long before we react to new on-screen events.

Why does all this single-threaded-ness matter to us? Well, it turns out that most of the performance gains in server processors from Intel and AMD in recent years have NOT come from performing more computations on a single core, but rather from having many cores performing tasks in parallel.

Sadly for APB Reloaded, that type of parallel task division does not improve individual district performance... But... there is hope...
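One standard way to quantify this is Amdahl's law (a textbook model, not something the post itself invokes): if a fraction p of a frame's work can be parallelized across n cores, the best-case speedup is 1 / ((1 - p) + p / n). For a district loop that is almost entirely serial (p near 0), adding cores buys almost nothing, whereas a faster core speeds up the whole frame. The p values below are illustrative assumptions, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup of one workload spread over `cores` (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Illustrative values only: a fully serial loop, a nearly serial one, a highly parallel one.
for p in (0.0, 0.1, 0.9):
    print(p, [round(amdahl_speedup(p, n), 2) for n in (1, 6, 8)])
```

Extra cores still matter at the machine level, since independent districts can each run on their own core; they just do not make any one district faster.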

New OTW Hardware Test World going live: OverKill

In the near future we will release a new OTW (Open Test World) called OverKill. The name is apt: it is the result of a lot of hardware experimentation by our IT team (and the computation tests above were run on this hardware as well).

The benefit of blade servers is that we can increase the density of the hosting operation, since we can fit 16 servers in 10 "rack units." The drawbacks, namely the types of processors blade servers support and the inability to overclock those processors, have caused us some serious problems in optimizing the hardware for the game.

For quite some time we have been looking for a new processor solution specifically to handle the Financial and Waterfront (and eventually Asylum) districts: something that can live in our three datacenters but at the same time gives us a cost-effective way to run at much higher single-core clock speeds, while also taking advantage of the newer "Sandy Bridge" and "Ivy Bridge" Intel processor architectures.

After much experimentation with various combinations of server chips, it turns out that server boards and server chips really don't like, or even permit, overclocking, and they are almost never engineered to optimize single-threaded performance (other than the incidental improvements that come from larger L2 and L3 caches). We also need at least 6 cores to perform these calculations cost-effectively (which lets us run 3 fully loaded districts on a single server). That left us with a conundrum.

After experimentation we have settled on a public test of a custom solution that uses a high-end desktop board (ASUS Rampage 4 Extreme) combined with an unlocked Intel i7-3930K 6-core processor, which in a datacenter setting (with lots of cold air) easily runs stable at 4.25GHz (technically we can push it to 5GHz, but we are starting small).

Will it work once we throw real APB district computations at these systems? The synthetic test indicates it will. Will I/O performance hold up (given the stranglehold that network I/O has on server CPU time)? That's much harder to test, so we will find out as soon as we start running the OTW tests.

In a synthetic benchmark the overclocked i7-3930K (compared to the stock X5570) shows raw gains of nearly 70% in single-threaded performance (!). We do lose two cores per server, but the extra expense (more servers) seems worth the vast performance gain.
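As a rough sanity check, a single-threaded frame should shrink in proportion to single-thread throughput. The sketch below assumes, optimistically and ignoring the network I/O question raised above, that the ~70% synthetic gain translates one-to-one into frame time. The 40ms input is hypothetical: it is simply the full frame budget at the current 25 FPS safe tick rate, not a measured frame time.

```python
def scaled_frame_ms(frame_ms: float, single_thread_gain: float) -> float:
    """Frame time after a single-thread speedup, e.g. gain=0.70 for '70% faster'.

    Assumes the frame is entirely single-threaded and CPU-bound.
    """
    return frame_ms / (1.0 + single_thread_gain)


old_frame_ms = 40.0  # hypothetical: the full 25 FPS budget on current hardware
new_frame_ms = scaled_frame_ms(old_frame_ms, 0.70)
print(f"{new_frame_ms:.1f} ms per frame -> {1000.0 / new_frame_ms:.1f} FPS")
```

Real-world results will land below this ceiling wherever the frame is bound by I/O rather than raw single-core speed.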

If we can capture some of these performance gains in the real world and translate them into improved Action District performance, then our long-term goal is not only to ensure a stable 30 FPS server tick rate, but also to gradually raise the CCU in each district.

From the graphs on software optimization, you can see that the new hardware with the new software 'COULD' run a theoretical server-side tick rate of 62 FPS, which is about 206% of the 30 FPS target tick rate, i.e. more than double what we actually require.

Our plan is to use the extra performance (again, once we have run the real-world tests) to increase CCU in a single district. Since CCU taxes the server in a non-linear fashion, we expect to increase CCU by only 25%-50% before dragging the server back down to a 30 FPS tick rate. Of course this is still speculation and remains to be determined during live testing.
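To see why a theoretical 62 FPS might buy only a 25%-50% CCU increase, assume (purely for illustration; the real cost curve is what live testing will reveal) that per-frame cost grows as a power of player count, cost ∝ CCU^k, and that the 16.1ms measurement corresponds to a full 100-player district. If replication work grows roughly with pairs of players, k = 2 is a plausible pessimistic case; k = 1 is the linear best case. `max_ccu` is a hypothetical helper.

```python
import math


def max_ccu(base_ccu: int, base_frame_ms: float, budget_ms: float, exponent: float) -> int:
    """Largest player count whose modelled frame time still fits in budget_ms.

    Hypothetical model: frame_ms(n) = base_frame_ms * (n / base_ccu) ** exponent.
    """
    scale = (budget_ms / base_frame_ms) ** (1.0 / exponent)
    return math.floor(base_ccu * scale)


budget = 1000.0 / 30  # 30 FPS tick target
for k in (1.0, 1.5, 2.0):
    print(k, max_ccu(100, 16.1, budget, k))
```

Under the quadratic assumption the headroom works out to roughly a 43% CCU increase, within the 25%-50% range estimated above; the model ignores the fixed cost of the ~18,000 non-player actors.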

Higher district CCU would mean better matchmaking. That is a whole other blog entry, but needless to say, 80 people in a district means 20 teams with potentially 10 ongoing matchups, whereas 120 people in a district means 30 teams with 15 ongoing matchups, a 50% improvement in match availability. Of course it's not quite that simple, but you get the gist: more players = better matchmaking.
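The back-of-envelope numbers in that paragraph assume four-player teams (80 players making 20 teams) and two teams per matchup. As the text says, real matchmaking is not that simple; this sketch only reproduces the arithmetic, and `matchups` is an illustrative name.

```python
def matchups(district_ccu: int, team_size: int = 4) -> int:
    """Concurrent two-team matchups if every player is grouped into full teams."""
    teams = district_ccu // team_size
    return teams // 2


before, after = matchups(80), matchups(120)
print(before, after, f"{(after - before) / before:.0%}")
```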

In this post we have only talked about server-side processing and optimization, and have not touched on the OTHER things that also affect performance. First and foremost, you need a good gaming rig to play APB. We always recommend 8GB of RAM and 64-bit Windows 7; anything less is asking for trouble, and using 64-bit Windows in particular is critical. Also, client-side FPS in most Unreal games tends to drop dramatically during very large semi-transparent VFX events (i.e. very big explosions where the player does NOT die, something APB of course has a lot of), so only higher-end graphics cards tend to perform OK during those big VFX events (and optimizing that part of the engine code is a whole other ball of wax, far beyond the scope of this post).

Of course, network connectivity and your latency to our core datacenters (Los Angeles, Washington DC and Frankfurt), or to the datacenters managed by our Russian (Moscow) and Brazilian (São Paulo) publishing partners, are critical as well.

I hope this article has shed some light on the optimization work currently being done. If you are one of our OTW testers, then expect to see the "OverKill" world come online in the next two weeks. And for everyone else we expect to release 1.10.2 very soon, which should have some immediate performance improvements.

'If we achieve 30 FPS on the server, then connected game clients can easily run at 2X-3X the server tick rate (60FPS - 90FPS) without any noticeable loss in accuracy. At 1:2 or 1:3 server-to-client ratio movement prediction and frame-interpolation provide a very smooth game experience.'

What do you mean by smooth? Smooth as in all my bullets hit or smooth as in the game is more fluid (client-fps wise)?

I am so happy to be reading this. I feel that APB is a seriously fun and addicting game. It pains me to see it not very popular due to laggy servers, poor FPS... etc. So news of some long-needed optimization is music to my ears. Hopefully we can expect to see some more gun rebalancing and matchmaking improvements in the not-so-distant future.

I'm happy to see a little team doing big things :) I was never an FPS/TPS fan... that is, until I played APB. And now you guys are making sure we get a better gameplay experience. New districts and content are welcome, but yeah, first things first. Good luck with those "experiments" :D

This game (because of the admissive and incredible customization) is one of the best FPS games out there, in my honest opinion. And it is SO sad to see the lag driving so many people away. Fixing that problem would be the best belated Christmas present ever :D

This game is unplayable for me now. The same rig I have, used to run APB at max settings with a solid 60fps and the servers were "okay" at the time. Now? I have all settings turned to minimum on the same exact rig, and the framerate is abysmal... I'm stuttering and teleporting everywhere. I complain in district about this, and everyone says the same thing.. that it's terrible for everyone.

Please stop trying to convince us that it's our hardware to blame. I am running win 7 64 bit... I have a high end AMD3 processor, a decent video card and 16 gigs of ram. Yet I can't run APB on even AVERAGE settings? Something doesn't seem right. At some point the optimization for this game (both client and server) went to hell.

It's really sad because I was actually enjoying the balance changes you guys have made recently, but the frequent teleporting is making this game totally unplayable for me.

I think that some of the network problems are caused by UDP packets that arrive out of order because of BGP routing. But it's difficult to determine how significant this problem is unless you perform end-to-end testing for each carrier that K2Network is using.

Actually, you'd be surprised: when I say that there aren't many cheaters (at least on Patriot EU2), I mean 1-2 a day, sometimes even none. But I can't say anything about the other servers, since I don't play there.

The issue isn't hackers... It's people who get their butts kicked and automatically assume you're cheating. I get called a hacker EVERY DAY by some randoms, at least 4 times a day, but my account has been up since the second week of the beta for G1.

It is really great to read something like this. And I'm not talking about the changes themselves and what could come with them. It's just great that you're finally starting to talk to us again, so we can keep track of what you are doing and see that you really care about the game (even though it might not always look like it). Humans don't always need proof to be convinced. They only need faith, and that is created by information, hopes and wishes. Anyway, keep up the good work and the information flow! I'm sure the gamers will appreciate it.

About the post itself: I really can't wait to see it go live. To be honest, I don't really believe it will go as well as you guys hope, but it would be great if it does!

Cool, but me... I didn't have problems with the client, mine's always at 62. I have more problems with memory; actually I don't know what that is... I think it's RAM. I always have it at 2200 (red). Please, someone reply with how to fix this... My RAM: 8GB

"We are really proud that APB Reloaded has remained a consistent top-5 (out of 100+) in Steam's Free2Play category" I don't think they realised that over 50% of that category is DLC; there are actually only 42 F2P games on Steam, which is not much to compete with, to be honest. "so only higher end graphics cards tend to perform ok during those big VFX events" So why are other people and I, with these powerful cards, still experiencing issues!?

This update is nice and easy to understand, but you also need to take a look at the gameplay. I'm sorry to tell you, but the G1 AntiCheat and PunkBuster are just crap for APB. Please look for a new one and push it in :) Frost from APB:RU seems a really nice pick.

Is there a time frame for how long we avid APB'ers will have to wait for this update to be implemented? Also, has there been any consideration of running VFX on the client side to save the poor servers?

We are currently pushing to get 1.10.2 live in the next week or so, but it depends on the new patch passing all QA tests. One of the challenges with a performance update build is that it could introduce other random issues that are hard to test for. As for your other question: VFX is always client-side only. It was only mentioned as an example of what reduces frame rate on the client.

4.5 ghz you can just go for without actually doing anything. That's what every single enthusiast is doing with Sandy and Ivy bridges across the world :) I do 4.7 on my 2500k, but that necessitates a decent Vcore bump...

Anyway, 4.5 would be the place to start, imho.

I cannot wait for the patch to come :)

On the side:

You say you could run up to 3 fully loaded districts on each of the servers. I think the best way to split them up would be to run Financial, Waterfront and Asylum on these new servers while keeping some of the older hardware to use on Beacon and Baylan maps - they don't require nearly as much power and you wouldn't need to divert any of the new server performance to those maps. Just a thought anyway :)

Actually, it's how most shooter games operate. The reason is simple: if you care about who shot whom first (think Counter-Strike or Unreal 3), the only way to maintain absolute consistency between clients is to run one monolithic game loop. We have other games that use multi-threaded servers (Fallen Earth is one example), but they have other consistency problems and are clearly in the RPG category, not the TPS/FPS category.

18,000+ district actors, really? Why do you need so many... get rid of the boxes that no one cares about, reduce the number of cars on the roads... remove 8k of them and you have a steady game. 18,000+ district actors lol, that is just pathetic... now I know why RUW went bust... because of all that crap they put in when it was not needed.

Very interesting. To be honest, I had no clue APB was running its servers on a single thread (that might explain something).

LOL, why don't you guys start implementing multi-threading? That's sarcastic, by the way; I know how hard that can be (I am learning threading right now). However, I worry: more and more we have realized that CPUs can't get much faster on a single core due to heat, size, etc. So threading is the only solution. Could APB ever support multi-threading?

Theoretically, if you could modularize the code well enough, couldn't you assign 1 thread to each component, those being customizations, district props, NPCs, etc.? (It might take time, but I feel this could be a possibility... but who knows, spitballing here.)

Also, you mentioned the whole district runs in one clock cycle. Now I am going to assume that when working with players you use a sort of SYN and ACK (synchronize and acknowledge) concept, where you wait for each player to give you info, correct? If that's the case, could 1 player in a district bring everyone down, forcing this process to wait for all the ACKs (I assume there's a timeout timer, but still)?

And one other thing. I have come to realize that APB relies heavily on virtual memory, most likely to store customizations. So when a player joins a district, are all their designs uploaded to the server and then downloaded by everyone else, and do they download the designs of everyone in the district? This would explain why, when you first log in, everyone has the default character and car, but after just a short time everyone's customizations are already loaded when you see them (and newly spawned cars, for instance, take very little time to load, since it's one and not all).

I would then assume, if this is true, that symbols don't get loaded the same way; that's why a spray point takes so long to apply. It's sending the data out to everyone.

Would love to hear back on some of this ... yes I am a tech nerd, who loves to figure out how stuff works.

Also thanks for sharing this with us, and keep the tech talk coming, it's nice sometimes to not have it dumbed down (even if sometimes I don't get it all)

It's not actually an APB-specific thing. It's standard shooter structure. And no, the servers are smart enough to avoid having one rogue client bring everyone down (i.e. if a client sends unknown or no input, that client is just ignored). The same single-threaded server structure applies to Counter-Strike, CoD multiplayer etc. Multithreaded servers are used mostly for RPGs and RTS games, but even there a lot of elements have global locking type functions, which makes them more single-minded than most of us would want. So again, the server architecture makes total sense up to 32v32. It's the wild domain above that which requires new techniques or more horsepower (or both, which is what we are exploring).
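The behaviour described in this reply, where a client that sends unknown or no input is ignored rather than waited on, can be sketched as a non-blocking drain of whatever input has already arrived before the tick runs. This is a generic illustration under assumed names (`collect_inputs`, the `VALID_COMMANDS` set), not APB's networking code.

```python
import queue
from typing import Dict, Tuple

VALID_COMMANDS = {"move", "fire", "jump"}  # hypothetical command set


def collect_inputs(inbox: "queue.Queue[Tuple[int, str]]") -> Dict[int, str]:
    """Drain all input that has already arrived, without blocking the tick.

    Clients with no queued input, or with unrecognized commands, are simply
    absent from the result; the tick proceeds without them.
    """
    latest: Dict[int, str] = {}
    while True:
        try:
            client_id, command = inbox.get_nowait()
        except queue.Empty:
            break
        if command in VALID_COMMANDS:
            latest[client_id] = command  # keep the most recent valid input
    return latest
```

Since nothing here ever blocks on a particular client, a silent or misbehaving connection cannot stall the frame for everyone else.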

I'll be honest... with all the money you guys made off of Nano sales, you should be able to pony up and use real server hardware. One Intel Xeon E5-2687W config would trump that 3930k. Just a thought...

Actually, a little-known fact: the E5-2687 and 3930K are the same processor (both Sandy Bridge-E). The 3930K has had two of its actual 8 cores disabled with a laser to make it a 6-core processor. The issue is that the E5-2687 only runs at 3.1GHz 'natural' or 3.8GHz Turbo. We don't really benefit from the 8 (vs 6) cores, and we can easily run the 3930K at 4.25GHz. Our devs actually use that chip in their workstations now at 4.6GHz. The bigger test will be to see how stable it runs once it's all in production. That's something we will know very soon(TM).

Can you guys please include the option to disable local blood particles, meaning the blood particles of your own character?

When being shot at by a Tommy Gun, SHAW, NFAS, or Colby CSG, the blood particles lag your game so badly that even when you are clicking to shoot, your game drops to 5 to 8 FPS and your gun is not shooting correctly even if you are clicking it.

It's nice to see this, but if you're going to overclock, that setup does not look good for it. I feel that with big CPU loads you are going to see temperature issues. You should have a look at this website, http://www.overclock.net/ (I'm a member there); these guys do some amazing things and will have already solved many of your unseen issues. Overclocking my PC helped so much, and this is what a well-cooled system looks like: https://www.youtube.com/watch?v=1eGY5B8mJmY

Let people host their own action district servers. Like Team Fortress 2: items and character data are still hosted on your servers, while only the action districts are hosted on people's own servers. You would actually get free admins against hackers, many people would come back to APB, and you'd save the money for your cheap Intel desktop hardware "server" urghhhhh. And yes, this is possible... very easily possible. Public server files ftw.

Well, it's not so much about 'cleaning it up' as it is about prioritizing things. Generally, open-sourcing code never happens until a game is long dead. Also, just because it goes OSS doesn't mean progress on updates or changes will be fast; see other OSS games that have been slowly chugging along for years. So in short: no.

But we may consider releasing a different version of the game for altogether different reasons... But that's a later story.

The orientation of the RAM and motherboard is unfortunate, and perhaps not 100% efficient. However, with the desktop/server hybrid, this is what we’ve had to deal with on the first draft of the system.

That cooler is sucking cold air over the CPU at 9000RPM. There's an airflow shroud that fits over the top of it, funneling the air to the 9000RPM 80mm fans, which exhaust from the system into the hot aisle. There is high-pressure cold air coming in over the RAM from the back of the case, from the cold aisle in the datacenter; it's hard to see in the picture, but the fan is actually sitting *mostly* above the RAM. You have to remember that the case is 3" high and that cooling in that environment is different from your home system.