Debugging Jita Live is For Real Men

The latest developer blog out of CCP looks at the challenges of fixing problems and debugging in EVE Online.

One of the problems we face in running a cluster on a massive scale, such as our beloved Tranquility, is that it's extremely difficult to test specific load issues before code is deployed onto the cluster.

We have a series of load-inducing tests that we run on our test servers, and we get players to participate in huge fights on our public test server to gauge the effects of new code.

In Apocrypha we had a staggering number of changes to the code base from dozens of programmers working on three continents. Keeping tabs on the changes was a daunting task and, as always in large software projects, a few bugs slipped through the net and made their way to the production server.

In this case, a bug caused Jita to start suffering from performance degradation with 300 people in the system, and we had no idea why. Basically, all the nodes were running hotter than they should have been, and Jita in particular was running at 100% CPU capacity under a load that should only have had it running at 30%.