Wednesday, September 23, 2009

The Other Cost of Code Bloat

The other day I almost wrote a redundant version of the exact same class that someone else on my project had written. In fact, if I hadn't asked this person a couple of general C# questions, and he hadn't put two and two together, I probably would have written that redundant class. Good detective work on his part, and shame on me for not doing a search of the code base to see if someone else had already tackled this problem. While I've got a pretty good feel for the C++ which makes up the majority of code in our engine/tools, I haven't looked at the C# side as much as I probably should have.

As the code bases we write get larger and larger, and the team sizes we deal with get larger and larger, the question of how to avoid this scenario becomes an important one. Ideally you hire programmers who perform the necessary code archeology to get a feel for where things are in the code base, or who will ask questions of people more familiar with the code when unsure. Getting a code base of a million or more lines "in your head" takes time, though. I've been working with our licensed engine for about four years now, and there are still nooks and crannies that are unfamiliar to me.

Better documentation should help, but in practice it is rarely read. That is usually because such documentation is either nonexistent or, if it does exist, horribly out of date. With a licensed engine, you are at the mercy of the little documentation you are provided, and at the end of the day, the code itself is the best documentation.

A sensible architecture with clear delineation of what should go where is often a bigger help. Knowing [where to look] is half the battle, said a Saturday morning cartoon show. With a licensed engine, you are again at the mercy of what you are provided. Finding existing functionality usually comes down to experience with the code base and code archeology skills.

Recently, Adrian Stone has been writing an excellent series on minimizing code bloat. While the techniques he describes are less about eliminating source code than about eliminating redundant generated and compiled code, the mindset is the same when you are removing actual lines of code. Aside from the important compile time, link time, and executable size benefits, there is another benefit to removing as much code as you possibly can -- the code will occupy less "head space."
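To make "redundant generated code" concrete, here is a minimal sketch of one well-known way to cut it: hoisting the type-independent work out of a template so it is compiled only once. I am not claiming this is taken from Adrian's series; the names RawBuffer and FixedArray are made up for illustration, and the sketch assumes trivial element types only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Type-independent core (hypothetical name). This code is compiled exactly
// once, no matter how many element types the wrapper below is used with.
class RawBuffer {
public:
    RawBuffer(unsigned char* storage, std::size_t elemSize, std::size_t capacity)
        : storage_(storage), elemSize_(elemSize), capacity_(capacity), count_(0) {}

    // Copies one element's bytes into the next free slot; returns false when full.
    bool pushBytes(const void* elem) {
        if (count_ == capacity_) return false;
        std::memcpy(storage_ + count_ * elemSize_, elem, elemSize_);
        ++count_;
        return true;
    }

    // Returns a pointer to the bytes of element i.
    const void* at(std::size_t i) const {
        assert(i < count_);
        return storage_ + i * elemSize_;
    }

    std::size_t size() const { return count_; }

private:
    unsigned char* storage_;
    std::size_t elemSize_;
    std::size_t capacity_;
    std::size_t count_;
};

// Thin typed wrapper (hypothetical name). Each instantiation only generates
// a few trivial forwarding functions instead of a full copy of the logic.
template <typename T, std::size_t N>
class FixedArray {
    static_assert(N > 0, "capacity must be non-zero");
    static_assert(std::is_trivially_copyable<T>::value &&
                  std::is_trivially_default_constructible<T>::value,
                  "this sketch only supports trivial element types");

public:
    FixedArray() : raw_(data_, sizeof(T), N) {}

    // raw_ points into data_, so copying would alias another object's storage.
    FixedArray(const FixedArray&) = delete;
    FixedArray& operator=(const FixedArray&) = delete;

    bool push(const T& value) { return raw_.pushBytes(&value); }

    T get(std::size_t i) const {
        T out;
        std::memcpy(&out, raw_.at(i), sizeof(T));
        return out;
    }

    std::size_t size() const { return raw_.size(); }

private:
    alignas(T) unsigned char data_[N * sizeof(T)];  // declared before raw_ on purpose
    RawBuffer raw_;
};
```

With this shape, FixedArray<int, 16> and FixedArray<float, 16> share the same RawBuffer code. The trade-off is that the shared core works on raw bytes, so the compiler has less type information to optimize with.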

Unused or dead code makes it that much harder to do code archeology. Dead code certainly can make it more difficult to make lower level changes to the engine or architecture, as it is one more support burden and implementation difficulty. In the past, removing large legacy systems (whether written internally or externally) has had unexpected benefits in simplifying the overall architecture -- often there are lower level features that only exist to support that one dormant system.

One of my favorite things to do is delete a lot of code without the end result of the tools or game losing any functionality. It's not only cleaning out the actual lines of code but reclaiming the corresponding head space that is such a wonderful feeling -- "I will never have to think about X again." With the scale of the code bases we deal with today, we don't have the brain power to spare on things we don't need.

2 comments:

I'd argue we don't even have the brain power to keep the things we need in mind. And I don't see our code bases shrinking, which leads to the (logical??) conclusion of computer-aided navigation of your code base.

Of course, C++ has slim pickings in that area - analyzing that language is a bit painful, so pretty much every decent project and research effort in that area targets Java or C#. (I have hopes that this changes somewhat as LLVM/clang gain ground.)

But another issue of code bloat is that our standard set of libraries is anemic. Each middleware usually brings its own platform-layer, math-layer, memory management, and often even containers. (No, STL doesn't count - it's got many shortcomings that make usage on the console side hard. EASTL addresses that, but unless the powers that be open-source it...)

And there's plenty of middleware to be had. Having 10 different vector4's is the result. That is obvious and completely avoidable bloat - if we can get the middleware vendors to make those things replaceable.
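As a rough sketch of what "replaceable" could look like, a library can route its math through a small traits struct that the host engine specializes for its own vector type. Everything named here (the engine and middleware namespaces, Vector4, Vec4Traits, dot) is hypothetical and not any real vendor's API.

```cpp
namespace engine {
// The host engine's existing vector type (hypothetical).
struct Vector4 { float x, y, z, w; };
}

namespace middleware {
// Customization point: the host specializes this for its own vector type,
// so the library never needs to ship a vector4 of its own.
template <typename V>
struct Vec4Traits;

// Library code written only against the traits, not a concrete type.
template <typename V>
float dot(const V& a, const V& b) {
    using T = Vec4Traits<V>;
    return T::x(a) * T::x(b) + T::y(a) * T::y(b) +
           T::z(a) * T::z(b) + T::w(a) * T::w(b);
}

// Supplied by the host: maps the library's accessors onto engine::Vector4.
template <>
struct Vec4Traits<engine::Vector4> {
    static float x(const engine::Vector4& v) { return v.x; }
    static float y(const engine::Vector4& v) { return v.y; }
    static float z(const engine::Vector4& v) { return v.z; }
    static float w(const engine::Vector4& v) { return v.w; }
};
}
```

Calling middleware::dot(a, b) with two engine::Vector4 values then uses the engine's type directly. The cost is that the library has to be distributed at least partly as headers, which may be one reason vendors don't do this by default.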

In an ideal world, the industry would standardize on some packages, but given that most programmers can't even agree if they should call it u32, uint_32, U32, DWORD, or whatever their preferred nomenclature is, I'm not hoping for that any time soon ;)

What's interesting is I was doing some mental calculations of the code size of games I worked on for PS2 and Xbox vs. games for 360 and PS3. In my experience, the size of the executable is growing almost as fast as the memory. The 360 and PS3 have 8X the memory of the Xbox, and 16X that of the PS2. From what I can remember, the executables I'm dealing with now are anywhere from 6-10X bigger. It'll be interesting to see if this trend continues into the next generation -- I hope not!