September 24, 2009

Performance Considerations – Overview

This is the beginning of a series of posts on some of the challenges I faced in getting Aeon to run at an acceptable speed. Right now, I need to set up some background information so the following posts will make more sense. If you find any of this patronizing, unnecessary, or just too rambling, feel free to skip it.

The goal of Aeon is to simulate the environment of a 386 PC running DOS. By far, the single most taxing component is the CPU emulation. In Aeon, this is given its own thread, which, just like a real CPU, decodes and executes instructions sequentially (yes, I know sequentially isn’t technically accurate for modern processors). Basically, if this part of Aeon is too slow, it doesn’t matter how fast anything else runs. All of the posts in this series will only consider performance as it relates to this critical path. There are plenty of other areas to talk about though, and I’ll get to them eventually :).
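To make the idea of a dedicated CPU thread concrete, here is a minimal sketch of a sequential decode-and-execute loop running on its own thread. This is not Aeon's actual code: ToyCpu, EmulatorHost, and the three made-up opcodes are assumptions chosen purely for illustration, and a real 386 emulator has to deal with prefixes, operand decoding, segmentation, interrupts, and much more.

```csharp
using System;
using System.Threading;

// Hypothetical toy CPU; the opcodes below are invented for this example
// and have nothing to do with real x86 encodings.
public sealed class ToyCpu
{
    private readonly byte[] memory;
    private int ip; // instruction pointer into memory

    public int Accumulator;
    public long InstructionsExecuted;

    public ToyCpu(byte[] program)
    {
        memory = program;
    }

    // Fetches, decodes, and executes a single instruction.
    // Returns false when the halt opcode is reached.
    public bool Step()
    {
        byte opcode = memory[ip++];
        switch (opcode)
        {
            case 0x00: // HLT: stop the loop (not counted as executed)
                return false;
            case 0x01: // ADD imm8: add the next byte to the accumulator
                Accumulator += memory[ip++];
                break;
            case 0x02: // DEC: subtract one from the accumulator
                Accumulator--;
                break;
            default:
                throw new InvalidOperationException("Unknown opcode " + opcode);
        }

        InstructionsExecuted++;
        return true;
    }

    // The critical path: one instruction after another until halt.
    public void Run()
    {
        while (Step())
        {
        }
    }
}

public static class EmulatorHost
{
    // Gives the CPU loop its own thread so the rest of the emulator
    // is not blocked by it.
    public static Thread StartCpuThread(ToyCpu cpu)
    {
        var thread = new Thread(cpu.Run);
        thread.IsBackground = true;
        thread.Name = "CPU emulation";
        thread.Start();
        return thread;
    }
}
```

Every emulated instruction passes through Step, so any overhead in that method is paid tens of millions of times per second. That is why this loop is the part worth optimizing first.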

So let’s get some metrics defined now. I’m largely going to be measuring instruction throughput in MIPS (millions of instructions per second). I’m aware that this isn’t a terribly useful real-world metric, but I find it provides a good indication of the impact of optimizations in my code. For instance, let’s say I run a particular game and it averages about 18 MIPS in Aeon. I then optimize part of the ADD instruction and it averages 20 MIPS in the same part of the same game. That’s a relative increase in throughput of about 11%, which is significant. It’s a very simple way for me to measure relative speed without resorting to more sophisticated profiling.
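The arithmetic behind these numbers is simple enough to sketch. The helper below is hypothetical (the Throughput class and its method names are mine for this example, not part of Aeon) and assumes the emulator keeps a running count of executed instructions and that elapsed time comes from something like System.Diagnostics.Stopwatch.

```csharp
using System;

// Hypothetical helper for the MIPS arithmetic described above.
public static class Throughput
{
    // Millions of instructions per second over a measured interval.
    public static double ComputeMips(long instructionsExecuted, TimeSpan elapsed)
    {
        if (elapsed <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException("elapsed");

        return instructionsExecuted / elapsed.TotalSeconds / 1000000.0;
    }

    // Relative change between two measurements, as a percentage.
    public static double PercentChange(double before, double after)
    {
        return (after - before) / before * 100.0;
    }
}
```

For example, 36 million instructions in two seconds works out to 18 MIPS, and Throughput.PercentChange(18, 20) comes out at roughly 11.1. The comparison is only meaningful when both measurements are taken over the same part of the same game.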

I mentioned profiling above. I wrote Aeon in C#, and yes, Visual Studio includes some very nice managed code profiling tools which were very useful to me in the beginning for isolating the biggest performance bottlenecks. However, we’re dealing with tens of millions of instructions per second, and each one involves multiple method calls. Were I to instrument this with a managed code profiler now, it would fill up my hard drive before I got any useful information, not to mention destroying the somewhat delicate timing of my hardware timer emulation. I’ve been wanting to add a more specialized profiler to Aeon that I can use to measure how often certain instructions are emulated and how long they take, but it’s been running “good enough” for now, so this hasn’t been a priority.
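For what it’s worth, the “how often” half of such a specialized profiler could be as small as a fixed array of counters indexed by opcode. The following is only a sketch of that idea under my own assumptions: OpcodeCounter and its members are hypothetical names, it ignores multi-byte opcodes, and it does not attempt the “how long they take” half, since timing each instruction would add overhead of its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-opcode counter; not part of Aeon.
public sealed class OpcodeCounter
{
    // One slot per possible single-byte opcode.
    private readonly long[] counts = new long[256];

    // Called once per emulated instruction; just an array increment.
    public void Record(byte opcode)
    {
        counts[opcode]++;
    }

    public long GetCount(byte opcode)
    {
        return counts[opcode];
    }

    // Returns up to 'max' opcodes ordered from most to least frequent,
    // skipping opcodes that were never recorded.
    public List<KeyValuePair<byte, long>> MostFrequent(int max)
    {
        var result = new List<KeyValuePair<byte, long>>();
        for (int i = 0; i < counts.Length; i++)
        {
            if (counts[i] > 0)
                result.Add(new KeyValuePair<byte, long>((byte)i, counts[i]));
        }

        result.Sort((a, b) => b.Value.CompareTo(a.Value));
        if (result.Count > max)
            result.RemoveRange(max, result.Count - max);

        return result;
    }
}
```

Unlike a general-purpose instrumenting profiler, this uses a fixed amount of memory no matter how long the emulator runs, so it cannot fill up a disk.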

Finally, I’d like to briefly mention the .NET JIT compiler. Basically, .NET methods are compiled into machine code the first time they are called (a bit of a simplification). Because this compilation happens at run time and has to be quick, the JIT compiler isn’t able to optimize the code as much as a static compiler can. The upside is that Aeon runs as-is on x86 and x64, and for various reasons that I won’t get into right now, the x64-compiled version is actually a bit faster. Another benefit of this approach is that as future versions of the .NET Framework are released, Aeon will take advantage of improvements in the JIT compiler. As an example, Wolfenstein 3D runs on my Phenom II processor at about 19 MIPS with the .NET 3.5 SP1 version of Aeon. When I tested the beta release of .NET 4, the same game managed around 26 MIPS.

Hopefully now you have an idea of where I’m coming from when I’m talking about CPU performance. Next time I’ll dig into some of the more interesting details.