I was working on an efficient map modification library and figured that this could benefit lots of other people. This small optimization lesson will walk you through different variations of the same snippet of code, each one faster than the last, ultimately leading to a huge improvement in performance.

For those of you who are unfamiliar with the Halo 2 map format, every map file contains a 32-bit hash 720 bytes into the header, which allows the game engine to detect any modifications that have been made to the map. It does this by scanning the map file during initialization, XORing every integer after the map header, and comparing the result with the stored hash. If the two do not match, the level will “fail to load”.

For starters, we will be looking at this basic signing algorithm that I see used quite frequently in the community. Some of the examples below assume 16-byte alignment of the map size; since the game requires much stricter alignment than that to load properly, this is a fair assumption to make. Benchmarks were performed by executing an optimized release build outside of the debugger, though you will most likely see different results depending on your particular setup. Please also keep in mind that you will need to slightly modify these code samples to use your own information instead of my MapHeader.SizeOf, Header.FileSize, Header.Signature, and Stream stuff…but you get the idea ;)
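A minimal sketch of what such a naive signer looks like. The header size and file size are passed in as parameters here (in my own code they come from MapHeader.SizeOf and Header.FileSize), and the `NaiveSigner` name is just for illustration:

```csharp
using System;
using System.IO;

static class NaiveSigner
{
    // Naive approach: one stream access per 32-bit integer.
    public static uint ComputeChecksum(Stream stream, int headerSize, long fileSize)
    {
        uint checksum = 0;
        byte[] data = new byte[4];

        stream.Position = headerSize;
        while (stream.Position < fileSize)
        {
            stream.Read(data, 0, 4);                   // hits the stream on every pass (slow)
            checksum ^= BitConverter.ToUInt32(data, 0); // XOR the next integer into the hash
        }
        return checksum;
    }
}
```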

As you can see, it starts reading data after the header and XORing one integer at a time until the end of the map is reached.

Some of you may now be thinking, “well how much more efficient can you get than that? It is just a simple XOR of each integer after all…” To answer that question, you will need to understand a little more about what is going on behind the scenes. On every iteration, you access the file (SLOW), update the checksum, compute the hash range size, check whether it is finished, and branch accordingly. The main performance killer here is the excessive file I/O, although the other steps can and do slightly affect performance as well.

So first off, how do we minimize the number of file accesses? We start by buffering our reads. The trick here is to pick a buffer big enough to cut down the number of file accesses, yet small enough not to thrash the CPU cache when working with the larger amount of data. For these reasons, I chose a moderately sized buffer of 64 kilobytes. On newer hardware it may be beneficial to increase this size even further, depending on your HDD and CPU cache sizes. With these facts in mind, here is our new result…
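A sketch of the buffered version, under the same assumptions as before (parameterized header and file sizes, illustrative class name):

```csharp
using System;
using System.IO;

static class BufferedSigner
{
    const int BufferSize = 0x10000;  // 64 KB: few file accesses, still cache-friendly

    public static uint ComputeChecksum(Stream stream, int headerSize, long fileSize)
    {
        uint checksum = 0;
        byte[] buffer = new byte[BufferSize];

        stream.Position = headerSize;
        long remaining = fileSize - headerSize;

        while (remaining > 0)
        {
            // One large read replaces thousands of tiny ones.
            int count = stream.Read(buffer, 0, (int)Math.Min(BufferSize, remaining));
            for (int i = 0; i < count; i += 4)
                checksum ^= BitConverter.ToUInt32(buffer, i);
            remaining -= count;
        }
        return checksum;
    }
}
```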

Just by blocking your file I/O, you can obtain a dramatic increase in performance: this algorithm should be at least six times faster than the one above.

To push our performance gains even further, instead of calling BitConverter.ToUInt32() thousands of times and incurring lots of unneeded overhead, we can deal with the data at a much lower level by using unsafe code. Working with pointers and pinned buffers allows us to speed things up even more, as demonstrated by this next piece of code.
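A sketch of the unsafe version (requires compiling with /unsafe). The reads are still buffered; the inner loop now walks the buffer through a raw `uint` pointer instead of converting each integer:

```csharp
using System;
using System.IO;

static class UnsafeSigner
{
    const int BufferSize = 0x10000;

    public static unsafe uint ComputeChecksum(Stream stream, int headerSize, long fileSize)
    {
        uint checksum = 0;
        byte[] buffer = new byte[BufferSize];

        stream.Position = headerSize;
        long remaining = fileSize - headerSize;

        fixed (byte* pBuffer = buffer)  // pin the buffer so the GC cannot move it
        {
            while (remaining > 0)
            {
                int count = stream.Read(buffer, 0, (int)Math.Min(BufferSize, remaining));
                uint* p = (uint*)pBuffer;
                uint* pEnd = (uint*)(pBuffer + count);
                while (p < pEnd)
                    checksum ^= *p++;   // no per-integer conversion call
                remaining -= count;
            }
        }
        return checksum;
    }
}
```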

Here, I’m seeing performance gains of around 11 times over the original form. Not too shabby, but still, more can be done to speed things up even further. This last technique is commonly referred to as loop unrolling; it decreases some of the loop overhead and can be quite beneficial in small, tight loops such as these.
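A sketch of the unrolled version. This is where the 16-byte alignment assumption from earlier matters, since each pass through the inner loop now consumes exactly four integers:

```csharp
using System;
using System.IO;

static class UnrolledSigner
{
    const int BufferSize = 0x10000;

    // Same as the unsafe version, but the inner loop is unrolled four ways,
    // trading one loop-condition check and branch for four XORs per pass.
    public static unsafe uint ComputeChecksum(Stream stream, int headerSize, long fileSize)
    {
        uint checksum = 0;
        byte[] buffer = new byte[BufferSize];

        stream.Position = headerSize;
        long remaining = fileSize - headerSize;

        fixed (byte* pBuffer = buffer)
        {
            while (remaining > 0)
            {
                int count = stream.Read(buffer, 0, (int)Math.Min(BufferSize, remaining));
                uint* p = (uint*)pBuffer;
                uint* pEnd = (uint*)(pBuffer + count);
                while (p < pEnd)
                {
                    checksum ^= p[0] ^ p[1] ^ p[2] ^ p[3];
                    p += 4;   // four integers (16 bytes) per iteration
                }
                remaining -= count;
            }
        }
        return checksum;
    }
}
```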

This one clocks in at around 23 times faster than our original algorithm…you can’t get much better than that without delving into C++ and harnessing the power of SIMD and assembly.

Many of these optimization practices are not limited to this algorithm; they can be used in many different situations and are capable of producing much more efficient code, at the unfortunate expense of decreased readability. Through buffered I/O, unsafe code, loop unrolling, and other minor optimizations, we increased the performance of our generic piece of code by a factor of more than 20.

Now, these types of optimizations should really only be done after properly profiling your application to determine where it spends most of its time; this was more or less just a lesson on some of the optimization techniques you have at your disposal when they are called for.