Cracking down on heap abuse (part 2)

Also during the Mono Summit, Ben ran mono --profile on Banshee and alerted me to the fact that taglib-sharp was seriously abusing the heap. I had known it was less than optimal, knew about where the issue was, and knew the fix – but I really hadn’t had much time to address it.

Seeing real numbers was a big motivating factor, however. The heap/memory numbers I’ll present here are total heap allocations, not heap growth that’s never GCed (leaked). That is, the numbers don’t describe memory that’s all reserved at one time, so it’s not something a user may really notice; they are the accumulation of very small allocs/frees throughout the lifetime of the test program.

The Problem

In taglib-sharp, which is a fully managed C# port of TagLib (C++), there are four custom collection classes that provide some extra useful operations. This is where I have focused my optimization work: all formats use these collections, so every format benefits from the work. Also, it was blindingly obvious what the problem was once I looked at the code.

Take the TagLib.ByteVector class. It serializes many formats of data into a collection of bytes. However, it was using the System.Collections.ArrayList class to store these bytes! byte is a value type, while ArrayList stores object references, so each byte in the collection must be boxed for storage and unboxed for retrieval. Boxing allocates a heap-based object that acts as a container for the byte value; unboxing copies the value back out of that container. So for each byte that’s pushed into the collection, another object is allocated on the heap! Yikes.

The Solution

Luckily, we have generics. Generics to the rescue – and I cannot stress this enough. Simply by replacing ArrayList with List<byte>, the excessive memory problem goes away, since the type to be stored in the collection is known at compile time, and thus boxing/unboxing is no longer necessary. While the fix is very simple, what we get in return is sheer love.
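To make the difference concrete, here is a minimal sketch that stores the same bytes in an ArrayList and in a List<byte>. It is not taken from taglib-sharp: BoxingDemo and its method names are made up for illustration, and only ArrayList and List<byte> come from the discussion above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Illustrative only: BoxingDemo and its methods are hypothetical names,
// not part of taglib-sharp.
public static class BoxingDemo
{
    // Old approach: ArrayList.Add takes an object, so each byte is boxed
    // into its own small heap object before it is stored.
    public static ArrayList FillArrayList(byte[] data)
    {
        var list = new ArrayList();
        foreach (byte b in data)
            list.Add(b);            // implicit boxing: byte -> object
        return list;
    }

    // New approach: List<byte> keeps the values in a byte[] backing array,
    // so no per-element objects are created.
    public static List<byte> FillGenericList(byte[] data)
    {
        var list = new List<byte>();
        foreach (byte b in data)
            list.Add(b);            // no boxing
        return list;
    }

    public static void Main()
    {
        byte[] data = { 0x49, 0x44, 0x33 };   // the ASCII bytes of "ID3"

        ArrayList boxed = FillArrayList(data);
        byte first = (byte)boxed[0];          // explicit unboxing cast required

        List<byte> plain = FillGenericList(data);
        byte alsoFirst = plain[0];            // already a byte, no cast

        Console.WriteLine(first == alsoFirst);
    }
}
```

Both methods end up holding the same values. The difference is that the ArrayList version creates one extra heap object per element and needs a cast on the way out.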

This was the first time I had really looked at all the custom collections code in taglib-sharp. I spent a few hours rewriting/refactoring it all to implement generic interfaces, and created a TagLib.ListBase<T> base collection class that provides common operations for all custom collections used in TagLib. In the process I made some other minor optimizations and cleanups, and now the code is much easier to read and is rather consolidated. Fun.
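The real TagLib.ListBase<T> is not reproduced in this post, so the following is only a rough guess at the general shape of such a base class. Every name here (SketchListBase<T>, SketchByteVector, AddRange and so on) is an assumption made for illustration and does not mirror the actual taglib-sharp API.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch; the member names below are assumptions and do not
// mirror the real TagLib.ListBase<T>.
public class SketchListBase<T> : IEnumerable<T>
{
    // Generic backing store: value types such as byte are stored unboxed.
    private readonly List<T> items = new List<T>();

    public int Count { get { return items.Count; } }

    public T this[int index]
    {
        get { return items[index]; }
        set { items[index] = value; }
    }

    public void Add(T item) { items.Add(item); }

    // An example of a shared "extra" operation: append a whole sequence.
    public void AddRange(IEnumerable<T> range) { items.AddRange(range); }

    public bool Contains(T item) { return items.Contains(item); }

    public T[] ToArray() { return items.ToArray(); }

    public IEnumerator<T> GetEnumerator() { return items.GetEnumerator(); }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

// A concrete collection only has to pick the element type.
public class SketchByteVector : SketchListBase<byte>
{
}

public static class ListBaseDemo
{
    public static void Main()
    {
        var vector = new SketchByteVector();
        vector.Add(0x4F);
        vector.AddRange(new byte[] { 0x67, 0x67, 0x53 });
        Console.WriteLine(vector.Count);
    }
}
```

The idea is that shared operations live in one generic class, and each concrete collection inherits them for its own element type without any boxing.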

The Tests

To make sure I didn’t break anything in the process, I finally set up a pretty extensive NUnit test suite for taglib-sharp and implemented tests for every collection class. I also ported our old format tests from entagged-sharp to taglib-sharp and identified a few problems (which Brian has now fixed).

In the process of setting up the tests, and to see exactly how much better we are with generics, I wrote some performance stress tests as well.
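The post does not show how the timings were taken, so here is a minimal sketch of one way such a stress test could be timed. StressTimer is a hypothetical helper, not part of taglib-sharp or its test suite, and the workload in Main is a stand-in for a real tag-reading call.

```csharp
using System;
using System.Diagnostics;

// Hypothetical harness; StressTimer is not part of taglib-sharp.
public static class StressTimer
{
    // Runs the action the given number of times and returns the total elapsed time.
    public static TimeSpan Run(int iterations, Action action)
    {
        if (iterations <= 0)
            throw new ArgumentOutOfRangeException("iterations");

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        watch.Stop();
        return watch.Elapsed;
    }

    // Average wall-clock time of a single iteration, in seconds.
    public static double AverageSeconds(TimeSpan total, int iterations)
    {
        return total.TotalSeconds / iterations;
    }

    public static void Main()
    {
        const int iterations = 10000;

        // Stand-in workload; a real test would open a media file here instead.
        TimeSpan total = Run(iterations, () => Math.Sqrt(12345.678));

        Console.WriteLine("Total time ({0} iterations): {1}", iterations, total);
        Console.WriteLine("Average time (1 iteration): {0} seconds",
            AverageSeconds(total, iterations));
    }
}
```

Timing many iterations and dividing smooths out per-call noise, though the first iterations still include one-off costs such as JIT compilation and cold disk caches.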

Before my optimization work:

Using ByteVector.FromUri() on a 4.1MB M4A file (this call simply loads the entire file into the ByteVector, creating about 4,300,000 bytes in the vector – this is not how files are read in “real life” in taglib-sharp, but is a raw stress test of the ByteVector collection):

Total Heap Allocations: 103,246 KB (Yes, that’s 103 MB!)

Execution time: 1.7 seconds

Using File.Create() on the same 4.1MB M4A file 10,000 consecutive times (this is how you actually process a file for metadata in taglib-sharp, and shows more “real life” optimizations):

Total time (10,000 iterations): 00:01:17.1391680

Average time (1 iteration): 0.0077027999 seconds

After my optimization work:

Using ByteVector.FromUri()

Total Heap Allocations: 16,421 KB (Wow, dropped to 16 MB!)

Execution time: 0.04 seconds (whoa!)

Using File.Create()

Total time (10,000 iterations): 00:00:41.0676870

Average time (1 iteration): 0.0040969793 seconds

So what’s that mean? The average tag reading operation is now nearly twice as fast (about 1.9x, going by the totals above) and uses substantially less memory during the operation. This speed up translates directly to applications.
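For anyone who wants to check the arithmetic, this small sketch works out the ratios from the figures reported above. SpeedupMath is a hypothetical helper written for this purpose; the only inputs are the totals and allocation numbers already listed.

```csharp
using System;

// Hypothetical helper for checking the reported numbers.
public static class SpeedupMath
{
    // Ratio of old time to new time: 2.0 would mean "twice as fast".
    public static double Speedup(TimeSpan before, TimeSpan after)
    {
        return before.TotalSeconds / after.TotalSeconds;
    }

    public static void Main()
    {
        // Totals for 10,000 File.Create() calls, copied from the results above
        // (00:01:17.1391680 and 00:00:41.0676870).
        TimeSpan before = TimeSpan.FromSeconds(77.139168);
        TimeSpan after = TimeSpan.FromSeconds(41.067687);

        double ratio = Speedup(before, after);
        Console.WriteLine("Speedup: {0:F2}x", ratio);
        Console.WriteLine("Time saved: {0:F0}%", (1 - 1 / ratio) * 100);

        // Total heap allocations for ByteVector.FromUri(), in KB.
        Console.WriteLine("Allocation ratio: {0:F1}x", 103246.0 / 16421.0);
    }
}
```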

These numbers are also just from optimizing TagLib.ByteVector. After optimizing the other three collection classes, things speed up even more. For instance, before the optimization, Banshee took around 9.5 minutes to import 5100 audio files on my machine. After the optimization it took about 4 minutes, a 5.5 minute speed up. Keep in mind there are other factors during a real import, such as directory walking overhead, mimetype detection, and disk caching, all of which account for about 0.5 minutes of the overall importing process.

Also, each format may use the collections differently, so some may benefit more than others. I’ve now moved on to identifying optimizations that can be done at the format level. For instance, I found out that taglib-sharp’s OGG/Vorbis support is very, very fast. However, the MPEG4 format is very, very slow in comparison. This led me into further investigation and comparisons.

Comparing taglib-sharp to other players in the game (sort of)

In the 0.10.x series in Banshee, we had used entagged-sharp, a fully managed metadata/tag reader. It was pretty good, but did not have write support, had a number of issues dealing with poor/illegal/improper string encodings in ID3 tags, and only had partial MPEG4/AAC and ASF/WMA support. I had originally planned on moving to GStreamer for tag reading, but there were a number of problems with this:

A hard dependency on GStreamer for tag reading

More native <--> managed code/interop

Missing/poor demuxer support for some formats (missing in terms of “it’s not in -base or -good, so it doesn’t exist for us in practicality”) – this was the biggest factor in my decision to not go with GStreamer for tag reading

GStreamer 0.10 really isn’t suited to easily, efficiently, and safely do strict tag reading. The way GStreamer 0.10 works makes it rather difficult/expensive to perform a blocking operation, which is what’s often needed for tag reading.

Anyway, I’m not trying to put down GStreamer – I just came to the conclusion that for strictly reading tags, it’s probably not the best solution. I use its tag reading support in the playback pipeline to get live metadata updates from streams, etc.

For 0.11.x, I ended up choosing a new player in the game, taglib-sharp. It had full read/write support and supported MPEG4 and ASF very well. It also handled poor/illegal string encodings “properly” – or at least much better than entagged-sharp.

taglib-sharp does everything we need it to do in terms of functionality, so now it’s just a matter of improving its performance. All that said, I constructed a test today that compares taglib-sharp, entagged-sharp, and GStreamer. Each tag reader is used 10,000 times on each file in the test suite.

The results show that GStreamer is vastly slower than either taglib-sharp or entagged-sharp. This may not be entirely fair, however, as the GStreamer code was under development a few months ago, and I have a feeling the bottleneck there isn’t so much the demuxing process, but rather having to wait on the processing thread in order to create a blocking operation. The GStreamer tests also tend to crash every so often. I had to remove MPEG4 from the test as I don’t have an MPEG4 demuxer for GStreamer, but MPEG4 reading is substantially slower in taglib-sharp than it is in entagged-sharp. My ASF/WMA test was also removed as my ASF demuxer in GStreamer would get completely stuck.

taglib-sharp’s OGG/Vorbis and MPC support is really good, but it loses out to entagged-sharp on everything else at the moment. So, long story slightly longer: those other formats will be the focus of my future optimizations.

You can try my tests for yourself (and see my GStreamer tag reading code if you want to complain/flame/advise/etc – just know I never completed it as I did a 180 and moved to taglib-sharp). I’ve disabled the GStreamer test since it’s unstable and doesn’t work on all formats that taglib-sharp and entagged-sharp do.

Anyway, I’ll commit the generics/optimizations patch against taglib-sharp this evening, and it will then be available in Banshee.

As a student of computer science, I love reading posts like these: an honest account of weak spots in an application and how they are fixed. I feel like a lot of applications have these types of leaks that are never addressed. I’m glad my favorite program is constantly being refactored =).

If I read correctly, you are planning to use taglib-sharp for all tagging now? Also, when GStreamer experiences speed-ups in tagging, do you plan to switch to that? Why move away from entagged-sharp if it’s so much faster?