I had to produce an animation towards the end of the research project, and as usual, there was not enough time left for heavy-duty video editing work. Everything for the animation had to be done really fast and still produce the correct result. As the video should compare our algorithm with others, I had a few requirements:

Labelling: I needed to put labels on the animation, so it would be clear which part shows which algorithm.

Four-up comparison: I.e. splitting the screen into four quadrants, and playing back four different streams in sync.

High compression quality: Of course, the compression should not introduce new artefacts or hide those from the algorithm.

Duct tape, glue, and a hammer

I decided pretty early on to reuse the animation framework I had for regression testing. That is, I could produce fixed-frame-rate rendering of the animation path and write out every single frame to disk. So the first step was to do exactly this, some Python glue and then just dumping out the whole animation.

For processing the images, I decided to use ImageMagick as the all-around tool. In particular, ImageMagick has great functionality to label images, so I slapped a Python wrapper around the ImageMagick executable and went through all images to label them.
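A minimal sketch of such a wrapper, assuming the ImageMagick 6 style `convert` executable is on the PATH (ImageMagick 7 calls it `magick`). The function names, the label position and the font size are my own placeholders, not the original script:

```python
import subprocess


def label_command(src, dst, text, pointsize=24):
    """Build an ImageMagick call that draws `text` near the top-left corner."""
    return [
        "convert", src,
        "-gravity", "NorthWest",       # anchor the label in the top-left corner
        "-pointsize", str(pointsize),
        "-fill", "white",
        "-annotate", "+10+10", text,   # offset of 10 pixels from the corner
        dst,
    ]


def label_image(src, dst, text):
    """Run the command; raises CalledProcessError if ImageMagick fails."""
    subprocess.check_call(label_command(src, dst, text))


if __name__ == "__main__":
    # Print the command instead of running it, so this works without ImageMagick.
    print(" ".join(label_command("frame_0001.png", "labelled_0001.png", "Ours")))
```

Passing the arguments as a list avoids any shell quoting problems with labels that contain spaces.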

Stitching and cropping can also be done with ImageMagick. While it requires a lot of disk I/O to do so, it's still pretty fast and there is no quality loss in the pipeline (all my images were PNG throughout all the processing.) I did everything step-by-step with intermediate outputs for easy debugging. That is, I cropped the images first, generated four outputs, stitched them together and added the labels on top. That's easier than adding the labels first, especially if you need to resize the images. For resizing, ImageMagick provides point filtering, which is very useful as the resizing introduces no blurring, which was crucial for my stuff.
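The crop, resize and stitch steps map onto three ImageMagick calls. This is a sketch under assumptions: the helper names and file names are made up, and I use `montage` for the 2x2 layout, which is one way to do it and not necessarily what the original pipeline used.

```python
def crop_command(src, dst, width, height, x, y):
    """Cut a width x height region at offset (x, y); +repage resets the canvas."""
    geometry = "{}x{}+{}+{}".format(width, height, x, y)
    return ["convert", src, "-crop", geometry, "+repage", dst]


def resize_command(src, dst, width, height):
    """Resize with the point filter (nearest neighbour), so nothing gets blurred."""
    # -filter has to come before -resize to take effect.
    return ["convert", src, "-filter", "point",
            "-resize", "{}x{}".format(width, height), dst]


def stitch_command(tiles, dst):
    """Arrange exactly four images in a 2x2 grid without borders or spacing."""
    if len(tiles) != 4:
        raise ValueError("a four-up comparison needs exactly four tiles")
    return ["montage"] + list(tiles) + ["-tile", "2x2", "-geometry", "+0+0", dst]


if __name__ == "__main__":
    tiles = ["a.png", "b.png", "c.png", "d.png"]
    print(" ".join(stitch_command(tiles, "fourup.png")))
```

Each command can be handed to `subprocess.check_call`, writing an intermediate file per step as described above.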

Finally, I would number the images and run them through VirtualDub to create a single AVI file with no compression -- basically appending each image after the other. There was only one step left, compressing the AVI, and it turns out that FFmpeg's H.264 implementation is pretty good, especially with two-pass compression and a variable bit rate. I used something around 2.5 Mbit/s for 720p, which resulted in a 10 MiB file for 30 seconds.
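For reference, a two-pass encode with a current FFmpeg build looks roughly like the sketch below. The flags are today's FFmpeg syntax with libx264 and may differ from what was available at the time; the helper names are hypothetical. The second function just checks the numbers: 2.5 Mbit/s over 30 seconds comes out at about 9 MiB of video, which is in line with the roughly 10 MiB file.

```python
import os


def two_pass_commands(src, dst, bitrate_kbit=2500, preset="veryslow"):
    """Return the (first pass, second pass) FFmpeg command lines."""
    common = ["ffmpeg", "-y", "-i", src,
              "-c:v", "libx264", "-preset", preset,
              "-b:v", "{}k".format(bitrate_kbit)]
    # Pass 1 only gathers statistics: no audio, output is thrown away.
    first = common + ["-pass", "1", "-an", "-f", "null", os.devnull]
    # Pass 2 uses the statistics to distribute the bits.
    second = common + ["-pass", "2", dst]
    return first, second


def estimated_size_mib(bitrate_kbit, seconds):
    """Video payload size in MiB for a given average bit rate."""
    return bitrate_kbit * 1000 * seconds / 8 / 2 ** 20


if __name__ == "__main__":
    for command in two_pass_commands("animation.avi", "animation.mp4"):
        print(" ".join(command))
    print(round(estimated_size_mib(2500, 30), 2), "MiB")
```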

Caveats

There are some caveats here which took me a while to figure out:

After labelling a 24-bit PNG with a transparent label, ImageMagick changed the depth to 32-bit. Unfortunately, VirtualDub cannot load 32-bit PNGs, and all the guides I found which explained how to get rid of the alpha channel somehow didn't help. What helped was to write the final image as TGA.

FFmpeg two-pass encoding: With one-pass encoding, the quality seems equivalent but the compression is worse, so I opted to use the two-pass encoding. My main problem was that the input AVI was larger than my main memory, so reading through the file was pretty slow ... still, the two-pass results were worth the additional compression time. For the record, I used the very_slow presets :)

VirtualDub's default frame rate is 10 frames per second. You have to change it before writing the AVI; for some reason or another, setting the fps in FFmpeg didn't help.

During the last months, I implemented lots of different graphics algorithms with many settings. One core theme that appeared throughout this project was testing: both how to prevent regressions, and how to build a framework to which new algorithms can be added quickly and compared against others. All of that, of course, without excessive programming effort :)

Automation

Pretty early on in the project I decided to have all testing be fully automated. That is, with one single command line I wanted to be able to re-run all tests and get a quick yes/no overview which showed me if there are any regressions. Looking at my previous post-mortem, I started by adding Lua into the UI to make it scriptable. Via the Lua script interface, I could run the application, move to any position, save every buffer and change the settings. This interface was very thin and most Lua scripts were only a few lines long. What turned out to be pretty useful is to be able to pass command-line options into the Lua environment, so I could run for instance app Width=1920 Height=1080 Fast and read those variables from the Lua script.
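The option handling itself is tiny. Here is the idea sketched in Python (the real thing lived inside the application and handed the values to Lua); how bare flags and numbers were actually represented is my assumption:

```python
def parse_options(args):
    """Turn ['Width=1920', 'Fast'] into {'Width': 1920, 'Fast': True}."""
    options = {}
    for arg in args:
        key, separator, value = arg.partition("=")
        if not separator:
            options[key] = True        # bare flag such as "Fast"
        elif value.isdigit():
            options[key] = int(value)  # plain non-negative integers
        else:
            options[key] = value       # everything else stays a string
    return options


if __name__ == "__main__":
    print(parse_options(["Width=1920", "Height=1080", "Fast"]))
```

The resulting table would then be exposed to the script as global variables or as a single options table.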

Adding Lua to the application turned out to be very straightforward. Most of the calls went from Lua to the application, so there was no object sharing or anything similar that would have made things complicated. However, I occasionally called back to Lua for filters (for instance, I had a mode to store the image in tiles and this called back into Lua so I could select the interesting tiles from script.)

However, while Lua is very easy to use, I had more complex requirements for the testing:

Needed to run lots of tests as fast as possible. The machine has multiple cores, so parallelisation was crucial to cut down test times.

Compression: The tests would generate large amounts of data -- a full test run could easily produce 20 GiB of raw files. These files all compressed pretty well, so I had to get some compression either into the app or into the test framework.

Easy-to-browse output: Ideally some HTML file or so where I can quickly find the results.

Diff reports: Did the latest changes improve the algorithm, or did it regress?

I'm pretty sure all of this can be coded up with Lua, but it would have required lots of searching around for libraries and documentation. As this was research, and time was critical, I opted instead for Python (3.1).

Test infrastructure

The final infrastructure I came up with looked like this:

A bunch of very low-level Lua scripts to perform animations and the rendering itself. All output images would be stored directly to disk.

Python would call the Lua-scripts with different settings as needed. Comparing the images to the reference was done via pdiff and ImageMagick. Other operations used a custom tool which was driven by command line and JSON setting files.
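The comparison step boils down to shelling out to ImageMagick's `compare` tool. A sketch, assuming `compare` is on the PATH; the helper names are hypothetical and RMSE is just one metric choice, not necessarily the one used here:

```python
import subprocess


def compare_command(reference, candidate, diff_image, metric="RMSE"):
    """Build an ImageMagick `compare` call that also writes a difference image."""
    return ["compare", "-metric", metric, reference, candidate, diff_image]


def run_compare(reference, candidate, diff_image):
    """Return (exit code, metric text); compare prints the metric to stderr."""
    process = subprocess.run(
        compare_command(reference, candidate, diff_image),
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True)
    return process.returncode, process.stderr.strip()


if __name__ == "__main__":
    print(" ".join(compare_command("reference.png", "result.png", "diff.png")))
```

Note that `compare` signals differing images through a non-zero exit code, so `check_call` is not the right tool for this step.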

Python would then pack all resulting images into a .zip file and pickle the results into it as well. Pickle turned out to be all I needed here; I wrote a small tool to quickly investigate the contents of any given test pack file for debugging. The pack files were versioned, so I never had issues with loading wrong stuff -- in that case, the pack was simply regenerated (always remember to store all settings necessary for that inside the pack :))

A final Python script merged results together and produced nice HTML output using my custom template engine (Miranda, closely modeled after Google's CTemplate engine but written in Python.)

The last merging step turned out to be crucial. During development, I often wound up tweaking one particular algorithm, but I still wanted the complete comparison. Of course, this required some kind of "delta" packaging which I did via merging. The script loaded both results, and tried to unify them. If a previous entry was already existing, it would create a diff (i.e. new algorithm version is better/worse than before.) If no previous entry existed, it would just copy. This gave me very fast turnaround times and didn't require any changes to the on-disc storage format.
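The merge logic is simple enough to sketch in a few lines. I assume here that each result is a single error value per test where lower is better; the real entries carried more data, and the field names are made up:

```python
def merge_results(previous, current):
    """Merge a partial run into the previous full results.

    Both arguments map a test name to an error value (lower is better).
    """
    merged = {}
    # Entries that were not re-run are carried over unchanged.
    for name, error in previous.items():
        merged[name] = {"error": error, "previous": None, "status": "kept"}
    for name, error in current.items():
        if name in previous:
            old = previous[name]
            if error < old:
                status = "better"
            elif error > old:
                status = "worse"
            else:
                status = "unchanged"
            merged[name] = {"error": error, "previous": old, "status": status}
        else:
            merged[name] = {"error": error, "previous": None, "status": "new"}
    return merged


if __name__ == "__main__":
    print(merge_results({"a": 0.5, "b": 0.2}, {"a": 0.3, "c": 0.1}))
```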

Small note on the JSON parsing: Due to lack of time, I used boost::property_tree. It's ok for reading JSON, but the JSON it generates loses all type information. Next time I'll likely use a full-blown JSON parser to have "loss-less" JSON transformation.

Performance tuning

While this framework was all nice, the performance was not that great at first, so I opted for a few optimisations. First of all, I made extensive use of Python's multiprocessing module to run all tests in parallel. This already provided a huge benefit, as I could overlap very compute-intensive parts (pdiff in particular) with lots of I/O.

The second, not-so-obvious optimisation was to overwrite files all the time. In particular, for ImageMagick, most operations were of the form "compare A to B and save to C". At first, I always used different file names for C before packing them. It turned out that overwriting C all the time was much faster. I suspect that Windows flushes file creation to disk, which totally makes sense -- but I was surprised to see how much faster it became once I just overwrote the same file. Side note: As I used multiprocessing, I had of course one file per process.
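One way to get "one file per process" is to derive the scratch file name from the process id. A sketch; the naming scheme and the use of the system temp directory are my assumptions:

```python
import os
import tempfile


def scratch_path(suffix=".png"):
    """Return a scratch file name that is stable within one worker process."""
    name = "scratch_{}{}".format(os.getpid(), suffix)
    return os.path.join(tempfile.gettempdir(), name)


if __name__ == "__main__":
    # Every call in the same process yields the same path, so the file is
    # overwritten again and again instead of being created anew.
    print(scratch_path())
```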

Caveats

Some warnings: If you use Python's multiprocessing, always remember to use

if __name__ == '__main__': doTheFork()

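Otherwise you have a fork-bomb which will immediately freeze your system: on Windows, every child process imports the main module again, and without the guard each child would start spawning children of its own. The multiprocessing error handling is not that great either. In particular, you should avoid raising exceptions in the workers and instead just pass around some result which indicates an error, and do the error handling in the host process.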
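Both points, the guard and the "errors as results" pattern, fit into one small sketch. The test names and the simulated failure are placeholders; the `with` form of `Pool` needs Python 3.3 or newer:

```python
import multiprocessing


def run_test(name):
    """Worker: never raises, failures are reported as part of the result."""
    try:
        if name.startswith("broken"):
            raise RuntimeError("simulated failure")  # stand-in for real work
        return {"name": name, "ok": True, "error": None}
    except Exception as exc:
        return {"name": name, "ok": False, "error": str(exc)}


def run_all(names, processes=2):
    """Run all tests in a process pool and collect the result records."""
    with multiprocessing.Pool(processes) as pool:
        return pool.map(run_test, names)


if __name__ == "__main__":
    # The guard keeps child processes from starting pools of their own.
    for result in run_all(["shadows", "broken_filter", "blur"]):
        status = "ok" if result["ok"] else "FAILED: " + result["error"]
        print(result["name"], status)
```

The host process sees every failure as plain data and can decide whether to abort, retry or just mark the test as failed in the report.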

Python's zip module is pretty decent, but there are no streaming reads from it. This turned out to be pretty bad at first as I stored PPM images (huge!) which I wanted to stream out of the archive directly instead of storing them on disk. I later switched to PNGs, which were easier to manage. It was still beneficial to pack the files into a single .zip, first because of less clutter with hundreds of test files and second because I could version complete packs.
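For what it's worth, in current Python 3 releases `ZipFile.open` returns a file-like object that can be read in chunks, so large members no longer have to be loaded in one piece. A sketch with a hypothetical helper name; I can't say whether this would have worked well enough in the 3.1 setup described here:

```python
import zipfile


def stream_member(pack_path, member, chunk_size=64 * 1024):
    """Yield the decompressed contents of one archive member chunk by chunk."""
    with zipfile.ZipFile(pack_path) as pack:
        with pack.open(member) as stream:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                yield chunk
```

A consumer can then pipe the chunks into another tool or write them to a scratch file without holding the whole image in memory.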

Next week, we'll take a look at the animation backend -- old-school command-line magic for producing nice-looking videos.

After resisting it for a long time, I finally gave in and created an account on Twitter recently: NIV_Anteru. I do see a point in writing small titbits from time to time which don't warrant a whole blog post, especially links and small observations during programming.

Once I get back from NVIDIA, I should have some more time for full-blown posts again.

Bazaar 2.2 has been released recently. For those who don't know it: Bazaar is a very nice distributed revision control system, similar to git and mercurial. Where Bazaar really shines is the ease-of-use, good documentation and good interop with other VCS. Performance-wise, it's also pretty good and much faster than for instance Subversion on most operations. The UI is excellent, as there is a "canonical" UI provided by the developers, which uses Qt and looks the same on all supported platforms. This is really important for me, as I occasionally switch between Linux and Windows, and I really like the fact that I don't have to get used to different UI clients.

Bazaar 2.2 brings some nice performance improvements; in particular, the GUI starts up faster now. It works fine on 2.x repositories: unlike in the pre-2.0 times, the repository format has been stable for quite some time now, so there is no need to update.

Something totally different about Bazaar: Contributing to Bazaar is quite easy, unlike most other projects I tried. There's extensive documentation available, including a "contributing to Bazaar" guide. The developers are quite open, and you usually get useful answers from them. I found contributing to be easiest on Linux; on a typical Ubuntu setup you can get running in 20 minutes. For example, the contributions I've done were written on Ubuntu 10.04 running in a virtual machine.

If you're still using a centralized system like Subversion, it's time to try a distributed system like Bazaar now, as they have become a compelling alternative. If you never tried Bazaar because of FUD like "it's slow" or "the repository format keeps changing", you should definitely give Bazaar 2.2 a try. It's a really nice, polished and mature revision control system, and it's worth a look. Especially if you use Windows, Bazaar is a good alternative to git, as it treats Windows as a first-tier platform.

I likely won't be able to update my blog too often in the next weeks, as I started an internship at NVIDIA Research. Hopefully this break won't be too long, but I seriously need to sort out a few things first.