Wednesday, December 29, 2010

If you are not familiar with 4k intros, you may wonder how things are organized at the executable level to achieve this kind of packing-performance. Probably the most important and essential aspect of 4k-64k intros is the compressor, and surprisingly, 4k intros have been well equipped for the past five years, as Crinkler is the best compressor developed so far for this category. It has been created by Blueberry (Loonies) and Mentor (tbc), two of the greatest demomakers around.

Last year, I started to learn a bit more about the compression technique used in Crinkler. It started from some pouet's comments that intrigued me, like "crinkler needs several hundred of mega-bytes to compress/decompress a 4k intros" (wow) or "when you want to compress an executable, It can take hours, depending on the compressor parameters"... I observed also bad comrpession result, while trying to convert some part of C++ code to asm code using crinkler... With this silly question, I realized that in order to achieve better compression ratio, you better need a code that is comrpession friendly but is not necessarily smaller. Or in other term, the smaller asm code is not always the best candidate for better compression under crinkler... so right, I needed to understand how crinkler was working in order to code crinkler-friendly code...

I just had a basic knowledge about compression, probably the last book I bought about compression was more than 15 years ago to make a presentation about jpeg compression for a physics courses (that was a way to talk about computer related things in a non-computer course!)... I remember that I didn't go further in the book, and stopped just before arithmetic encoding. Too bad, that's exactly one part of crinkler's compression technique, and has been widely used for the past few years (and studied for the past 40 years!), especially in compressors like H.264!

So wow, It took me a substantial amount of time to jump again on the compressor's train and to read all those complicated-statistical articles to understand how things are working... but that was worth it! In the same time, I spent a bit of my time to dissect crinkler's decompressor, extract the code decompressor in order to comment it and to compare its implementation with my little-own-test in this field... I had a great time to do this, although, in the end, I found that whatever I could do, under 4k, Crinkler is probably the best compressor ever.

You will find here an attempt to explain a little bit more what's behind Crinkler. I'm far from being a compressor expert, so if you are familiar with context-modeling, this post may sounds a bit light, but I'm sure It could be of some interest for people like me, that are discovering things like this and want to understand how they make 4k intros possible!

This first version can be considered as stable. The Direct3D10 / Direct3D10.1 API has been entirely tested on a large 3D engine that was using previously SlimDX (thanks patapom!). Migration was quite straightforward, with tiny minor changes to the engine's code.

The key features and benefits of this new API are:

API is generated from DirectX SDK headers : meaning a complete and reliable API and an easy support for future API.

Full support for the following DirectX API:

Direct3D10

Direct3D10.1

Direct3D11

Direct2D1 (including custom rendering, tessellation callbacks)

DirectWrite (including custom client callbacks)

D3DCompiler

DXGI

DXGI 1.1

DirectSound

XAudio2

XAPO

An integrated math API directly ported from SlimMath

Pure managed .NET API, platform independent : assemblies are compiled with AnyCpu target. You can run your code on a x64 or a x86 machine with the same assemblies, without recompiling your project.

C++/CLI Speed : the framework is using a genuine way to avoid any C++/CLI while still achieving comparable performance.

API naming convention mostly compatible with SlimDX API.

Raw DirectX object life management : No overhead of ObjectTable or RCW mechanism, the API is using direct native management with classic COM method "Release".

Easily mergeable / obfuscatable : If you need to obfuscate SharpDX assemblies, they are easily obfusctable due to the fact the framework is not using any mixed assemblies. You can also merge SharpDX assemblies into a single exe using with tool like ILMerge.

You will also find a growing collection of samples in the Samples Gallery of SharpDX. Most notably with some additional support for Direct2D1 and DirectWrite client callbacks.

Instead of providing a monolithic assembly, SharpDX is providing lightweight individual and interdependent assemblies. All SharpDX assemblies are dependent from the core SharpDX assembly. You just need to add the required assemblies to your project, without embedding the whole DirectX API stack. Here is a chart that explains SharpDX assembly dependencies:

Next versions will provide support for DirectInput, XInput, X3DAudio, XACT3.

About performance

Someone asked me how SharpDX compares to SlimDX in terms of performance. Here is a micro-benchmark on two methods, ID3D10Device1::GetFeatureLevel (alias Device.FeatureLevel) and ID3D10Device::CheckCounterInfo (alias Device.GetCounterCapabilities).

The test consist of 100,000,000 calls on each methods (inside a for, with (10 calls to device.FeatureLevel) * 10,000,000 times) and is repeated 10 times and averaged. Repeated two times.

Method

SlimDX

SharpDX

SharpDX vs SlimDX

device.FeatureLevel

3700

3650

1,37%

device.GetCounterCapabilities()

4684

4259

9,98%

For FeatureLevel, the test was sometimes around +/-0.5%.
For GetCounterCapabilities(), the main difference between SharpDX and SlimDX implementation is that SlimDX perform a copy from the native struct to .Net struct while SharpDX is directly passing a pointer to the .Net struct.

This test is of course a micro benchmark and doesn't reflect a real-world usage. Some part of the API could be in favor of SlimDX, but I'm pretty confident that SharpDX is much more consistent in the way structures are passed to the native functions, avoiding as much as possible marshaling structures that doesn't need any custom marshaling (unlike SlimDX that is performing most of a time a marshaling between .Net/Native structure, besides they are binary compatible).

Next?

Finally, I'm going to be able to use this project to make some demos with it! Next target is to develop a XNA like based framework based on SharpDX.Direct3D11.