Sunday, December 6, 2015

Mobile-friendly binary delta compression

This is basically a quick research report, summarizing what we currently know about binary delta compression. If you have any feedback or ideas, please let us know.

Binary delta compression (or here) is very important now. So important that I will not be writing a new compressor that doesn't support the concept in some way. When I first started at Unity, Joachim Ante (CTO) and I had several long conversations about this topic. One application is to use delta compression to patch an "old" Unity asset bundle (basically an archive of objects such as textures, animations, meshes, sounds, metadata, etc.), perhaps made with an earlier version of Unity, to a new asset bundle file that can contain new objects, modified objects, deleted objects, and even be in a completely different asset bundle format.

Okay, but why is this useful?

One reason asset bundle patching via delta compression is important to Unity developers is because some are still using an old version of Unity (such as v4.6), because their end users have downloadable content stored in asset bundle files that are not directly compatible with newer Unity versions. One brute force solution for Unity devs is to just upgrade to Unity 5.x and re-send all the asset bundles to their end users, and move on until the next time this happens. Mobile download limits, and end user's time, are two important constraints here.

Basically, game makers can't expect their customers to re-download all new asset bundles every time Unity is significantly upgraded. Unity needs the freedom to change the asset bundle format without significantly impacting developer's ability to upgrade live games. Binary delta compression is one solution.

Another reason delta compression is important, unrelated to Unity versioning issues: it could allow Unity developers to efficiently upgrade a live game's asset bundles, without requiring end users to redownload them in their entirety.

Let's build a prototype

First, before building anything new in a complex field like this you should study what others have done. There's been some work in this area, but surprisingly not a whole lot. It's a somewhat niche subject that has focused way too much on patching executable files.

During our discussions, I mentioned that LZHAM already supports a delta compression mode. But this mode is only workable if you've got plenty of RAM to store the entire old and new files at once. That's definitely not gonna fly on mobile, whatever solution we adopt must use around 5-10MB of RAM.

bsdiff seems to be the best we've got right now in the open source space. (If you know of something better, please let us know!) I've been looking and probing at the code. Unfortunately bsdiff does not scale to large files either because it requires all file data to be in memory at once, so we can't afford to use it as-is on mobile platforms. Critically, bsdiff also crashes on some input pairs, and it slows down massively on some inputs. It's a start but it needs work.

Some Terminology

"Unity dev" = local game developer with all data ("old" and "new")
"End user" = remote user (and the Unity dev's customer), we must transmit something to change their "old" data to "new" data

Old file - The original data, which both sides (Unity dev and the end user) already have

New file - The new data the Unity dev wants to send to the end user

Patch file - The compressed "patch control stream" that Unity dev transmits to end users

The whole point of this process is that it can be vastly cheaper to send a compressed patch file than a compressed new file.

The old and new files are asset bundle files in our use case.

Bring in deltacomp

"deltacomp" is our memory efficient patching prototype that leverages minibsdiff, which is basically bsdiff in library form. We now have a usable beachhead we understand in this space, to better learn about the real problems. This is surely not the optimal approach, but it's something.

Most importantly, deltacomp divides up the problem into small (1-2MB) mobile-friendly blocks in a way that doesn't wreck your compression ratio. The problem of how to decide which "old" file blocks best match "new" file blocks involves computing what we're calling a file "cross correlation" matrix (CCM), which I blogged about here and here.

We now know that computing the CCM can be done using brute force (insanely slow, even on little ~5MB files), an rsync-style rolling hash (Fabian at Rad's idea), or using a simple probabilistic sampling technique (our current approach). Thankfully the problem is also easily parallelizable, but that can't be your only solution (because not every Unity customer has a 20 core workstation).

deltacomp's compressor conceptually works like this:

1. User provides old and new files, either in memory or on disk.

2. Compute the CCM. Old blocks use a larger block size of 2*X (starting every X bytes, so they overlap), new blocks use a block size of X (to accommodate the expansion or shifting around of data).

A CCM on an executable file (Unity46.exe) looks like this:

In current testing, the block size can be from .5MB to 2MB. (Larger than this requires too much temporary RAM during decompression. Smaller than this could require too much file seeking within the old file.)

3. Now for each new block we need a list of candidate old blocks which match it well. We find these old blocks by scanning through a column in the CCM matrix to find the top X candidate block pairs.

We can either find 1 candidate, or X candidates, etc.

4. Now for each candidate pair, preprocess the old/new block data using minidiff, then pack this data using LZHAM. Find the candidate pair that gives you the best ratio, and remember how many bytes it compressed to.

5. Now just pack the new block (by itself - not patching) using LZHAM. Remember how many bytes it compressed to.

6. Now we need to decide how to store this new file block in the patch file's compressed data stream. The block modes are:

- Raw
- Clone
- Patch (delta compress)

So in this step we either output the new file data as-is (raw), not store it at all (clone from the old file), or we apply the minibsdiff patch preprocess to the old/new data and output that.

The decision on which mode to use depends on the trial LZHAM compressions stages done in steps 4 and 5.

7. Finally, we now have an array of structs describing how each new block is compressed, along with a big blob of data to compress from step 6 with something like LZMA/LZHAM. Output a header, the control structs, and the compressed data to the patch file. (Optionally, compress the structs too.)

We use a small sliding dictionary size, like ~4MB, otherwise the memory footprint for mobile decompression is too large.

This is the basic approach. Alexander Suvorov at Unity found a few improvements:

1. Allow a new block to be delta compressed against a previously sent new block.
2. The order of new blocks is a degree of freedom, so try different orderings to see if they improve compression.
3. This algorithm ignores the cost of seeking on the old file. If the cost is significant, you can favor old file blocks closest to the ones you've already used in step 3.

Decompression is easy, fast, and low memory. Just iterate through each new file block, decompress that block's data (if any - it could be a cloned block) using streaming decompression, optionally unpatch the decompressed data, and write the new data to disk.

Current Results

Properly validating and quantifying the performance of a new codec is tough. Testing a delta compressor is tougher because it's not always easy to find relevant data pairs. We're working with several game developers to get their data for proper evaluation. Here's what we've got right now:

Above, the bsdiff sample near 1303 falls to 0 because it crashed on that data pair. bsdiff itself is unreliable, thankfully minibsdiff isn't crashing (but it still has perf. problems).Compressed size of individual frames, sorted by delta2comp frame size:

Firefox v10-v39 installer data

Test procedure: Unzip each installer archive to separate directory, iterate through all extracted files, sort files by filename and path. Now delta compress each file against its predecessor, i.e. "dict.txt" from v10 is the "old" file for "dict.txt" in v11, etc.

Interestingly, deltacomp2's output is ~8% larger than bsdiff.exe's on this artificial test data.

I'll post more data soon, once we get our hands on real game data that changes over time.

Next Steps

The next mini-step is to have another team member try and break the current prototype. We already know that minibsdiff's performance isn't stable enough. On some inputs it can get very slow. We need to either fix this or just rewrite it. We've already switched to libdivsufsort to speed up the suffix array sort.

The next major step is to dig deeply into Unity's asset bundle and virtual file system code and build a proper asset bundle patching tool.

Finally, on the algorithm axis, another different approach is to just ditch the preprocessing idea using minibsdiff and implement LZ-Sub.

4 comments:

Kind of reminds me of an experiment I was doing where I'd get, say, a 1024x1024 texture and then compressed it using h264 or vp9 into a 128x128 video clip of 64 frames, so utilising the video encoder's delta frames. It actually works out well for some textures, it might work well if DXTC/ASTC/PVRTC could be delta'd into tiles this way too.

Very interesting. I like your idea about doing this on GPU textures. (I am a sucker for anything involving texture compression.) I started messing with Fraps and VirtualDub, because it was an easy and quick way to get tons of similar data file pairs.

From memory I rigged up a python expression in Nuke to export out the image stream before putting it into FFMPEG. It could be done with something like OpenImageIO which is geared towards tiled mip-mapped textures (in fact I believe they do this with the FFMPEG backend already, but this was made after my experiment).

About Me

Back in the day I worked for several years at Digital Illusions on things like the first shipping deferred shaded game ("Shrek" - 2001), software renderers, and game AI. Then, after working for Microsoft at Ensemble Studios for 5 years as engine lead on Halo Wars, I took a year off to create "crunch", an advanced DXTc texture compression library. I then worked 5 years at Valve, where I contributed to Portal 2, Dota 2, CS:GO, and the Linux versions of Valve's Source1 games. I was one of the original developers on the Steam Linux team, where I worked with a (somewhat enigmatic) multi-billionare on proving that OpenGL could still hold its own vs. Direct3D. I also started the vogl (Valve's OpenGL debugger) project from scratch, which I worked on for over a year. In my spare time I work on various open source lossless and texture compression projects: crunch, LZHAM, miniz, jpeg-compressor, and picojpeg.