Friday, December 16, 2016

In Visual Studio 2015, Microsoft decided to wreck the Find dialog so it's perma-docked into the upper right-hand corner of the document. The dialog is too small, and the key icons (to enable case-sensitive searching or whole-word matching) are tiny and hard to use:

By comparison, here's 2010's:

The new Find dialog in 2015 is an example of bad UI, and several other Windows C++ developers I know seriously dislike it too.

Saturday, December 10, 2016

I heard this playing at the local Pagliacci's recently, and I realized it's one of the tunes my father used to play all the time. He was in Vietnam in '68 or '69, I think, got totally lost and alone in the jungle, and was saved by a branch of the Special Forces called the Green Berets.

Monday, November 28, 2016

Age3 used CPU skinning of relatively low-poly models (even in "high" model mode). To help mitigate this technical design misstep made by the Age3 team (before I joined near the end of production), I rewrote the skinning code to be multithreaded. Unfortunately, by the time I came on board, the artists had already created a ton of low-poly skinned meshes.

I also built the skinning DLL with Intel's compiler, so I was able to easily rewrite all the skinning code using SSE1/SSE2 compiler intrinsics. Back in those days, MSVC's support for vector intrinsics was weaker than Intel's. (I'm also the developer to blame for Age3's SSE requirement, which bit some owners of very early AMD processors who otherwise could have played the title at low frame rates.)
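
For anyone who hasn't written one, here's a minimal sketch of what an SSE1 linear blend skinning inner loop can look like. This is not the Age3 code: the BoneMatrix/SkinnedVertex layouts, the four-bones-per-vertex limit, and the function names are all assumptions for illustration. Each bone matrix is stored as four padded columns, so transforming a position becomes three multiply-adds plus a translation, done on all four lanes at once (x86/x64 only).

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE1 intrinsics (x86/x64 only)

// Hypothetical bone layout: affine transform stored as four 4-float columns.
// col[0..2] are the x/y/z axes, col[3] is the translation; element [3] is the w lane.
struct alignas(16) BoneMatrix {
  float col[4][4];
};

// Hypothetical vertex layout: up to 4 influencing bones per vertex.
struct SkinnedVertex {
  float pos[3];
  int bone[4];      // bone indices
  float weight[4];  // blend weights, assumed to sum to 1
};

// Helper for illustration: identity rotation plus a translation.
BoneMatrix make_translation(float tx, float ty, float tz) {
  BoneMatrix m{};
  m.col[0][0] = 1.0f;
  m.col[1][1] = 1.0f;
  m.col[2][2] = 1.0f;
  m.col[3][0] = tx;
  m.col[3][1] = ty;
  m.col[3][2] = tz;
  m.col[3][3] = 1.0f;
  return m;
}

// Linear blend skinning: out = sum_i w_i * (M_i * p), computed 4 lanes at a time.
void skin_vertex_sse(const SkinnedVertex& v, const BoneMatrix* bones, float out[4]) {
  assert(bones != nullptr);
  const __m128 px = _mm_set1_ps(v.pos[0]);
  const __m128 py = _mm_set1_ps(v.pos[1]);
  const __m128 pz = _mm_set1_ps(v.pos[2]);
  __m128 acc = _mm_setzero_ps();
  for (int i = 0; i < 4; ++i) {
    if (v.weight[i] == 0.0f) continue;  // skip unused influences
    const BoneMatrix& m = bones[v.bone[i]];
    // M * p = col0*x + col1*y + col2*z + translation
    __m128 t = _mm_mul_ps(_mm_load_ps(m.col[0]), px);
    t = _mm_add_ps(t, _mm_mul_ps(_mm_load_ps(m.col[1]), py));
    t = _mm_add_ps(t, _mm_mul_ps(_mm_load_ps(m.col[2]), pz));
    t = _mm_add_ps(t, _mm_load_ps(m.col[3]));
    acc = _mm_add_ps(acc, _mm_mul_ps(t, _mm_set1_ps(v.weight[i])));
  }
  _mm_storeu_ps(out, acc);  // out[3] ends up as the sum of the weights
}
```

Because each vertex is skinned independently, the outer vertex loop can be split into chunks and handed to worker threads, which is the easy part of making CPU skinning multithreaded.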

Anyhow, I mention this because if you play Age3 today, for example on a 4K monitor, the game's terrain and other effects hold up pretty well, except that the skinned character models look terribly low-poly by comparison. On Halo Wars I used GPU skinning and instanced rendering, and I heavily jobified the animation system.

There's still a lot of misunderstanding out there about where the Halo Wars engine technology came from. Starting in very early 2005, the HW team wrote a new engine pretty much from scratch. The Age3 code was single-threaded, didn't use SIMD, and consumed huge amounts of RAM. (Age3 used over 32 MB just for UTF-16 strings, which is not good for a console game!) The "Bang!" engine ran at ~7 Hz and took around three to five minutes to load on the early Xbox 360 devkits.

Colt McAnlis (now at Google), Billy Khan (now at id Software), and I wrote the entire Xbox 360-only renderer almost from scratch. We started out with Age3's particle renderer and my "wrench" deferred shading demo engine for SM 2.0 hardware. Ensemble Studios basically gave us a blank check to do whatever we wanted on Xbox 360. (What good times!)

Age3's particle engine (written partially or mostly by Graham Devine, now at Magic Leap) was so good that the artists refused to let us rewrite it. Billy and I threaded it by converting it into jobs, and we SIMD'ified all the key loops using AltiVec ops. We also offloaded as many computations as we could into vertex/pixel shaders to cut down on the very high CPU cost of the original code.
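
Here's a rough sketch of the "convert it into jobs" idea: split the particle array into contiguous ranges and update each range on its own worker. The Particle struct, the simple gravity integration, and the function names are hypothetical. A real engine would submit these ranges to a persistent job system or thread pool instead of spawning std::thread objects every frame, which this sketch does only for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical particle layout (AoS for clarity; SoA is usually friendlier to SIMD).
struct Particle {
  float px, py, pz;  // position
  float vx, vy, vz;  // velocity
};

// One "job": integrate the contiguous range [begin, end) with simple gravity.
// Ranges never overlap, so jobs need no locking.
void update_range(std::vector<Particle>& ps, std::size_t begin, std::size_t end,
                  float dt, float gravity) {
  for (std::size_t i = begin; i < end; ++i) {
    Particle& p = ps[i];
    p.vy -= gravity * dt;  // semi-implicit Euler: velocity first, then position
    p.px += p.vx * dt;
    p.py += p.vy * dt;
    p.pz += p.vz * dt;
  }
}

// Split the update into num_jobs roughly equal ranges and run them in parallel.
void update_particles_jobs(std::vector<Particle>& ps, float dt, float gravity,
                           unsigned num_jobs) {
  assert(num_jobs > 0);
  const std::size_t n = ps.size();
  const std::size_t chunk = (n + num_jobs - 1) / num_jobs;
  std::vector<std::thread> workers;
  for (unsigned j = 0; j < num_jobs; ++j) {
    const std::size_t begin = j * chunk;
    if (begin >= n) break;  // more jobs than particles
    const std::size_t end = std::min(n, begin + chunk);
    workers.emplace_back(update_range, std::ref(ps), begin, end, dt, gravity);
  }
  for (std::thread& t : workers) t.join();  // wait for every job before using the results
}
```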

The Halo Wars particle engine would have run circles around Age3's (once ported back to x86).

Please don't get me wrong: Age3 was a beautiful and fun game, and I loved working on it. The team was super easy and pleasant to work with. Just remember that Halo Wars was created by a very different team with different goals. We had some pretty awesome goals for the next Halo Wars, but the studio was shut down.

Sunday, October 23, 2016

I've compressed the Kodak test images at various settings using the prototype RDO ETC1 compressor I've been working on recently. You can download a .7z archive containing the RDO-compressed .KTX files and unpacked PNGs here. The .KTX files can be loaded using the Mali Texture Compression Tool (v4.3.0).

Here are the unpacked images for 512 endpoints and 1024 selectors (1.65 average bits/texel vs. 2.85 average bits/texel for non-RDO ETC1):

Saturday, October 22, 2016

In this test on the 24 Kodak images, I quantized the ETC1 block colors/intensity tables (what I've been calling "endpoints", borrowing DXT1/BC1 terminology) to 128 clusters, but didn't quantize the selectors at all. 128 clusters for endpoints is at the edge of usability for many photos.
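
The post doesn't say how the endpoint codebook is built, so purely as an illustration, here's plain k-means (Lloyd's algorithm) clustering of block colors into k codebook entries. This is an assumption and a simplification: it treats each endpoint as an RGB color only, ignoring the intensity table index, and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Color = std::array<float, 3>;  // one endpoint, simplified to RGB only

static float dist2(const Color& a, const Color& b) {
  float d = 0.0f;
  for (int j = 0; j < 3; ++j) d += (a[j] - b[j]) * (a[j] - b[j]);
  return d;
}

// Index of the codebook entry closest to c (squared Euclidean distance).
std::size_t nearest_centroid(const Color& c, const std::vector<Color>& centroids) {
  std::size_t best = 0;
  float best_d = std::numeric_limits<float>::max();
  for (std::size_t i = 0; i < centroids.size(); ++i) {
    const float d = dist2(c, centroids[i]);
    if (d < best_d) {
      best_d = d;
      best = i;
    }
  }
  return best;
}

// Plain Lloyd's k-means: alternate nearest-centroid assignment and centroid update.
std::vector<Color> kmeans_endpoints(const std::vector<Color>& pts, std::size_t k, int iterations) {
  assert(k > 0 && k <= pts.size());
  std::vector<Color> centroids(k);
  for (std::size_t c = 0; c < k; ++c) centroids[c] = pts[c * pts.size() / k];  // spread-out initial picks
  std::vector<std::size_t> label(pts.size(), 0);
  for (int it = 0; it < iterations; ++it) {
    for (std::size_t i = 0; i < pts.size(); ++i) label[i] = nearest_centroid(pts[i], centroids);
    std::vector<Color> sum(k, Color{0.0f, 0.0f, 0.0f});
    std::vector<std::size_t> count(k, 0);
    for (std::size_t i = 0; i < pts.size(); ++i) {
      for (int j = 0; j < 3; ++j) sum[label[i]][j] += pts[i][j];
      ++count[label[i]];
    }
    for (std::size_t c = 0; c < k; ++c)
      if (count[c] > 0)  // an empty cluster keeps its previous centroid
        for (int j = 0; j < 3; ++j) centroids[c][j] = sum[c][j] / static_cast<float>(count[c]);
  }
  return centroids;
}
```

With k = 128, every block's endpoint is then replaced by its nearest centroid, so only a 7-bit codebook index per endpoint needs to be coded before any entropy coding.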

This test also adaptively limits blocks to a single endpoint (versus a unique endpoint for each subblock) if doing so doesn't lower the block's PSNR by more than 1.25 dB.
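
The 1.25 dB rule is easy to express directly. Since PSNR = 10·log10(255²·N/E) for N 8-bit samples with total squared error E, the PSNR drop between two encodings of the same block reduces to 10·log10(E_one/E_two). A 1.25 dB threshold therefore lets the single-endpoint encoding have at most about 1.33x the squared error of the two-endpoint encoding. This sketch assumes the errors are summed over the 48 RGB samples of a 4x4 block; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// PSNR in dB for 8-bit data, given the sum of squared errors over num_samples values.
double psnr_from_sse(double sum_sq_err, int num_samples) {
  assert(num_samples > 0);
  if (sum_sq_err <= 0.0) return std::numeric_limits<double>::infinity();  // lossless
  const double mse = sum_sq_err / num_samples;
  return 10.0 * std::log10(255.0 * 255.0 / mse);
}

// True if coding the block with one shared endpoint costs at most max_drop_db of PSNR
// compared to giving each subblock its own endpoint.
bool accept_single_endpoint(double err_two_endpoints, double err_one_endpoint,
                            double max_drop_db = 1.25) {
  if (err_one_endpoint <= err_two_endpoints) return true;  // no quality loss at all
  const int kSamples = 4 * 4 * 3;  // 4x4 block, RGB
  const double drop = psnr_from_sse(err_two_endpoints, kSamples) -
                      psnr_from_sse(err_one_endpoint, kSamples);
  return drop <= max_drop_db;
}
```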

Anyhow, these two graphs show that this process is quite effective. Even at only 128 clusters, the overall SSIM is only reduced by around 0.01, while the bitrate is reduced by around 0.4-0.5 bits/texel.

The results look surprisingly good. I've made great progress on quality per bit over the previous few weeks, and I'll be posting images and .KTX files in a day or so.

Tests like this are important because they show that the RDO compressor is able to utilize all the features available in ETC1: flipped/non-flipped, differential/absolute block color encoding, subblocks, etc.
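
To make those features concrete, here's a sketch that pulls the mode bits, intensity tables, base colors, and selectors out of a single 8-byte ETC1 block, following the bit layout in the OES_compressed_ETC1_RGB8_texture specification (block stored big-endian; the diff bit is bit 1 and the flip bit is bit 0 of byte 3). The struct and function names are my own, not from any library.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// ETC1 intensity modifier tables: {small, large} magnitudes per table index.
static const int kEtc1Modifiers[8][2] = {
    {2, 8}, {5, 17}, {9, 29}, {13, 42}, {18, 60}, {24, 80}, {33, 106}, {47, 183}};

struct Etc1BlockInfo {
  bool diff = false;                 // true: differential 5:5:5 + signed 3:3:3 delta; false: two 4:4:4 colors
  bool flip = false;                 // false: two 2x4 subblocks side by side; true: two 4x2 stacked
  int table[2] = {0, 0};             // intensity table index (0..7) per subblock
  std::array<int, 3> color[2] = {};  // RGB base color per subblock, expanded to 8 bits
  int selector[4][4] = {};           // [y][x] raw 2-bit selector (MSB:LSB)
};

static int expand4(int c) { return (c << 4) | c; }
static int expand5(int c) { return (c << 3) | (c >> 2); }
static int sign_extend3(int d) { return (d & 4) ? d - 8 : d; }

Etc1BlockInfo parse_etc1_block(const std::uint8_t b[8]) {
  Etc1BlockInfo info;
  info.diff = (b[3] & 0x02) != 0;
  info.flip = (b[3] & 0x01) != 0;
  info.table[0] = (b[3] >> 5) & 7;
  info.table[1] = (b[3] >> 2) & 7;
  for (int ch = 0; ch < 3; ++ch) {  // bytes 0..2 hold R, G, B
    if (info.diff) {
      const int base = b[ch] >> 3;
      const int second = base + sign_extend3(b[ch] & 7);
      // Valid ETC1 blocks never overflow here (ETC2 reuses overflow for its extra modes).
      assert(second >= 0 && second <= 31);
      info.color[0][ch] = expand5(base);
      info.color[1][ch] = expand5(second);
    } else {
      info.color[0][ch] = expand4(b[ch] >> 4);
      info.color[1][ch] = expand4(b[ch] & 0x0F);
    }
  }
  // Bytes 4-5: selector MSBs, bytes 6-7: selector LSBs; pixel bit = x*4 + y (column-major).
  const int msb = (b[4] << 8) | b[5];
  const int lsb = (b[6] << 8) | b[7];
  for (int x = 0; x < 4; ++x)
    for (int y = 0; y < 4; ++y) {
      const int bit = x * 4 + y;
      info.selector[y][x] = (((msb >> bit) & 1) << 1) | ((lsb >> bit) & 1);
    }
  return info;
}

// Subblock (0 or 1) that pixel (x, y) belongs to.
int etc1_subblock(bool flip, int x, int y) { return flip ? (y >= 2 ? 1 : 0) : (x >= 2 ? 1 : 0); }

// Signed modifier: selector 0 -> +small, 1 -> +large, 2 -> -small, 3 -> -large.
int etc1_modifier(int table, int sel) {
  const int mag = kEtc1Modifiers[table][sel & 1];
  return (sel & 2) ? -mag : mag;
}
```

A decoded texel is then the clamped sum of its subblock's base color and etc1_modifier(table, selector), added equally to R, G, and B.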

crunch-style adaptive endpoint quantization at the block/subblock level is supported, but not yet at the macroblock (2x2 block) level. Also, the KTX writer backend is greedy, meaning it doesn't try to choose the combination of selectors+endpoints that results in the fewest compressed bits output by LZMA (or LZHAM). The lack of both features hurts compression. I have several other improvements to both quality and bitrate coming, but this is a good milestone.

Wednesday, October 19, 2016

I've transitioned my 2D-only prototype into a full-blown class, instead of having it live in my experimental framework as one huge function.

Next up are things like macroblock support, more endpoint/selector codebook refinements, an investigation into alternative selector compression schemes, and an experiment to exploit endpoint/selector codebook entry correlation. After this, I'm rewriting the code so it works on texture arrays, cubemaps, etc.

This rewritten code will be the "front end" of the full ETC1 compressor. The back end (which does the coding) comes after the front end is in good shape. Unlike crunch, Basis will use the same basic front end for both RDO mode and .CRN (or .basis) mode.

This compressor is also compatible with the ETC1 "subset" format I mentioned here, which means it could be trivially transcoded to DXT1 with the aid of a precomputed lookup table.

I no longer feel so alone out here. I've been working on "Supercompressed Texture" technology for about a decade now, since before I knew it would be called "Supercompressed Textures". The first title I was involved in that used GPU transcoding of compressed textures was the PS2 version of World Series Baseball 2K3 (2003). It was designed by Blue Shift's then-CTO, John Brooks. This technology was then licensed to Electronic Arts for use in their PS3 titles.

And my first Xbox 360 title (Halo Wars) relied on a real-time supercompressed texture decompression system I wrote in '06-'07, so that the title would fit into memory at all. (crunch was actually my second attempt at this approach, not my first.) So this tech has been around for years, used behind closed doors in a low-key way. It's like the academic world is just now catching on. In the professional game development world, this is advanced but still "old school" technology.

My main bit of feedback about this paper so far: the description of how the selector compression actually works is kinda muddled. (What's the "prefix sum" all about?) Also, it looks like crunch was used at its maximum quality level (255), rather than at a tuned level or across a range of levels. To my knowledge, crunch quality level 255 just isn't used in practice: the codebooks at that level are huge, and the image quality is unnecessarily high. Also, can I speed up crunch's CPU transcoder by 2-3x? Oh yes!

Another thing I noticed: because GST doesn't support lossy endpoint quantization (like crunch does), I think its rate-distortion performance is more limited than crunch's. My guess is that crunch should be able to target lower bitrates than GST. GST's main way of controlling the quality vs. rate tradeoff is its lossy dictionary-based selector compression method, while crunch can smoothly vary the quality of both the endpoints and selectors.

Next up: Universal Supercompressed Textures with either CPU or GPU decoding. (Isn't it obvious? We need to abstract away all of these crazy formats behind good technologies and shared tools.)

Saturday, October 15, 2016

I've been very busy refining my new ETC1 compressor, so I haven't been posting much recently. Today I decided to do something different and played around with the 2D Haar 4x4 and 8x8 transforms on ETC1 selector bits. I first tried this on DXT1/BC1 selectors years ago while writing crunch, but unfortunately I didn't publish or use the results.

To use the Haar transform on selector indices, I prepare the input samples by adding .5 to each selector index (selectors range over [0,3] in ETC1), apply the transform, uniformly quantize the coefficients, then apply the inverse transform and truncate the resulting values back to the [0,3] selector range. (You must shift the input samples by .5, or it won't work.)
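
Here's a minimal sketch of that round trip on one 4x4 block of selectors, using an unnormalized (average/half-difference) 2D Haar transform applied to rows and then columns. The function names and the choice of the unnormalized variant are assumptions; the steps (shift by .5, forward transform, uniform quantization, inverse transform, truncation to [0,3]) follow the description above. The .5 shift presumably matters because truncation floors the reconstructed value: a selector of 2 stored as 2.5 still truncates to 2 after a small reconstruction error in either direction, whereas an unshifted 2 reconstructed as 1.999 would truncate to 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Full (two-level) 1D Haar on 4 samples, unnormalized: averages and half-differences.
static void haar4_forward(float v[4]) {
  const float a0 = (v[0] + v[1]) * 0.5f, d0 = (v[0] - v[1]) * 0.5f;
  const float a1 = (v[2] + v[3]) * 0.5f, d1 = (v[2] - v[3]) * 0.5f;
  v[0] = (a0 + a1) * 0.5f;  // DC
  v[1] = (a0 - a1) * 0.5f;  // coarse detail
  v[2] = d0;                // fine details
  v[3] = d1;
}

static void haar4_inverse(float v[4]) {
  const float a0 = v[0] + v[1], a1 = v[0] - v[1];
  const float d0 = v[2], d1 = v[3];
  v[0] = a0 + d0;
  v[1] = a0 - d0;
  v[2] = a1 + d1;
  v[3] = a1 - d1;
}

// Separable 2D transform of m[y][x]: rows then columns (reversed for the inverse).
static void haar4x4(float m[4][4], bool inverse) {
  void (*f)(float*) = inverse ? haar4_inverse : haar4_forward;
  auto do_rows = [&] { for (int y = 0; y < 4; ++y) f(m[y]); };
  auto do_cols = [&] {
    for (int x = 0; x < 4; ++x) {
      float col[4] = {m[0][x], m[1][x], m[2][x], m[3][x]};
      f(col);
      for (int y = 0; y < 4; ++y) m[y][x] = col[y];
    }
  };
  if (inverse) { do_cols(); do_rows(); } else { do_rows(); do_cols(); }
}

// Shift by .5, transform, uniformly quantize (step == 0 disables it), inverse
// transform, then truncate back to [0,3]. Modifies sel[y][x] in place.
void haar_requantize_selectors(int sel[4][4], float step) {
  assert(step >= 0.0f);
  float m[4][4];
  for (int y = 0; y < 4; ++y)
    for (int x = 0; x < 4; ++x) m[y][x] = static_cast<float>(sel[y][x]) + 0.5f;
  haar4x4(m, false);
  if (step > 0.0f)
    for (int y = 0; y < 4; ++y)
      for (int x = 0; x < 4; ++x) m[y][x] = std::round(m[y][x] / step) * step;
  haar4x4(m, true);
  for (int y = 0; y < 4; ++y)
    for (int x = 0; x < 4; ++x) {
      const int v = static_cast<int>(std::floor(m[y][x]));  // truncate
      sel[y][x] = std::min(3, std::max(0, v));
    }
}
```

With a step of 1/32, every coefficient of a shifted 4x4 block is exactly representable, so the round trip is lossless; larger steps zero out the smaller detail coefficients and smooth the selector pattern.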

I have some ideas on how the 4x4 Haar transform could be very useful in Basis, but they are just ideas right now. I find it amazing that the selectors can be transformed and manipulated in the frequency domain like this.

About Me

Back in the day I worked for several years at Digital Illusions on things like the first shipping deferred-shaded game ("Shrek", 2001), software renderers, and game AI. Then, after working for Microsoft at Ensemble Studios for 5 years as engine lead on Halo Wars, I took a year off to create "crunch", an advanced DXTc texture compression library. I then worked 5 years at Valve, where I contributed to Portal 2, Dota 2, CS:GO, and the Linux versions of Valve's Source1 games. I was one of the original developers on the Steam Linux team, where I worked with a (somewhat enigmatic) multi-billionaire on proving that OpenGL could still hold its own vs. Direct3D. I also started the vogl (Valve's OpenGL debugger) project from scratch, which I worked on for over a year. In my spare time I work on various open source lossless and texture compression projects: crunch, LZHAM, miniz, jpeg-compressor, and picojpeg.