Saturday, October 1, 2016

ETC1 encoder performance on kodim18 at various quality/effort levels

I'm trying to get a handle on how the available ETC1 compressors perform, using their public APIs, at their various quality or effort levels. This is only for a single image (kodim18 - my usual for quick tests like this).

Effort levels between roughly 40 and 65 seem to be the sweet spot. Effort=100 is obviously wasteful.

Here's another graph (can you tell I'm practicing my Excel graphing skills!), this time comparing the time and quality of various ETC1 (and now ETC2 - for etc2comp) compressors at different encoder quality/effort settings:
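The general shape of a benchmark loop like this is simple: encode at each effort level, time it, then score the round-trip quality. Here's a minimal Python sketch, where `encode_decode` is a hypothetical stand-in for an encoder's real API (each of these libraries has its own C++ entry points):

```python
import math
import time

def psnr(orig, decoded, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, decoded)) / len(orig)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val * max_val / mse)

def benchmark(encode_decode, pixels, efforts):
    """Time a round-trip encode at each effort level and record quality."""
    results = []
    for effort in efforts:
        start = time.perf_counter()
        decoded = encode_decode(pixels, effort)  # hypothetical encode+decode round trip
        elapsed = time.perf_counter() - start
        results.append((effort, elapsed, psnr(pixels, decoded)))
    return results
```

Each `(effort, seconds, PSNR)` tuple is one point on a graph like the ones below.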

Important Notes:

To be fair to Intel's ETC1 encoder (function CompressBlocksETC1()), which is not multithreaded, I added another bar labeled "ispc_etc1 MT" which has the total CPU time divided by 20 (to roughly match the speedup I'm seeing using 40 threads in the other natively multithreaded encoders).

basislib is now using a variant of cluster fit (see previous posts). basislib_1 is lowest quality, basislib_3 is highest. Notice that basislib_3 is only ~2X slower than Intel's SIMD code, but basislib doesn't use any SIMD at all.

basislib and etc2comp both use 40 threads.

etcpak's timings are currently single threaded, because I'm still using a single threaded entry point inside the code (BlockData::Process()). It's on my TODO list to fix this. IMHO, unless you need a real-time ETC1 encoder, it trades off too much quality. However, if you need a real-time encoder and don't mind the loss in quality, it's your best bet. If the author added a few optional SIMD-optimized cluster fit trials it would probably kick ass.
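Parallelizing a per-block entry point like that is conceptually trivial, since ETC1 blocks compress independently of their neighbors. Here's a structural sketch in Python (whose GIL prevents real CPU-bound speedup - a C++ encoder would use a native thread pool), with `encode_block` as a hypothetical stand-in for the per-block encoder:

```python
from concurrent.futures import ThreadPoolExecutor

def encode_blocks_mt(blocks, encode_block, num_threads=4):
    """Encode independent 4x4 blocks on a thread pool.

    Each block compresses independently, so the work divides with no
    synchronization beyond the final join. encode_block is a stand-in
    for a real per-block encoder entry point.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() preserves input order, so the compressed stream stays in order.
        return list(pool.map(encode_block, blocks))
```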

To get an idea how efficient etc2comp currently is at scanning the ETC1 search space for kodim18, let's see what effort level (and how much CPU time) etc2comp needs to approximately match two other encoders' quality levels:

9 comments:

1) Estimating a 40x speedup for a multithreaded Intel encoder is probably too generous. I'd expect closer to a 20x speedup on a 20-core CPU using 40 threads.

2) While ETC2Comp is viable for ETC1-only encoding, it's really designed for the more complex ETC2 formats, so it can handle alpha textures, normal maps, and HDR (11-bit-per-component) textures.

I'm not sure why users would choose to encode with ETC1 only as ETC2 adds quality and features.

3) The Effort feature of ETC2Comp allows users to control the full range between 'high-speed' and 'high-quality' encoding, but as your benchmark graphs and research have shown, if it was possible to also add cluster-fit techniques to ETC2Comp, then both speed and quality would likely see a nice boost!

So what you're saying is: it boils down to the distribution of texture types in use. If a product is targeting ETC1-only capable devices (which is a huge number of currently deployed devices - several hundred million), all of these map types (normal maps, alpha textures, tables, etc.) must be encoded into ETC1 blocks (or perhaps some uncompressed format). For alpha textures, developers do things like use atlases or multiple ETC1 textures. Importantly, there are many products that require compressed textures where "advanced" things like normal maps, HDR, and 1/2 channel tables are not used or valuable.

If a dev is targeting ETC2, they are free to pick whatever ETC1/ETC2/EAC format can get the job done with an acceptable amount of quality. Now whether or not ETC1 matters so much depends on the distribution of these asset types, and on the decisions graphics programmers make when choosing which texture format (ETC1, ETC2, or some EAC encoding) to use for each texture type.

My main point is, graphics programmers will be biased in their decision here. If high quality ETC2/EAC texture encoding is much slower than high-quality ETC1 encoding, there will be a tendency to stay away from ETC2/EAC (because it takes much longer to encode) and instead choose formats which are extremely fast to encode at acceptable quality. It's all about effective product value delivered vs. developer cost+pain.

Also, I can just enable ETC2 mode and create a second graph. Then we can see how much CPU time it takes to encode multithreaded Intel ETC1 (or basislib_3) vs. etc2comp ETC2 to approximately the same PSNR (or SSIM, etc.).

ETC2 is supposed to be approx. 1 dB better than ETC1 (according to one Ericsson presentation), so presumably etc2comp will have to expend "less" effort to compete against ETC1 at the same quality.
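For reference, PSNR is 10·log10(MAX²/MSE), so a fixed dB gain maps directly to an MSE reduction factor. A quick sketch of that arithmetic:

```python
import math

def psnr_from_mse(mse, max_val=255.0):
    """PSNR in dB for a given mean squared error over 8-bit channels."""
    return 10.0 * math.log10(max_val * max_val / mse)

def mse_factor_for_db_gain(db):
    """MSE multiplier implied by gaining `db` decibels of PSNR."""
    return 10.0 ** (-db / 10.0)

# A 1 dB PSNR gain corresponds to roughly 79% of the original MSE,
# i.e. about a 21% reduction in squared error.
```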

The additional ETC2 graph entries are great! I think they show that ETC2Comp has room to improve on both ETC1 and ETC2 RGB block types. However, the idea I was trying to get at before was that the majority of ETC2 block types supported by ETC2Comp will not be exercised by a corpus of RGB-only textures.

The key thing I realized while making ETC2Comp is that the primary value of ETC2 is not that it can add 1dB on RGB textures, but that it can 'level up' the ETC format and allow encoding the full range of texture types used by developers.

ETC1 is great for LDR RGB textures with no alpha, but that's about it. ETC2 moves into the league of BC1-7 and ASTC.

The analogy I would use is that ETC2 is similar to BC1-7, and evaluating an ETC2 encoder based purely on its ETC1 quality is similar to evaluating a BC1-7 encoder based only on its DXT1 (non-alpha) quality.

It's much easier to optimize for a single format, and while opaque RGB is the most common texture type in many use cases, the value of these encoders is in their ability to handle the full spectrum of image and numerical data encoded in textures by modern games & applications.

Opaque-only RGB is a great place to start benchmarking and you are doing awesome work by graphing and analyzing all these encoders. But by limiting the corpus and tests to RGB non-alpha, it's really exercising only a fraction of ETC2Comp's abilities and design.

If you (or anyone) were bored and wanted to compare ETC2 vs. BC1-7 vs. ASTC across RGB, RGBA, normal maps, HDR, and numeric vector data, then we'd have a comprehensive understanding of quality & speed.

Yup, totally understand. Seriously, etc2comp sets a new baseline for a high quality GPU texture compression library, which is why I'm benchmarking it so much. It supports ETC1/ETC2/EAC, which accounts for all the texture types needed by modern developers. In that way, it's awesome.

About Me

Back in the day I worked for several years at Digital Illusions on things like the first shipping deferred shaded game ("Shrek" - 2001), software renderers, and game AI. Then, after working for Microsoft at Ensemble Studios for 5 years as engine lead on Halo Wars, I took a year off to create "crunch", an advanced DXTc texture compression library. I then worked 5 years at Valve, where I contributed to Portal 2, Dota 2, CS:GO, and the Linux versions of Valve's Source1 games. I was one of the original developers on the Steam Linux team, where I worked with a (somewhat enigmatic) multi-billionaire on proving that OpenGL could still hold its own vs. Direct3D. I also started the vogl (Valve's OpenGL debugger) project from scratch, which I worked on for over a year. In my spare time I work on various open source lossless and texture compression projects: crunch, LZHAM, miniz, jpeg-compressor, and picojpeg.