JSquish is a Java port of the DXT compression library Squish. See this article for details. I will release the full source code after I fix a minor implementation issue.

The above archive contains the library JAR, which is also executable. It's a tiny LWJGL application that can be used to compare the different DXT compression methods. There are four methods you can switch between (press 'C'):

- Uncompressed: the original texture.
- Compressed - Driver: the texture is compressed by the OpenGL driver (using the EXT_texture_compression_s3tc formats).
- Compressed - Range Fit: the texture is compressed using the Range Fit method. This is a very fast method, with results comparable to the driver's.
- Compressed - Cluster Fit: the texture is compressed using the Cluster Fit method. This method was the real motivation for writing a Java port of the Squish library. It's an embarrassingly slow method (only useful for offline compression), but gives amazing results compared to the driver implementation.

I urge you to try your own textures (press 'O' to load a new image) and see the quality difference of the Cluster Fit method. Try relatively small images first, to get a feeling of how slow it is.

Our public-facing interfaces are almost identical, which isn't surprising given the common heritage. The only big differences I see are that I javafied it (CLUSTER_FIT instead of kColourClusterFit) and that I use byte[] instead of int[]. I ran a test with the following code, with 100 warmup iterations and 100 benchmark iterations (should I use larger values here?)
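For reference, a warmup-then-measure loop along the lines described might look like this. The iteration counts come from the post; compress() is a placeholder, since the actual test code wasn't included:

```java
// Minimal benchmark harness sketch; WARMUP/RUNS match the counts mentioned
// in the post, but compress() stands in for the real library call.
public class CompressionBenchmark {

    static final int WARMUP = 100;
    static final int RUNS = 100;

    public static void main(String[] args) {
        byte[] rgba = new byte[64]; // one 4x4 RGBA block of placeholder data

        // Warmup: give the JIT a chance to compile the hot paths.
        for (int i = 0; i < WARMUP; i++) {
            compress(rgba);
        }

        // Timed runs.
        long start = System.nanoTime();
        for (int i = 0; i < RUNS; i++) {
            compress(rgba);
        }
        double avgMs = (System.nanoTime() - start) / (double) RUNS / 1_000_000.0;
        System.out.println("average: " + avgMs + " ms");
    }

    static byte[] compress(byte[] rgba) {
        // Stand-in for the DXT compression call being benchmarked.
        return rgba.clone();
    }
}
```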

I made the exact same thing some time ago. Time to do some performance comparisons!

It's supposed to go in the product I'm working on, but I've been hesitant due to possible patent issues. Any idea whether squish falls under the s3tc patent or not?

Heheh! Given the compression quality though, I was suspecting that someone might have done this already.

I'm not sure about the patent issue, but there are two other open-source implementations already. I don't think I have anything to worry about, but the case may be different for a commercial product (if that's what your product is).

Hooray! I'm curious where the large difference comes from, though. I would have expected my version to be slower, since I used byte[] and have to do a & 0xFF each time I want to use a value.

That's exactly what I plan to do next. I've seen this before in several places, using bytes and & 0xFF is much faster than the equivalent with ints. Probably the & overhead is tiny compared to the memory/bandwidth gains. This is the minor implementation issue I mentioned in my previous post. I will also javafy the interface a bit.
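The masking in question is plain Java, nothing jsquish-specific; a tiny illustration:

```java
// Java bytes are signed (-128..127), so a colour component above 127 reads
// back negative unless it is masked to its unsigned value first.
public class UnsignedByteDemo {
    public static void main(String[] args) {
        byte component = (byte) 200;   // stored as -56 in two's complement
        int wrong = component;         // sign-extended: -56
        int right = component & 0xFF;  // unsigned value: 200
        System.out.println(wrong + " vs " + right); // prints "-56 vs 200"
    }
}
```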

Anyway, the bottleneck in Cluster Fit is the solveLeastSquares method, which is called ~500 times per 4x4 block on average. It takes around 97% of the total execution time, but it is plain floating-point math and has nothing to do with bytes vs ints. So I can't see how your implementation runs so much faster than mine. Maybe the test image you tried is too simple? From my tests, I've found that the algorithm's performance depends very much on the image contents; it takes a lot of time to find the optimal solution for complex images.

I've compared my implementation of solveLeastSquares with yours. The biggest difference is that I inlined all the Vec3 stuff. Profiling showed that construction and manipulation of these objects was a hotspot, so I completely removed that stuff. Since that method is essentially the inner loop of the algorithm it makes a big difference. I've attached my cluster fit implementation so you can compare.
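As an illustration of the kind of inlining meant here (the Vec3 class below is invented for the sketch, not jsquish's actual code):

```java
public class InlineDemo {
    // Object-based version: one short-lived Vec3 per operand.
    static final class Vec3 {
        final float x, y, z;
        Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
        float dot(Vec3 o) { return x * o.x + y * o.y + z * o.z; }
    }

    static float dotObjects(float ax, float ay, float az,
                            float bx, float by, float bz) {
        return new Vec3(ax, ay, az).dot(new Vec3(bx, by, bz));
    }

    // Hand-inlined version: identical math, no allocation in the inner loop.
    static float dotInlined(float ax, float ay, float az,
                            float bx, float by, float bz) {
        return ax * bx + ay * by + az * bz;
    }

    public static void main(String[] args) {
        System.out.println(dotObjects(1, 2, 3, 4, 5, 6)); // 32.0
        System.out.println(dotInlined(1, 2, 3, 4, 5, 6)); // 32.0
    }
}
```

In a method called ~500 times per block, avoiding those temporary allocations adds up quickly.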

Thanks for the suggestions. These changes gave the following results:
- base: 210ms
- replace division by multiplication: 209ms
- replace Math.min/max with if/else: 176ms
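The Math.min/max change amounts to rewriting a clamp along these lines (a sketch, not the actual jsquish code):

```java
public class ClampDemo {
    // Before: library calls.
    static float clampCalls(float v) {
        return Math.min(1.0f, Math.max(0.0f, v));
    }

    // After: explicit branches, which JITs of that era could compile
    // to tighter code than the chained library calls.
    static float clampBranches(float v) {
        if (v < 0.0f) return 0.0f;
        if (v > 1.0f) return 1.0f;
        return v;
    }

    public static void main(String[] args) {
        System.out.println(clampBranches(1.5f));  // 1.0
        System.out.println(clampBranches(-0.5f)); // 0.0
    }
}
```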

I've attached an updated version. My profiler now shows the following hotspots:
- ClusterFit#compress4: 46%
- ClusterFit#compress3: 21%
- ClusterFit#solveLeastSquares: 8%
I don't really have any ideas on how I could improve compress3/4 though.

Thank you both. After using pepijnve's code and adding Riven's (a.k.a. Java Performance God) suggestions, my implementation is now 4 (four) times faster! Apparently inlining made a huge difference. I was expecting HotSpot to do a better job and optimize away most of the method calls when using vectors.

Anyway, I've updated the archive with the new version. You can also download the library separately (265kb) if you want to compare it again.

I found the big difference between the two versions. My implementation was based on squish 1.7, while yours is probably based on 1.9. The main difference is that cluster fit now goes through multiple iterations. After updating my implementation, the average time went up to 266ms again. I then ran the test with the updated JSquish, and that gave 147ms. I profiled your version and didn't see compress3 popping up anywhere, which seemed kind of strange, so I peeked at ColourFit#compress. Your implementation differs from the original squish version. Squish does

which is a subtle difference that might cause a slight loss of quality. Squish uses the best result of compress3 and compress4, while you're using either one or the other. No idea if this makes a big difference in practice, though.
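The selection described amounts to something like this (the types are invented for the sketch):

```java
public class BestFitDemo {
    // Minimal stand-in for a compressed block plus its error metric.
    static final class Candidate {
        final byte[] block;
        final float error;
        Candidate(byte[] block, float error) {
            this.block = block;
            this.error = error;
        }
    }

    // Squish-style selection: try both modes, keep the lower-error result.
    static Candidate best(Candidate threeColour, Candidate fourColour) {
        return threeColour.error <= fourColour.error ? threeColour : fourColour;
    }

    public static void main(String[] args) {
        Candidate three = new Candidate(new byte[8], 0.8f);
        Candidate four = new Candidate(new byte[8], 0.3f);
        System.out.println(best(three, four) == four); // true
    }
}
```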

Indeed, that was a piece of code that really confused me. I did a few tests last night as I was scratching my head and found that compress4 always produces the best result; that's why I removed compress3 for RGB images. Running 3+4 makes the algorithm 100% slower for no apparent improvement, but I guess it's too obvious to be a mistake and there must be a reason compress3 is there. I'll investigate further.

Yeah, I remember your tests. I prefer code simplicity over performance, but thankfully in this case the problem was only in one particular method, so it was no big deal to optimize. In general though, we need escape analysis + stack allocation + optimizations urgently.

- I switched everything to byte arrays. No performance difference at all, but at least no memory is wasted.
- I did a few optimizations here and there.
- I changed the public interface to use enums for options and added more entry points with default values.
- I reverted compress to use both compress3 & compress4. Made some more tests and both are affecting the final image after all (although with a negligible difference).

I even tried to do gamma correction, but it didn't work out well. I moved everything to linear space in ColourSet, ran the compression, then moved back to gamma space when writing the compressed block. I was expecting a nice, subtle difference, but instead I got a somewhat darker image and minor artifacts. I believe the problem lies with the error computation or the grid clamping, but I couldn't figure out a way to solve it. Anyway, it's good enough as it is.
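The round trip described would look roughly like this. Squish itself works directly on the stored values, and the transfer function here (plain gamma 2.2 rather than the exact sRGB curve) is an assumption:

```java
public class GammaDemo {
    // Gamma-encoded [0,1] -> linear light (assumed gamma 2.2 curve).
    static float toLinear(float gammaValue) {
        return (float) Math.pow(gammaValue, 2.2);
    }

    // Linear light -> gamma-encoded [0,1].
    static float toGamma(float linearValue) {
        return (float) Math.pow(linearValue, 1.0 / 2.2);
    }

    public static void main(String[] args) {
        // In this scheme the compression error is measured in linear space,
        // which weights dark tones very differently - one plausible source
        // of the darkening and artifacts mentioned above.
        float original = 0.5f;
        float roundTrip = toGamma(toLinear(original));
        System.out.println(original + " -> " + roundTrip);
    }
}
```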

Btw, I forgot to mention that Java 5.0 is required and Java 6.0 highly recommended (up to 20% faster on the client VM).

I've been pulling my hair out trying to find the cause of the remaining performance difference between our two implementations (mine was a bit slower). Going over both codebases with a fine-toothed comb, I think I found a bug in your implementation of computeWeightedCovariance. In your implementation you pass in a Matrix instance that is reused. I checked the original squish code, and you definitely should be resetting the matrix values to 0 before you start the accumulation loop. Otherwise you'll be reusing covariance values from a previous calculation. Sadly enough, when I corrected this in JSquish the performance gap only got larger. I'm kind of stumped as to what could be causing this, especially since my profiler is telling me the exact opposite. The search continues...
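The bug pattern, sketched on a flattened symmetric 3x3 covariance (the names are invented; jsquish's actual Matrix class differs):

```java
public class CovarianceDemo {
    /**
     * Accumulates a weighted covariance into a reused 6-element array
     * (xx, xy, xz, yy, yz, zz). The fill() at the top is the fix: without
     * it, values from the previous block leak into the next result.
     */
    static void computeWeightedCovariance(float[] cov, float[][] points,
                                          float[] weights, float[] centroid) {
        java.util.Arrays.fill(cov, 0.0f); // the missing reset
        for (int i = 0; i < points.length; i++) {
            float w = weights[i];
            float dx = points[i][0] - centroid[0];
            float dy = points[i][1] - centroid[1];
            float dz = points[i][2] - centroid[2];
            cov[0] += w * dx * dx;
            cov[1] += w * dx * dy;
            cov[2] += w * dx * dz;
            cov[3] += w * dy * dy;
            cov[4] += w * dy * dz;
            cov[5] += w * dz * dz;
        }
    }

    public static void main(String[] args) {
        float[][] points = { { 1, 0, 0 }, { 0, 1, 0 } };
        float[] weights = { 0.5f, 0.5f };
        float[] centroid = { 0.5f, 0.5f, 0 };
        float[] cov = new float[6]; // reused across calls, like the Matrix
        computeWeightedCovariance(cov, points, weights, centroid);
        computeWeightedCovariance(cov, points, weights, centroid);
        System.out.println(cov[0]); // 0.25 - stable thanks to the reset
    }
}
```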

which caused incorrect results, obviously, but also caused solveLeastSquares to be called twice as often as in JSquish. This tiny change chopped 80ms off the average time, bringing it to 100ms. Time for a celebration! Thanks Spasi and Riven for all the information. It's definitely been an interesting learning experience.


Thanks a lot pepijnve, it was indeed a bug and fixing it improved performance considerably. I wonder why such a bug didn't have any effect on the compression quality though. Anyway, I have updated the library archives.

103ms. The difference is due to a reordering of some of the code. I rearranged things so that for each image I only have to make a single instance of SingleColour, Range and ClusterFit. These classes now have a reset method to initialize their fields. It's not pretty, but it produces less garbage. I chose this approach over your approach of making most of the fields static because I wanted to be able to compress multiple images in parallel.
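In outline, the reuse-with-reset pattern described might look like this (illustrative names, not the actual implementation):

```java
public class ReusableFitDemo {
    // One instance per thread/image; reset() replaces reallocation, so
    // parallel compression works without shared static state.
    static final class ClusterFitState {
        final float[] covariance = new float[6]; // scratch buffer, reused
        int pointCount;

        void reset(int pointCount) {
            this.pointCount = pointCount;
            java.util.Arrays.fill(covariance, 0.0f);
        }
    }

    public static void main(String[] args) {
        ClusterFitState fit = new ClusterFitState(); // created once per image
        for (int block = 0; block < 4; block++) {
            fit.reset(16); // re-initialise instead of allocating a new object
            // ... compress the block using fit's scratch fields ...
        }
    }
}
```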

First of all, thank you for porting this to Java - we're now using DXT for most of the Tribal Trouble textures. However, I think decompressing DXT in jsquish is broken, since ColourBlock.unpack565() doesn't account for the byte sign properly. I've changed unpack565 to:
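The actual patch wasn't quoted here, but based on the original C++ Unpack565 the sign-safe version would look something like this (the Java signature is a guess):

```java
public class ColourBlockFix {
    /**
     * Sign-safe 565 unpack, following squish's C++ Unpack565. Without the
     * & 0xFF masks, any byte >= 0x80 sign-extends and corrupts the value.
     */
    static int unpack565(byte[] packed, int offset,
                         byte[] colour, int colourOffset) {
        int value = (packed[offset] & 0xFF)
                  | ((packed[offset + 1] & 0xFF) << 8);

        int red   = (value >> 11) & 0x1F;
        int green = (value >> 5)  & 0x3F;
        int blue  =  value        & 0x1F;

        // Scale the 5/6-bit components to 8 bits by replicating high bits.
        colour[colourOffset]     = (byte) ((red   << 3) | (red   >> 2));
        colour[colourOffset + 1] = (byte) ((green << 3) | (green >> 2));
        colour[colourOffset + 2] = (byte) ((blue  << 3) | (blue  >> 2));
        colour[colourOffset + 3] = (byte) 255;

        return value;
    }

    public static void main(String[] args) {
        byte[] colour = new byte[4];
        // 0xFFFF is pure white; both bytes are "negative" as Java bytes.
        int value = unpack565(new byte[] { (byte) 0xFF, (byte) 0xFF },
                              0, colour, 0);
        System.out.println(Integer.toHexString(value)); // ffff
    }
}
```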

Squish on its own only performs the DXT compression. The output is just a bunch of DXT blocks. It's pretty straightforward to write these to a DDS file afterwards, so offline processing is definitely possible.
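As a sketch of what writing the output to a DDS file involves for DXT1 data (the header layout follows the Microsoft DDS specification; this is illustrative and not part of jsquish):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DdsWriter {

    /** Writes a minimal DDS file (no mipmaps) around raw DXT1 block data. */
    static void writeDxt1(OutputStream out, int width, int height,
                          byte[] dxt1Blocks) throws IOException {
        ByteBuffer h = ByteBuffer.allocate(128).order(ByteOrder.LITTLE_ENDIAN);
        h.putInt(0x20534444);          // magic "DDS " (little-endian)
        h.putInt(124);                 // dwSize
        h.putInt(0x1 | 0x2 | 0x4 | 0x1000 | 0x80000); // CAPS|HEIGHT|WIDTH|PIXELFORMAT|LINEARSIZE
        h.putInt(height);
        h.putInt(width);
        h.putInt(dxt1Blocks.length);   // dwPitchOrLinearSize
        h.putInt(0);                   // dwDepth
        h.putInt(0);                   // dwMipMapCount
        h.position(h.position() + 44); // dwReserved1[11]
        h.putInt(32);                  // pixel format dwSize
        h.putInt(0x4);                 // DDPF_FOURCC
        h.putInt(0x31545844);          // FourCC "DXT1"
        h.position(h.position() + 20); // bit counts/masks unused for FourCC
        h.putInt(0x1000);              // DDSCAPS_TEXTURE
        // caps2..caps4 and reserved stay zero
        out.write(h.array());
        out.write(dxt1Blocks);
    }

    public static void main(String[] args) throws IOException {
        // One 4x4 DXT1 block is 8 bytes, so the file is 128 + 8 = 136 bytes.
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        writeDxt1(file, 4, 4, new byte[8]);
        System.out.println(file.size()); // 136
    }
}
```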

Dang. Any chance of a 1.4 compatible version? Or can the whole process be run offline and saved to a DDS or similar?

It could work on even older versions, but I'm too used to 1.5 features right now. There are a couple of enums, some static imports and maybe some for-each loops; you're free to modify the source if you need it.
