Just sharing this implementation of the sample mask construction described in the SIGGRAPH 2016 paper "Blue-noise Dithered Sampling" by Iliyan Georgiev and Marcos Fajardo. You can access the paper here: https://www.solidangle.com/arnold/research

As there didn't appear to be a strict implementation online of the simulated annealing process described in the paper, I thought this simple example might be helpful for someone. It should be fairly efficient, using SSE3 instructions and TBB for multithreading.

Unfortunately I've not got any rendered images to share, but it shouldn't be too hard to implement. When the values are written out to the data file they are converted to the unsigned 32-bit integer range, so that a toroidal shift can easily be applied (to particular sampling patterns) with a simple addition that takes advantage of integer overflow. This assumes that the samplers also work within the same unsigned integer range, and that the addition is done before converting the value to a floating-point representation. You may also find that the current parameters are set rather low; to get a high-quality result the iterations should be set much higher.
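To make the overflow trick concrete, here is a minimal sketch of the shift, assuming both the sampler output and the mask value are stored as unsigned 32-bit integers. The helper names (to_fixed, to_unit, toroidal_shift) are made up for illustration and are not taken from the repository.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: map u in [0, 1) onto the full unsigned 32-bit range.
inline std::uint32_t to_fixed(double u) {
    assert(u >= 0.0 && u < 1.0);
    return static_cast<std::uint32_t>(u * 4294967296.0);  // u * 2^32
}

// Hypothetical helper: map an unsigned 32-bit value back to [0, 1).
inline double to_unit(std::uint32_t x) {
    return static_cast<double>(x) * (1.0 / 4294967296.0);
}

// Toroidal shift: unsigned addition wraps modulo 2^32, which corresponds
// to (sample + mask) mod 1 in the unit interval. The addition happens
// before any conversion to floating point.
inline std::uint32_t toroidal_shift(std::uint32_t sample, std::uint32_t mask) {
    return sample + mask;
}
```

For example, a sample at 0.75 shifted by a mask value of 0.5 wraps around to 0.25: to_unit(toroidal_shift(to_fixed(0.75), to_fixed(0.5))).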

I'll try to get around to putting together some images of the Fourier power spectra soon. Hopefully at some point I'll also find some time to create a simple example using pbrt v3.

I set the tile size to 64, iterations to 65536 * 2 and depth to 10 (for: lens, pixel, light sample 1, diffuse bounce, light sample 2); in that case the code takes about 1.5 *hours* to execute on a 12-core Xeon system (admittedly, old Xeons)... How long did your 128x128 tiles take?

I'll run the program tomorrow at a resolution of 128x128 and record the simulation time. I've not done this yet, as the implementation here isn't (and does not claim to be) the same as the one found in the original paper. There was a sizable performance gain when using AVX2 instructions, but this would have limited compatibility. As a result I've left the available code using SSE3 instructions on the master branch. I also thought it would be helpful to add a simple Fourier transform to give a better indication of the resulting quality; the images are attached.
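For anyone who wants to check a tile themselves, a rough sketch of a power spectrum computation follows. It is a naive O(n^4) 2D DFT, so only sensible for small tiles, and it is not necessarily how the transform in the repository is written; the function name power_spectrum and the row-major layout are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive 2D DFT power spectrum of an n x n tile stored row-major.
// Returns |F(u, v)|^2 / (n * n) for every frequency (u, v).
std::vector<double> power_spectrum(const std::vector<double>& tile, std::size_t n) {
    assert(tile.size() == n * n);
    const double two_pi = 6.283185307179586;
    std::vector<double> power(n * n, 0.0);
    for (std::size_t v = 0; v < n; ++v) {
        for (std::size_t u = 0; u < n; ++u) {
            std::complex<double> sum(0.0, 0.0);
            for (std::size_t y = 0; y < n; ++y) {
                for (std::size_t x = 0; x < n; ++x) {
                    const double phase = -two_pi *
                        (static_cast<double>(u * x) + static_cast<double>(v * y)) /
                        static_cast<double>(n);
                    sum += tile[y * n + x] *
                           std::complex<double>(std::cos(phase), std::sin(phase));
                }
            }
            power[v * n + u] = std::norm(sum) / static_cast<double>(n * n);
        }
    }
    return power;
}
```

A blue-noise tile should show little energy at low frequencies, meaning the entries near (0, 0) and its wrapped-around neighbours, apart from the DC term itself.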

There are some minor issues getting this to work with MSVC:
a) The _mm_andnot_si128 should be a _mm_andnot_ps
b) a separate code path that uses _aligned_malloc instead of posix_memalign on MSVC
c) same for free (_aligned_free)
Maybe you can patch this then.. And thanks for sharing..
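For points b) and c), something along these lines might do as a starting point. It is only a sketch: the wrapper names aligned_alloc_portable and aligned_free_portable are made up here, and the only assumption is that _MSC_VER is what selects the MSVC path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(_MSC_VER)
#include <malloc.h>   // _aligned_malloc, _aligned_free
#else
#include <stdlib.h>   // posix_memalign, free
#endif

// Hypothetical wrapper: returns nullptr on failure on both platforms.
inline void* aligned_alloc_portable(std::size_t size, std::size_t alignment) {
    // posix_memalign needs a power-of-two multiple of sizeof(void*).
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
#if defined(_MSC_VER)
    return _aligned_malloc(size, alignment);  // note: size first, then alignment
#else
    void* p = nullptr;
    return posix_memalign(&p, alignment, size) == 0 ? p : nullptr;
#endif
}

// Memory from _aligned_malloc must be released with _aligned_free, not free.
inline void aligned_free_portable(void* p) {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    free(p);
#endif
}
```

A 16-byte alignment is enough for aligned SSE loads and stores; an AVX2 build would want 32.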

Better you leave here with your head still full of kitty cats and puppy dogs.

jbikker wrote: I set the tile size to 64, iterations to 65536 * 2 and depth to 10 (for: lens, pixel, light sample 1, diffuse bounce, light sample 2); in that case the code takes about 1.5 *hours* to execute on a 12-core Xeon system (admittedly, old Xeons)... How long did your 128x128 tiles take?

It would be interesting to hear about your experience rendering with it when optimizing for this 10D case, because IMHO the scheme should fall apart when using a large number of dimensions.
