I have also played a lot with QMC on the GPU (note that I can't use it in my renderer due to patents... but...) and I haven't really noticed any improvement.

QMC gives good stratification, etc., but...

1) After 1000 samples it is better to go with uniform random numbers.

2) On the GPU my renderer is slower with QMC (surely due to incoherent memory accesses, etc.), so even for the first frame I get better results with a uniform RNG.

So, honestly, I have never seen where QMC helps on the GPU! Has anyone else noticed this?

Ehm yes, I did some qualitative comparisons yesterday, and I noticed both things: it is slower (despite the calculations being lighter), and it seems to give poorer results... I would love to see some insight on this from the QMC gurus.

spectral wrote: Notice that I can't use it in my renderer due to patents...

This is quite surprising to me - what patents? How were they allowed to patent it? I presume that, e.g., the Halton sequence was invented by a guy named Halton (and not, say, Nvidia?). Perhaps the patent was issued because they were the first to use QMC on a GPU?

I'm not a QMC guru in any way, but after working hard for quite a while I actually did manage to implement a Sobol sampler that improves convergence by quite a lot. My implementation might not be the most academically correct one, but it works, and I've put quite a lot of hours into measuring convergence vs other samplers: it beats MT, Halton and Faure (although they're pretty close). And all this with not-too-objectionable correlation patterns during rendering, and no hashing or scrambling whatsoever - just the pure output of the Sobol sample generator.

The 1-thread scenario is simple. (Oh, and I don't generate my samples ahead of time; I just draw them for as long as the path continues. I'm using the Joe & Kuo data, so I can go up to dimension 21201.)
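For anyone curious what on-demand Sobol generation looks like, here is a minimal sketch using the standard direction-number recurrence (as in Joe & Kuo). Only the first two dimensions are hard-coded; a real sampler would load the full Joe & Kuo table. This version indexes points in plain binary order rather than Gray-code order, which is simpler but slower than the usual incremental Gray-code trick.

```python
# Minimal 2D Sobol point generator (first two dimensions only).

def _dim2_direction_numbers(bits=32):
    """Direction numbers for the second Sobol dimension
    (primitive polynomial x + 1, initial value m_1 = 1)."""
    m = [1]
    for _ in range(1, bits):
        # Degree-1 recurrence: m_k = 2*m_{k-1} XOR m_{k-1}
        m.append((2 * m[-1]) ^ m[-1])
    # Left-align each m_j into a 32-bit integer.
    return [m[j] << (bits - 1 - j) for j in range(bits)]

V0 = [1 << (31 - j) for j in range(32)]  # dim 1: van der Corput in base 2
V1 = _dim2_direction_numbers()           # dim 2

def sobol_2d(index):
    """Return the `index`-th point of the 2D Sobol sequence in [0,1)^2."""
    x = y = 0
    i, j = index, 0
    while i:
        if i & 1:               # XOR in the direction number for each set bit
            x ^= V0[j]
            y ^= V1[j]
        i >>= 1
        j += 1
    return x / 2**32, y / 2**32
```

Drawing samples lazily per path then just means calling the generator with the current sample index and the next unused dimension pair.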

As I said, the 1-thread scenario is simple: you just jump along the sequence, incrementing the Sobol sample index for every new pixel sample. My biggest problem was doing it across many threads. I tried scrambling (using a unique scramble value for every thread), Cranley-Patterson rotation, and a global/shared index counter (for #1 in my description above). I found that scrambling reduced convergence somewhat and also produced objectionable correlation patterns; C-P rotation gave fewer correlation patterns, but they were still objectionable. Using a global (shared by all threads) Sobol index counter gave me the nicest result and also the best convergence. I know this is a lousy way to do it, but speed was not my objective. Gruenschloss published a paper about exactly this, http://gruenschloss.org/parqmc/parqmc.pdf, but to be honest the math (section 3.3) is a bit too dense for me, so it'll be a while before I'll be able to implement it.

Seeing as how the image plane is a continuous domain and all. Granted, the model falls down a bit if you're not using a box filter, but jitterX would still be a "random" number in [0,1) that could be used for any other 2D filter.

You're absolutely right, of course. This was exactly what I did up until recently (although I never separated the position and the jitter; I just generated floating-point coordinates with the first pair of dimensions). I guess it's a matter of taste: the correlation patterns were a bit more fine-grained and gave a more uniform impression when I generated the jitter explicitly with a new pair of dimensions. I liked what I saw, and I stuck with it.
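For reference, the "first pair of dimensions as floating-point image-plane coordinates" variant is just this: scale the 2D sample by the resolution, take the integer part as the pixel and the fractional part as the jitter. The resolution here is made up for illustration.

```python
WIDTH, HEIGHT = 640, 480  # example resolution

def to_pixel_and_jitter(u, v):
    """Map a 2D sample (u, v) in [0,1)^2 onto the image plane.

    Returns the pixel the sample lands in and the jitter within it.
    """
    fx, fy = u * WIDTH, v * HEIGHT
    px, py = int(fx), int(fy)    # integer part: which pixel
    jx, jy = fx - px, fy - py    # fractional part: jitter in [0,1)
    return (px, py), (jx, jy)
```

The jitter values stay usable with any 2D reconstruction filter, exactly as noted above for `jitterX`.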