Daily Pathtracer 9: A wild ryg appears

In the previous post, I did a basic SIMD/SSE implementation of the “hit a ray against all spheres”
function. Of course, me being a n00b at SIMD, I did some stupid things (and had other inefficiencies
I knew about outside the SIMD function, that I planned to fix later). And then this happened:

You used the ultimate optimization technique: nerd sniping rygorous into doing it for you =)
(via)

i.e. Fabian Giesen himself did a bunch of optimizations and submitted a
pull request. Nice work, ryg!

His changes took performance from 107 to 187 Mray/s on PC, and from 30.1 to 41.8 Mray/s on a Mac.
That’s not bad at all for relatively simple changes!

ryg’s optimizations

The full list of changes can be seen in the pull request; here are the major ones:

Use _mm_loadu_ps to load memory into a SIMD variable (commit).
On Windows/MSVC that gave a massive speedup; no change on Mac/clang, since clang was already generating
a movups instruction there.
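
To make the idea concrete, here is a minimal sketch of loading floats with `_mm_loadu_ps`. It is not the code from the commit: `SumFloats` is a hypothetical helper made up for illustration, and it assumes an x86 target with SSE and an element count that is a multiple of 4.

```cpp
#include <cassert>
#include <xmmintrin.h> // SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// Hypothetical helper (not from the pathtracer): sums an array of floats,
// four lanes at a time. `count` is assumed to be a multiple of 4; `data`
// does not need any particular alignment.
inline float SumFloats(const float* data, int count)
{
    assert(count % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < count; i += 4)
    {
        // One unaligned load of 4 consecutive floats into a SIMD register.
        __m128 v = _mm_loadu_ps(data + i);
        acc = _mm_add_ps(acc, v);
    }
    // Spill the 4 partial sums and add them up on the scalar side.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

The point is that the load is spelled out explicitly as a single intrinsic, so the compiler does not have to figure out that four scalar reads can be merged into one instruction.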

Evaluate ray hit data only once (commit).
My original code was evaluating hit position & normal for each closer sphere it had hit so far; this
change only remembers t value and sphere index instead, and calculates position & normal for the final
closest sphere. It also shows one possible approach to “find minimum value and index of it in an
SSE register”, which I was wondering about.
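
For illustration, one way to get both the minimum and its lane out of an SSE register looks roughly like this. This is a sketch under my own assumptions, not necessarily what the commit does: `MinResult` and `HorizontalMinWithLane` are hypothetical names, and the input is assumed to contain no NaNs.

```cpp
#include <cassert>
#include <xmmintrin.h> // SSE: _mm_min_ps, _mm_shuffle_ps, _mm_cmpeq_ps, _mm_movemask_ps

// Hypothetical result type: the smallest value and which lane (0..3) held it.
struct MinResult
{
    float value;
    int lane;
};

inline MinResult HorizontalMinWithLane(__m128 v)
{
    // Horizontal min: swap neighbours and take min, then swap halves and
    // take min again. Afterwards every lane of `m` holds the overall minimum.
    __m128 m = _mm_min_ps(v, _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)));
    m = _mm_min_ps(m, _mm_shuffle_ps(m, m, _MM_SHUFFLE(1, 0, 3, 2)));

    // Compare the original lanes against the minimum; bit i of the mask is
    // set when lane i equals it.
    int mask = _mm_movemask_ps(_mm_cmpeq_ps(v, m));
    assert(mask != 0); // assumes no NaNs in the input

    // Lowest set bit of a 4-bit mask via a tiny lookup table (a bit-scan
    // instruction would also work). On ties the lowest lane wins.
    static const int kLowestSetBit[16] = {0, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0};

    MinResult r;
    r.value = _mm_cvtss_f32(m);
    r.lane = kLowestSetBit[mask];
    return r;
}
```

In a ray-vs-spheres loop, `v` would hold the closest t found so far in each lane, and the returned lane would pick which of the four remembered sphere indices to use.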

“Know” which objects emit light instead of searching every time (commit).
This one’s not SIMD at all, and super obvious. In the explicit light sampling loop, for each ray bounce
off a diffuse surface, I was going through all spheres, checking “hey, do you emit light?”. But only
a couple of all of them do! So instead, have an explicit array of light-emitting sphere indices,
and only go through that. This was another massive speedup.
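
A minimal sketch of that idea, with made-up types: `Material`, `CollectEmissiveIndices` and `TotalEmittedRed` are hypothetical stand-ins here, not the pathtracer's real scene structures or light sampling code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical material: just an RGB emission colour.
struct Material
{
    float emissive[3];
    bool HasEmission() const
    {
        return emissive[0] > 0 || emissive[1] > 0 || emissive[2] > 0;
    }
};

// Done once, when the scene is set up: remember which objects emit light.
inline std::vector<int> CollectEmissiveIndices(const std::vector<Material>& materials)
{
    std::vector<int> indices;
    for (std::size_t i = 0; i < materials.size(); ++i)
        if (materials[i].HasEmission())
            indices.push_back(static_cast<int>(i));
    return indices;
}

// Stand-in for the per-bounce light loop: it only visits the precomputed
// emissive indices instead of testing every object in the scene.
inline float TotalEmittedRed(const std::vector<Material>& materials,
                             const std::vector<int>& emissiveIndices)
{
    float sum = 0.0f;
    for (int idx : emissiveIndices)
    {
        assert(idx >= 0 && idx < static_cast<int>(materials.size()));
        sum += materials[idx].emissive[0];
    }
    return sum;
}
```

The cost of the per-bounce loop then scales with the number of lights rather than with the number of objects in the scene.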