The Rust Programming Language

For x86_64 targets, rustc enables SSE2 by default. There are a couple of ways to enable other target features: one is the RUSTFLAGS environment variable and the other is .cargo/config. The latter was most useful for me, as it meant I didn’t need to either set flags globally or remember to set them on every invocation of the compiler.
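To see what those flags actually change, a small sketch can ask the compiler which target features it was told to assume. The function name `compile_time_simd_features` is only for illustration. `cfg!(target_feature = "...")` is resolved at compile time, so the answer reflects RUSTFLAGS (for example `RUSTFLAGS="-C target-feature=+avx2"`) or a `rustflags = ["-C", "target-feature=+avx2"]` entry under `[build]` in .cargo/config, not the CPU the binary later runs on.

```rust
/// Hypothetical helper: reports which SIMD features were enabled at compile
/// time. `cfg!` is evaluated by the compiler, so this reflects build flags,
/// not the CPU the program is running on.
fn compile_time_simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(target_feature = "sse2") {
        features.push("sse2");
    }
    if cfg!(target_feature = "sse4.1") {
        features.push("sse4.1");
    }
    if cfg!(target_feature = "avx") {
        features.push("avx");
    }
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    features
}

fn main() {
    println!("{:?}", compile_time_simd_features());
}
```

On a default x86_64-unknown-linux-gnu build, only "sse2" from this list should be reported; building with `-C target-feature=+avx2` should add "avx2".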

Note that you don't need to set compile-time flags at all; you can instead check the CPU's support for vector instructions at runtime. This is explained and documented in the std::arch module docs.
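A minimal sketch of the runtime-detection pattern described in the std::arch docs, assuming an x86_64 target: the AVX code lives in a function marked `#[target_feature(enable = "avx")]`, and `is_x86_feature_detected!` selects it only on CPUs that support it, so no compile-time flags are needed. The names `sum`, `sum_avx` and `sum_scalar` are hypothetical; the scalar path is also what non-x86_64 builds use.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Portable fallback used when AVX is unavailable (or on other architectures).
fn sum_scalar(data: &[f32]) -> f32 {
    data.iter().sum()
}

/// Compiled with AVX enabled for this function only; must not be called
/// unless the CPU has been checked for AVX support.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn sum_avx(data: &[f32]) -> f32 {
    let chunks = data.chunks_exact(8);
    let rest = chunks.remainder();
    let mut acc = _mm256_setzero_ps();
    for chunk in chunks {
        // Unaligned load, so any f32 slice is acceptable.
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(chunk.as_ptr()));
    }
    // Spill the 8 lanes and finish the reduction in scalar code.
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    lanes.iter().sum::<f32>() + sum_scalar(rest)
}

/// Dispatches at runtime based on what the current CPU supports.
pub fn sum(data: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: we just verified that this CPU supports AVX.
            return unsafe { sum_avx(data) };
        }
    }
    sum_scalar(data)
}

fn main() {
    let v: Vec<f32> = (1..=20).map(|i| i as f32).collect();
    println!("{}", sum(&v)); // prints 210
}
```

Because the AVX path adds in a different order, its result can differ from the scalar sum in the last bits for general input; the example uses small integers so both paths agree exactly.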

I'd recommend runtime detection if you can swing it, since it lets optimizations happen seamlessly without additional compile-time flags and lets you compile portable binaries.

Oh right, I misunderstood and thought you needed to tell the compiler to emit code for that target feature even if you were going to dynamically select it at runtime.

I would like to set up runtime detection. I think the biggest barrier is that my SpheresSoA struct stores its data in a different type depending on whether I'm using SSE or AVX, so that the data is aligned correctly. Ideally I would tell Vec<f32> what alignment to allocate with; then I wouldn't need to introduce other types. I'm sure I can work around this, but I haven't put much thought into it so far.

If I could specify the allocation alignment, that would be totally fine. I seem to be struggling to find Rust's malloc, though.
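The closest thing to malloc in the standard library is `std::alloc::alloc` (with `alloc_zeroed` and `dealloc`), which takes a `Layout` carrying both size and alignment and, unlike the allocator API, is stable. A minimal sketch of a fixed-size f32 buffer with a caller-chosen power-of-two alignment built on it; `AlignedF32Buf` is a hypothetical type, not part of std.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::mem::{align_of, size_of};
use std::ptr::NonNull;
use std::slice;

/// Hypothetical fixed-length, zero-initialized f32 buffer with custom alignment.
pub struct AlignedF32Buf {
    ptr: NonNull<f32>,
    len: usize,
    layout: Layout, // kept so Drop frees with the exact same layout
}

impl AlignedF32Buf {
    /// `align` must be a power of two and at least align_of::<f32>() (4).
    pub fn zeroed(len: usize, align: usize) -> Self {
        assert!(len > 0, "zero-sized allocations are not supported here");
        assert!(align >= align_of::<f32>());
        let size = len.checked_mul(size_of::<f32>()).expect("size overflow");
        // Fails (panics via expect) if align is not a power of two.
        let layout = Layout::from_size_align(size, align).expect("invalid layout");
        // SAFETY: the layout has a non-zero size because len > 0.
        let raw = unsafe { alloc_zeroed(layout) } as *mut f32;
        let ptr = match NonNull::new(raw) {
            Some(p) => p,
            None => handle_alloc_error(layout),
        };
        AlignedF32Buf { ptr, len, layout }
    }

    pub fn as_slice(&self) -> &[f32] {
        // SAFETY: ptr is valid for `len` f32s, all zero-initialized or written.
        unsafe { slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [f32] {
        // SAFETY: as above, and &mut self guarantees unique access.
        unsafe { slice::from_raw_parts_mut(self.ptr.as_ptr(), self.len) }
    }
}

impl Drop for AlignedF32Buf {
    fn drop(&mut self) {
        // SAFETY: allocated in `zeroed` with exactly this layout.
        unsafe { dealloc(self.ptr.as_ptr() as *mut u8, self.layout) }
    }
}

fn main() {
    let mut buf = AlignedF32Buf::zeroed(16, 32);
    buf.as_mut_slice()[3] = 1.5;
    assert_eq!(buf.as_slice().as_ptr() as usize % 32, 0);
    println!("{:?}", &buf.as_slice()[..4]); // [0.0, 0.0, 0.0, 1.5]
}
```

The pointer should not be handed to `Vec::from_raw_parts`: a `Vec<f32>` would later free it assuming f32's own 4-byte alignment, which does not match the layout it was allocated with.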

Using a #[repr(align(64))] struct, for example, would work, but it would also increase the chunk size to match the alignment (a type's size is always rounded up to a multiple of its alignment, so each chunk should fill that size rather than waste it on padding), which would make dealing with SIMD vectors smaller than the chunk size a bit painful. It would work, but I think it would complicate the code a bit and add a little more overhead.
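A quick way to see that size cost, using two hypothetical chunk types: a `#[repr(align(64))]` chunk of sixteen f32s has no padding, while a chunk of eight f32s with the same alignment is padded out to 64 bytes.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical chunk: 16 f32s = 64 bytes, matching the alignment exactly.
#[repr(C, align(64))]
#[derive(Clone, Copy, Default)]
struct Chunk16([f32; 16]);

/// Hypothetical chunk: 8 f32s = 32 bytes of data, but the size is rounded up
/// to a multiple of the 64-byte alignment, so half of it is padding.
#[repr(C, align(64))]
#[derive(Clone, Copy, Default)]
struct Chunk8([f32; 8]);

fn main() {
    println!("Chunk16: size {} align {}", size_of::<Chunk16>(), align_of::<Chunk16>());
    println!("Chunk8:  size {} align {}", size_of::<Chunk8>(), align_of::<Chunk8>());
    // A Vec of aligned chunks gets every element aligned without extra work.
    let v = vec![Chunk16::default(); 4];
    assert_eq!(v.as_ptr() as usize % 64, 0);
}
```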

I might go back to unaligned loads so I can just use f32 arrays until I come up with something.
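A minimal sketch of the unaligned-load route on plain f32 slices, assuming x86_64, where SSE is always available: `_mm_loadu_ps` and `_mm_storeu_ps` accept any address, so sub-slices such as `&a[1..]` work too. The function names are hypothetical, and a scalar loop handles the tail (and everything on other architectures).

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};

/// Adds the first `len - len % 4` elements with unaligned SSE loads and
/// returns how many it handled. SSE/SSE2 are part of the x86_64 baseline,
/// so no runtime feature detection is needed.
#[cfg(target_arch = "x86_64")]
fn add_simd_prefix(a: &[f32], b: &[f32], out: &mut [f32]) -> usize {
    let simd_end = out.len() - out.len() % 4;
    for i in (0..simd_end).step_by(4) {
        // SAFETY: i + 4 <= simd_end <= the length of all three slices
        // (checked by the caller); loadu/storeu need no alignment.
        unsafe {
            let va = _mm_loadu_ps(a.as_ptr().add(i));
            let vb = _mm_loadu_ps(b.as_ptr().add(i));
            _mm_storeu_ps(out.as_mut_ptr().add(i), _mm_add_ps(va, vb));
        }
    }
    simd_end
}

/// Other architectures: leave everything to the scalar loop.
#[cfg(not(target_arch = "x86_64"))]
fn add_simd_prefix(_a: &[f32], _b: &[f32], _out: &mut [f32]) -> usize {
    0
}

/// out[i] = a[i] + b[i] for plain f32 slices with any alignment.
pub fn add_slices(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == out.len() && b.len() == out.len(), "length mismatch");
    let done = add_simd_prefix(a, b, out);
    // Scalar tail: the last len % 4 elements.
    for i in done..out.len() {
        out[i] = a[i] + b[i];
    }
}

fn main() {
    let a: Vec<f32> = (0..10).map(|i| i as f32).collect();
    let b = vec![1.0f32; 10];
    let mut out = vec![0.0f32; 10];
    add_slices(&a, &b, &mut out);
    println!("{:?}", out);
}
```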

OK, but that is more instructions than what I currently have, and right now I'm optimising for speed. If I can allocate my float arrays 32-byte aligned, then I can just use a chunk iterator based on the SIMD width and know the input data has the correct alignment. That seems like the simplest solution right now; I should be able to do that with the unstable allocator API.
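The chunk-iterator idea can be sketched as follows, assuming x86_64 and SSE (4 lanes, 16-byte alignment) so it runs without feature detection; an AVX version would use 8 lanes, 32-byte alignment and `_mm256_load_ps` behind runtime detection. The point is that if the slice starts on an aligned boundary and every chunk is exactly one register wide, every chunk start is aligned too. `sum_aligned` and the `Aligned16` wrapper are hypothetical, and the function asserts the alignment because `_mm_load_ps` requires a 16-byte-aligned address.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm_add_ps, _mm_load_ps, _mm_setzero_ps, _mm_storeu_ps};

/// SIMD width in f32 lanes for 128-bit SSE registers.
const LANES: usize = 4;
/// Alignment in bytes that `_mm_load_ps` requires (one register).
const ALIGN: usize = LANES * std::mem::size_of::<f32>(); // 16

/// Sums `data`, which must start on a 16-byte boundary.
#[cfg(target_arch = "x86_64")]
pub fn sum_aligned(data: &[f32]) -> f32 {
    assert_eq!(data.as_ptr() as usize % ALIGN, 0, "input must be 16-byte aligned");
    let chunks = data.chunks_exact(LANES);
    let tail = chunks.remainder();
    // SAFETY: SSE is baseline on x86_64. Each chunk holds LANES floats and
    // starts k * 16 bytes after an aligned base, so it is 16-byte aligned.
    let lanes = unsafe {
        let mut acc = _mm_setzero_ps();
        for chunk in chunks {
            acc = _mm_add_ps(acc, _mm_load_ps(chunk.as_ptr()));
        }
        let mut out = [0.0f32; LANES];
        _mm_storeu_ps(out.as_mut_ptr(), acc);
        out
    };
    lanes.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

/// Scalar fallback on other architectures (still checks the contract).
#[cfg(not(target_arch = "x86_64"))]
pub fn sum_aligned(data: &[f32]) -> f32 {
    assert_eq!(data.as_ptr() as usize % ALIGN, 0, "input must be 16-byte aligned");
    data.iter().sum()
}

fn main() {
    // Hypothetical wrapper that forces 16-byte alignment on a plain array.
    #[repr(C, align(16))]
    struct Aligned16([f32; 11]);
    let storage = Aligned16([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]);
    println!("{}", sum_aligned(&storage.0)); // prints 66
}
```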

The typical approach is to use Vec. It doesn't currently allow specifying an alignment beyond the element type's own, but one can allocate slightly more space and then shift the start so it is aligned, e.g. Vec::with_capacity(num_elements + alignment - 1), with the alignment measured in elements.
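A sketch of that over-allocate-and-shift approach, with a hypothetical helper `aligned_f32_vec`. It uses `vec![0.0; n]` rather than `Vec::with_capacity` so the extra elements are initialized and can be sliced safely, and it computes the start offset by hand. For f32 and a 32-byte boundary, the worst case skips 7 elements.

```rust
/// Hypothetical helper: returns backing storage plus the element offset at
/// which an `align_bytes`-aligned run of `len` f32s begins inside it.
fn aligned_f32_vec(len: usize, align_bytes: usize) -> (Vec<f32>, usize) {
    let elem = std::mem::size_of::<f32>(); // 4 bytes
    assert!(align_bytes.is_power_of_two() && align_bytes >= elem);
    // Alignment in elements, minus one: the most we might need to skip.
    let extra = align_bytes / elem - 1;
    // vec! (not with_capacity) so every element is initialized and sliceable.
    let v = vec![0.0f32; len + extra];
    let misalign = v.as_ptr() as usize % align_bytes;
    let offset_bytes = if misalign == 0 { 0 } else { align_bytes - misalign };
    // The buffer is at least 4-byte aligned, so this division is exact.
    (v, offset_bytes / elem)
}

fn main() {
    let len = 10;
    let (mut storage, off) = aligned_f32_vec(len, 32);
    let data = &mut storage[off..off + len];
    assert_eq!(data.as_ptr() as usize % 32, 0);
    data[0] = 1.0;
    println!("offset = {off}, aligned len = {}", data.len());
}
```

The aligned view stays valid only as long as the Vec is not reallocated, so the storage should not be pushed to or resized after the offset is computed.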

This difference is very likely due to differences in CPU frequency scaling between Linux and Windows. It does make benchmarking on my Linux laptop a bit unreliable, though. You can adjust the frequency scaling behavior (the cpufreq governor) on Linux.

So for the initial rays, if the number of rays spawned is a multiple of the vector length (4, 8, ...), SIMD would still work on a per-ray basis. For the subsequent bounces you are pretty much out of luck :/