I'd like to take this opportunity to advertise a tool that might be very useful for developing ray tracers or any other compute-intensive applications on the CPU. I am developing a package for Python that allows you to utilize CPU SIMD instructions (SSE, AVX, AVX2, AVX-512, FMA). Basically, it's a JIT compiler that compiles simplified Python code to native x86 machine code. To use SIMD instructions I added vector data types (float32x4, float32x8, float32x16, etc.) so you can easily do explicit vectorization. Before compilation I check which instruction sets the CPU supports and then select the best one. So to achieve maximum performance, all you need to do is use the biggest supported vector types (float32x16, float64x8, int32x16) as much as possible, and all the magic happens automatically. Even if your CPU only has the SSE instruction sets, you still benefit from using wide vector types because of memory locality. This tool is still WIP because there is still lots of work to be done, but even at this stage it is very useful. I started developing a path tracer just to show how this tool is used.
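To make the "check what the CPU supports, then select the best one" step concrete, here is a minimal sketch in plain Python. It is only an illustration and not the package's actual internals: the `PREFERENCE` list and the `select_instruction_set` function are hypothetical names, and the set of supported features is passed in by hand rather than detected from the CPU.

```python
# Hypothetical sketch, not the package's real code: pick the widest
# instruction set from a fixed preference order, given what the CPU reports.
PREFERENCE = ["AVX-512", "AVX2", "AVX", "SSE"]  # widest first


def select_instruction_set(supported):
    """Return the first entry of PREFERENCE that appears in `supported`."""
    for isa in PREFERENCE:
        if isa in supported:
            return isa
    raise RuntimeError("no supported SIMD instruction set found")


# A CPU without AVX-512 falls back to AVX2.
print(select_instruction_set({"SSE", "AVX", "AVX2"}))  # AVX2
```

The point of the design is that the kernel source never names an instruction set; only this selection step does.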

NM, re-reading, it is clear that it doesn't generate object code. I think it's a nifty project, and probably a good way to learn stuff, but I think if you spend effort writing optimized C++, I'd be very surprised to see this perform similarly.

Yes, you are right when you said that you would be very surprised if this were as fast as optimized C++. I have been programming for about 15 years now, and on numerous occasions I tried to optimize some function with hand-written assembly code and the compiler always beat me, but I learned a lot in the process. Over the years I got better at assembly, but I still admit that C++ compilers generate better code than I do. But when you turn to SIMD instructions, things suddenly change. Now the programmer is responsible for writing compiler SIMD intrinsics, so I compete with other programmers and not with the compiler. Also, because I am doing JIT compilation, I have a lot more context to work with because I know exactly what CPU you have. So in the end it's not clear which code will be faster; that's why I said that you get similar performance to optimized C++.

Now I will show a simple example just to see exactly what is going on and how SIMDy works. The example below is trivial, but it shows one of the biggest advantages of SIMDy: it adapts to different instruction sets automatically. Depending on your CPU's capabilities, AVX-512, AVX2, AVX or SSE will be used to handle the float64x8 data type. The best thing here is that the programmer doesn't have to care about the CPU; it just works. Even if your CPU has only SSE instructions, you still benefit from the float64x8 type because of memory locality. Hint: for best performance always use float64x8. Here I explicitly set AVX-512 as the preferred instruction set because the current default is AVX2, but this will be fixed in the next version and the default will be AVX-512.

# put some values for parameters of kernel
k.set_value('b', float64x8(2.0))
k.set_value('c', float64x8(3.0))

k.run()
print(k.get_value('a'))

# you can of course inspect assembly code if you want
print(k.asm)
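As a rough picture of what "adapts to different instruction sets" means for a float64x8 value, the sketch below counts how many native registers one such value occupies on each instruction set. This is plain arithmetic on the standard register widths (128-bit for SSE, 256-bit for AVX/AVX2, 512-bit for AVX-512), not SIMDy code; `REGISTER_BITS` and `registers_needed` are names made up for this illustration.

```python
# Illustration only: register widths in bits for each instruction set.
REGISTER_BITS = {"SSE": 128, "AVX": 256, "AVX2": 256, "AVX-512": 512}


def registers_needed(lane_bits, lanes, isa):
    """Number of native registers needed to hold one vector value."""
    total_bits = lane_bits * lanes
    width = REGISTER_BITS[isa]
    return -(-total_bits // width)  # ceiling division


# float64x8 is 8 lanes of 64 bits, i.e. 512 bits in total.
for isa in ("AVX-512", "AVX2", "AVX", "SSE"):
    print(isa, registers_needed(64, 8, isa))
# AVX-512 1
# AVX2 2
# AVX 2
# SSE 4
```

So the same float64x8 operation would map to one 512-bit instruction on AVX-512, and presumably to two or four narrower ones on AVX/AVX2 or SSE.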

Yes, you can write kernels in Python and use them from C++, but in that case you must embed Python in your project and use it from there. Communication between Python and C++ can go in both directions; people are usually not aware of this.

While this looks interesting for Python developers, I still prefer developing my renderers with C++.

And for C++ SIMD development, I found Vc (https://github.com/VcDevel/Vc) to be a very good library (it is one of the two libraries being proposed for future C++ standardization, the other one being boost::simd). It also supports many CPUs and SIMD architectures.

It does the same as what you are proposing except in native C++.

I'd be curious to see a performance comparison between your Python path tracer and the same path tracer written in C++ using Vc for vectorization.

In the future maybe I will do a comparison between the Vc library and the SIMDy package.

I do not agree that this is interesting only for Python developers: because it is very easy to embed the Python interpreter in a C++ application, SIMDy can also be used from C++.

For example, in the context of renderers, if you embed the Python interpreter in C++ you can use SIMDy as a very flexible shading language like OSL (Open Shading Language), so users can write scripts that are very, very fast.