Sony Computer Entertainment Inc. has contributed its Vector Math library and SIMD math library as open source under the BSD license. The Bullet physics SDK will be the main repository. Vector Math was previously only available to licensed PlayStation 3 developers.

That's very good news for everybody looking for a well-written math lib, especially the SoA one, which is quite rare.

Anyway, it seems to me that I'm facing an issue:
- The SSE implementation of boolInVec uses the _mm_cmpeq_ps intrinsic for the == operator (vectormathlibrary\include\vectormath\SSE\cpp\boolInVec.h line 207).
- The __m128 argument stores either 0xffffffff or 0x0 to reflect a boolean value (true or false, respectively). See constructor boolInVec::boolInVec(bool scalar) line 139.
- Interpreted as a float value, 0xffffffff is a NaN.
- According to the SSE instruction specifications (that I found here: http://www.cs.cmu.edu/~410/doc/intel-isr.pdf), _mm_cmpeq_ps returns false whenever one of the operands is a NaN.

Thus comparing two boolInVec values that are both initialized to "true" with the == operator returns false, which is not the expected result.
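To make the point concrete, here is a minimal scalar sketch of what happens in each lane. It assumes IEEE 754 single-precision floats; the helper names (bits_to_float, float_equal, bits_equal) are made up for illustration and are not part of the Sony library.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float (memcpy avoids aliasing problems).
// Assumption: float is a 32-bit IEEE 754 value on this platform.
inline float bits_to_float(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Floating-point equality on the reinterpreted values. This mirrors the
// per-lane behaviour of a float compare: any NaN operand gives false.
inline bool float_equal(std::uint32_t a, std::uint32_t b) {
    return bits_to_float(a) == bits_to_float(b);
}

// Bitwise equality, which is what comparing two masks actually requires.
inline bool bits_equal(std::uint32_t a, std::uint32_t b) {
    return a == b;
}
```

With these helpers, std::isnan(bits_to_float(0xFFFFFFFFu)) is true, so float_equal(0xFFFFFFFFu, 0xFFFFFFFFu) is false even though bits_equal on the same inputs is true. Two "false" masks (0x0) still compare equal either way, which is why the problem only shows up for "true".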

When you compare two numbers in SIMD, you should always be aware of the type of the entities you compare. VMX and modern SSE allow you to treat register contents as integers, masks or floats freely, but that doesn't mean you can magically add two numbers when one is a float and the other an int. Or compare two masks as if they were floats.

cmpeq_ps compares two floating-point numbers. If you have masks in your registers, it is logically the wrong operation to perform. Comparing masks bit-wise would be a NOT-XOR (XNOR) operation, for example.
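As a rough sketch of that idea (not the library's actual fix), two ways of comparing masks without going through a float compare could look like this. The function names mask_equal and mask_xnor are hypothetical; the first variant assumes SSE2 is available for the integer compare, while the second only needs the SSE bitwise ops plus a constant of all ones.

```cpp
#include <emmintrin.h>  // SSE2 (also pulls in the SSE float intrinsics)

// Hypothetical helper: lane-wise equality of two mask vectors, done as a
// 32-bit integer compare so that an all-ones lane is not treated as a NaN.
inline __m128 mask_equal(__m128 a, __m128 b) {
    __m128i eq = _mm_cmpeq_epi32(_mm_castps_si128(a), _mm_castps_si128(b));
    return _mm_castsi128_ps(eq);
}

// Hypothetical helper: the NOT-XOR form. The result is bit-wise, so it only
// matches mask_equal when every input lane is either 0x0 or all ones.
inline __m128 mask_xnor(__m128 a, __m128 b) {
    const __m128 all_ones = _mm_castsi128_ps(_mm_set1_epi32(-1));
    return _mm_xor_ps(_mm_xor_ps(a, b), all_ones);
}
```

For two "true" masks, _mm_movemask_ps(_mm_cmpeq_ps(t, t)) gives 0, while _mm_movemask_ps(mask_equal(t, t)) and _mm_movemask_ps(mask_xnor(t, t)) both give 0xF, i.e. all four lanes report equal.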

If the architecture, ABI, compiler, and math library all cooperate, then returning a vector by value is cheap: it goes in a register. Also, less aliasing means more opportunities for the compiler to optimize.

It's not OK for the API to change between the scalar and SIMD implementations (the change in semantics is subtle but real). vectormath is really not a great choice if you're not going to use a SIMD implementation.

I ran some quick SSE tests with a naive C++ implementation, some intrinsics on simple float arrays, and the Sony library. It appears that prefetching and the cache have a huge impact on performance. For example, my compiler seems to manage cache optimization on the naive C++ implementation, which gives better performance than Vectormath::Aos.