I'm looking for opinions on modern-day math libraries having support for scalar types. Steam's hardware survey reports essentially 100% support for SSE2, and as best as I can tell any Android device from the last 4 years will have NEON support. So I am leaning towards there being no reason to support scalar data types, e.g.

typedef struct vec {
    float x;
    float y;
    // ...
} vec;

assuming these are the target platforms, but I'm interested in hearing any counterarguments. Thanks.

What do you mean by not supporting scalar types?
Do you want to replace the individual members (x/y/z/w) with a 4-element type (__m128/float32x4_t)?
Or are you asking about using __m128/float32x4_t directly instead of a struct with four floats?

For simple calculations like Casey is doing on Handmade Hero, using a "vector" type with 4 floats is perfectly normal, and you won't get any big advantage from using __m128/float32x4_t types. The compiler will actually optimize most of these operations automatically. See proof here:
SSE: https://godbolt.org/g/2GRbPQ (only one addps, instead of four addss instructions)
NEON: https://godbolt.org/g/rWWXJC (only one fadd instruction, not four)
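For reference, a plain-C 4-float vector along the lines of what those Godbolt links compile might look like the sketch below (the vec4/vec4_add names are mine, not from any particular library). Per the links above, an optimizing compiler turns the four scalar adds into a single addps (SSE) or fadd (NEON):

```c
#include <assert.h>  /* only needed for the usage example */

typedef struct vec4 {
    float x, y, z, w;
} vec4;

/* Plain scalar code; with optimizations on, the compiler
   merges the four adds into one vector instruction. */
static vec4 vec4_add(vec4 a, vec4 b)
{
    vec4 r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return r;
}
```

Usage: vec4 r = vec4_add(a, b); behaves identically in debug and release, only the generated instructions differ.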

As an advantage, your code will be portable - it will compile for any architecture. Using SSE/NEON types and intrinsics directly will make your code messy and non-portable. By that I mean specific algorithms are optimized a bit differently for SSE than for NEON, because each instruction set lacks instructions the other has. Intrinsics are also harder to debug. In my experience you always want plain C code as a fallback to the SIMD code - especially for debugging.
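One common way to keep that plain-C fallback around is a compile-time switch. A rough sketch (the MYLIB_USE_SSE macro and the names are hypothetical, not from any real library):

```c
#include <assert.h>  /* only needed for the usage example */

#if defined(MYLIB_USE_SSE) && defined(__SSE__)
#include <xmmintrin.h>
typedef __m128 v4;
static v4 v4_add(v4 a, v4 b) { return _mm_add_ps(a, b); }
#else
/* Plain C fallback: same interface, trivial to step through in a debugger. */
typedef struct v4 { float e[4]; } v4;
static v4 v4_add(v4 a, v4 b)
{
    v4 r;
    for (int i = 0; i < 4; ++i)
        r.e[i] = a.e[i] + b.e[i];
    return r;
}
#endif
```

When a SIMD path misbehaves, you define the switch off and diff the results against the scalar path.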

Manually writing SSE and NEON code with intrinsics or inline asm makes sense only when you are processing a lot of data - like Casey does in the software renderer, where there are a lot of pixels on screen. The same goes for large batches of vectors, matrices, etc.
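As a concrete example of the "lots of data" case, here is a rough SSE sketch that adds two float arrays four lanes at a time (the function name is mine; n is assumed to be a multiple of 4):

```c
#include <assert.h>   /* only needed for the usage example */
#include <stddef.h>
#include <xmmintrin.h>  /* SSE */

/* Add two float arrays four lanes per iteration.
   loadu/storeu tolerate unaligned pointers. */
static void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
}
```

A NEON version would use vld1q_f32/vaddq_f32/vst1q_f32 instead, which is exactly the kind of per-architecture duplication the plain-C version avoids.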

Very similar to DirectXMath, if you have looked at that library at all. With this setup there is of course no reason not to support scalar types. However, there are some things about it that I don't like. Functions like add, mul, div, etc., where the number of components is irrelevant for SIMD, end up with duplicated code for no reason. For example, v2Add, v3Add, and v4Add all have the exact same body. While I don't love that, I would still prefer to have the distinct v2, v3, v4 types and live with the duplication. What is really bothering me, though, is that in debug builds, wrapping the primitive type in a struct performs horribly compared to a simple typedef:

typedef struct v4 {
    __m128 v;
} v4;
// versus
typedef __m128 v4;

In release builds there is no difference, but in debug builds the struct version slows things down so much that the user would have to work around it.

So my two arguments against the structs are duplicated code and much slower debug builds. If you drop the structs and just use typedefs, then I feel like you have to have only one vector type instead of v2, v3, v4, and also drop scalar support. Which leads back to my initial question: is there any argument for supporting scalar types? Thanks for your feedback.
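For what it's worth, the typedef-only version described above might look like this sketch (all names are hypothetical): one 4-wide type, with v2/v3 values simply living in the low lanes and the high lanes ignored:

```c
#include <assert.h>   /* only needed for the usage example */
#include <xmmintrin.h>

/* One vector type for everything; v2/v3 data uses the low lanes. */
typedef __m128 v4;

static v4 v4_make(float x, float y, float z, float w) { return _mm_set_ps(w, z, y, x); }
static v4 v4_add(v4 a, v4 b) { return _mm_add_ps(a, b); }
static float v4_x(v4 a)      { return _mm_cvtss_f32(a); }
```

One add function serves all component counts, and since v4 is a bare __m128 there is no struct wrapper to pessimize debug builds - at the cost of losing distinct v2/v3/v4 types and scalar fallbacks.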