So I've been trying to collate a sensible framework for a standard
cross-platform simd module since Walter added the SIMD stuff.
I'm sure everyone will have a million opinions on this, so I've drawn my
approach up to a point where it properly conveys the intent, and I've
proven the code gen works, and is good. Now I figure I should get everyone
to shoot it down before I commit to the tedious work of filling in all the
remaining blanks.
(Note: I've only written code against GDC as yet, since DMD's SSE only
supports x64, and x64 is not supported in Windows)
https://github.com/TurkeyMan/phobos/blob/master/std/simd.d
The code might surprise a lot of people... so I'll give a few words about
the approach.
The key goal here is to provide the lowest level USEFUL set of functions,
all the basic functions that people actually use in their algorithms,
without requiring them to understand the quirks of various platforms' vector
hardware.
Different SIMD hardware tends to have very different shuffling, load/store,
component addressing, support for more/less of the primitive maths
operations, etc.
This library, which is the lowest level library I expect programmers would
ever want to use in their apps, should provide that API at the lowest
useful level.
First criticism I expect is for many to insist on a class-style vector
library, which I personally think has no place as a low level, portable API.
Everyone has a different idea of what the perfect vector lib should look
like, and it tends to change significantly with respect to its application.
I feel this flat API is easier to implement, maintain, and understand, and
I expect the most common use of this lib will be in the back end of people's
own vector/matrix/linear algebra libs that suit their apps.
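A minimal sketch of what a flat, free-function API looks like in practice. The names vec4, madd and dot3 are hypothetical (not taken from std.simd), and float[4] stands in for a hardware vector type so the sketch runs anywhere as scalar emulation.

```d
// Hypothetical flat API sketch: free functions on a plain 4-lane value.
// float[4] stands in for a hardware vector, so this is scalar emulation.
alias Vec4 = float[4];

// Build a vector from four lanes.
Vec4 vec4(float x, float y, float z, float w)
{
    Vec4 r = [x, y, z, w];
    return r;
}

// Per-lane multiply-add: a * b + c.
Vec4 madd(Vec4 a, Vec4 b, Vec4 c)
{
    Vec4 r;
    r[] = a[] * b[] + c[];
    return r;
}

// 3-component dot product; the w lane is ignored.
float dot3(Vec4 a, Vec4 b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```

A user's own Vector3 or Matrix type would then wrap calls like these, which is the back-end role described above.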
My key concern is with my function names... should I be worried about name
collisions in such a low level lib? I already shadow a lot of standard
float functions...
I prefer them abbreviated in this (fairly standard) way; it keeps lines of
code short and compact. It should be particularly familiar to anyone who
has written shaders and such.
Opinions? Shall I continue as planned?

My key concern is with my function names... should I be worried about
name collisions in such a low level lib? I already shadow a lot of
standard float functions...

That is not really an issue. If it actually bothers someone, static or
named imports come to the rescue.
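A short sketch of the static and renamed imports mentioned above. The vector sqrt here is a hypothetical stand-in for a flat SIMD function that shadows std.math.sqrt; the static import keeps std.math's names fully qualified, and the renamed import gives a short alias instead.

```d
// Static import: std.math names must be written fully qualified, so the
// local vector sqrt below does not clash with std.math.sqrt.
static import std.math;

// Hypothetical flat vector function that shadows a standard float function.
float[4] sqrt(float[4] v)
{
    float[4] r;
    foreach (i; 0 .. 4)
        r[i] = std.math.sqrt(v[i]); // scalar sqrt from std.math, per lane
    return r;
}

float scalarSqrt(float x)
{
    // Renamed (and scoped) import: a short qualifier instead of the full path.
    import m = std.math;
    return m.sqrt(x);
}
```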

I prefer them abbreviated in this (fairly standard) way; it keeps lines of
code short and compact. It should be particularly familiar to anyone who
has written shaders and such.

I agree.

Opinions? Shall I continue as planned?

Looks good. I think it should provide emulation in case of non-existent
hardware support (maybe even with a possibility to opt-out).
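One possible shape for emulation with an opt-out, as a sketch only. The version identifier NoSimdEmulation and the capability flag hasHardwareRcp are hypothetical; the flag is hard-wired to false so only the scalar fallback is compiled.

```d
// Hypothetical opt-out switch, selected with -version=NoSimdEmulation.
version (NoSimdEmulation)
    enum bool allowEmulation = false;
else
    enum bool allowEmulation = true;

// Hypothetical capability flag; a real module would derive it from the target.
enum bool hasHardwareRcp = false;

// Per-lane reciprocal.
float[4] rcp(float[4] v)
{
    static if (hasHardwareRcp)
    {
        // A real module would emit the platform instruction here.
        static assert(0, "hardware path not shown in this sketch");
    }
    else
    {
        // If emulation is disabled, a missing instruction becomes a
        // compile-time error instead of a silent slow path.
        static assert(allowEmulation, "rcp: no hardware support and emulation is disabled");
        float[4] r;
        foreach (i; 0 .. 4)
            r[i] = 1.0f / v[i];
        return r;
    }
}
```

Building with -version=NoSimdEmulation would turn the fallback into a compile-time error, which suits code that must never silently take a slow path.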

My key concern is with my function names... should I be worried about name
collisions in such a low level lib?

No. D's module resolution is good enough that prefixing names is not D-style
and is to be avoided.

I prefer them abbreviated in this (fairly standard) way; it keeps lines of
code short and compact. It should be particularly familiar to anyone who has
written shaders and such.
Opinions? Shall I continue as planned?

I'm far too overloaded at the moment to give this an in-depth review. I'm
hoping others here will step up!

My key concern is with my function names... should I be worried about name
collisions in such a low level lib?

No. D's module resolution is good enough that prefixing names is not
D-style and is to be avoided.

One concern that has occurred to me relating to the D module system is...
without any traditional header files, how will this API inline properly? It
helps that every function is a template, so I suppose that forces it to
inline yeah?
I'm quite concerned by a lack of force-inline keyword... it can't be left
to the compiler to decide to inline these or not. They MUST be inlined,
there is no compromise.

On 5 February 2012 02:55, Walter Bright <newshound2 digitalmars.com> wrote:
On 2/4/2012 11:57 AM, Manu wrote:
My key concern is with my function names... should I be worried
about name collisions in such a low level lib?
No. D's module resolution is good enough that prefixing names is not
D-style and is to be avoided.
One concern that has occurred to me relating to the D module system
is... without any traditional header files, how will this API inline
properly? It helps that every function is a template, so I suppose that
forces it to inline yeah?
I'm quite concerned by a lack of force-inline keyword... it can't be
left to the compiler to decide to inline these or not. They MUST be
inlined, there is no compromise.

The 'enum' storage class would mean force-inline when generalized to
functions.
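For later readers: D has since gained pragma(inline, true), which asks the compiler to always inline a function. A minimal sketch (madd and lerp are illustrative names); as suggested above, a template's body is available in every module that instantiates it, which makes cross-module inlining possible but does not by itself force it.

```d
// Ask the compiler to always inline this function.
pragma(inline, true)
float madd(float a, float b, float c)
{
    return a * b + c;
}

// Templates are instantiated in the calling module, so their bodies are
// visible for inlining there; the pragma can also go inside the body.
T lerp(T)(T a, T b, T t)
{
    pragma(inline, true);
    return a + (b - a) * t;
}
```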

Looks good so far:
it could use float[2] code wherever there is float[3] code
(magnitude2, etc.)
any/all should have template overloads to let you specify exactly
which channels match, and simple hardcoded ones for the common cases
(any1, any2, any3, any4 aka the default 'any')
I have implementations of floor/ceil/round(to-even) that work on
pre-SSE4 hardware for float and doubles, as well as the main
transcendentals (pow, exp, log, sin, cos, tan, asin, acos, atan), that I
can give out; they are fairly simple. sinh and cosh are the only major
ones I left out.
I just need a place or address to post or mail the code.
D should be able to handle names and overloading better, though
giving everything unique names was the design choice I made for my
library, primarily to make the code searchable and potentially portable
to C (aside from the heavy use of const references as argument types).
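A sketch of the suggested any/all variants, assuming comparisons yield a per-lane mask with all bits set or clear (as SSE and NEON compares do). Mask4, cmpGreater, anyN and allN are hypothetical names; the aliases give the hard-coded common cases.

```d
// Per-lane comparison result: each lane is all ones or zero.
alias Mask4 = uint[4];

Mask4 cmpGreater(float[4] a, float[4] b)
{
    Mask4 m;
    foreach (i; 0 .. 4)
        m[i] = a[i] > b[i] ? uint.max : 0;
    return m;
}

// True if any of the first N lanes is set.
bool anyN(size_t N)(Mask4 m) if (N >= 1 && N <= 4)
{
    foreach (i; 0 .. N)
        if (m[i])
            return true;
    return false;
}

// True if all of the first N lanes are set.
bool allN(size_t N)(Mask4 m) if (N >= 1 && N <= 4)
{
    foreach (i; 0 .. N)
        if (!m[i])
            return false;
    return true;
}

// Hard-coded forms for the common cases; any4/all4 are the plain any/all.
alias any3 = anyN!3;
alias any4 = anyN!4;
alias all3 = allN!3;
alias all4 = allN!4;
```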
On 2/4/2012 1:57 PM, Manu wrote:

Looks good so far:
it could use float[2] code wherever there is float[3] code (magnitude2,
etc.)

Yep, I intended to do this. You'll see I added dot2, I just didn't add the
others yet :P
Note: this is FAR from complete, I just wanted to get initial opinions
before I took it too far.
any/all should have template overloads to let you specify exactly which
channels match, and simple hardcoded ones for the common cases
(any1, any2, any3, any4 aka the default 'any')

... I'll look into it again more closely, but I don't think I can bring
myself to do this. It's ONLY really possible on SSE. Something so expensive
shouldn't be in the base API, I don't think.
The only case where this operation is particularly common is working with 3D
vectors. In my experience (fairly extensive, on many architectures) you
will almost always have 0's or 1's in the W anyway, so you can control
the mask by choosing greater or greater-equal. With careful consideration,
you can achieve this at zero cost, and not providing that API leads you to
consider such a construct.
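A sketch of the W-lane trick, assuming 3D vectors are stored with w = 0 in both operands. A strict compare then always clears the w lane and a non-strict compare always sets it, so a plain 4-lane any over greater, or all over greaterEqual, gives the 3-lane answer with no extra masking. The helper names are hypothetical.

```d
// Per-lane a > b.
bool[4] greater(float[4] a, float[4] b)
{
    bool[4] m;
    foreach (i; 0 .. 4)
        m[i] = a[i] > b[i];
    return m;
}

// Per-lane a >= b.
bool[4] greaterEqual(float[4] a, float[4] b)
{
    bool[4] m;
    foreach (i; 0 .. 4)
        m[i] = a[i] >= b[i];
    return m;
}

// Plain 4-lane reductions; no per-channel masking needed when w = 0.
bool any(bool[4] m) { return m[0] || m[1] || m[2] || m[3]; }
bool all(bool[4] m) { return m[0] && m[1] && m[2] && m[3]; }
```

The trick does not cover every combination: with w = 0 on both sides, any over greaterEqual is always true, so that case needs the w values chosen differently.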
I have implementations of floor/ceil/round(to-even) that work on pre-SSE4
hardware for float and doubles, as well as the main transcendentals (pow,
exp, log, sin, cos, tan, asin, acos, atan), that I can give out; they are
fairly simple. sinh and cosh are the only major ones I left out.

I did plan to add all of these, just haven't gotten to it. You're more than
welcome to contribute your implementations.
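For reference, a sketch of one common pre-SSE4 approach to floor and ceil, not the poster's implementation: truncate toward zero (what cvttps2dq does), convert back, and step by one where truncation went the wrong way. It assumes |x| < 2^31 and no NaNs, and it does not preserve signed zeros.

```d
float[4] floor4(float[4] v)
{
    float[4] r;
    foreach (i; 0 .. 4)
    {
        // Truncate toward zero and convert back to float.
        immutable float t = cast(float) cast(int) v[i];
        // Negative non-integers were rounded up by truncation: step down.
        r[i] = t > v[i] ? t - 1.0f : t;
    }
    return r;
}

float[4] ceil4(float[4] v)
{
    float[4] r;
    foreach (i; 0 .. 4)
    {
        immutable float t = cast(float) cast(int) v[i];
        // Positive non-integers were rounded down by truncation: step up.
        r[i] = t < v[i] ? t + 1.0f : t;
    }
    return r;
}
```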
I recommend a sincos() function (and friends) as well. Assuming you
implement them as a Taylor series, it's more efficient to calculate both at
once, and it's rare that you ever call one and not the other.
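A scalar sketch of the combined sincos idea, assuming |x| <= PI/4 (no range reduction). Both truncated Taylor series share x*x, and returning a small struct stands in for the multiple return values discussed in this thread. SinCos and sincos are hypothetical names.

```d
struct SinCos
{
    float sin;
    float cos;
}

// Only accurate near zero: no range reduction is performed.
SinCos sincos(float x)
{
    immutable float x2 = x * x; // shared by both series
    SinCos r;
    // x - x^3/3! + x^5/5! - x^7/7! + x^9/9!, in nested (Horner) form.
    r.sin = x * (1 - x2 / 6 * (1 - x2 / 20 * (1 - x2 / 42 * (1 - x2 / 72))));
    // 1 - x^2/2! + x^4/4! - x^6/6! + x^8/8!, in nested (Horner) form.
    r.cos = 1 - x2 / 2 * (1 - x2 / 12 * (1 - x2 / 30 * (1 - x2 / 56)));
    return r;
}
```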
I just need a place or address to post or mail the code.
Pull request? :)
Or email me: turkeyman at gmail
D should be able to handle names and overloading better, though giving
everything unique names was the design choice I made for my library,
primarily to make the code searchable and potentially portable to C (aside
from the heavy use of const references as argument types).

/agree, but the names I've used are so standardised and expected that I'm
really apprehensive about using different names.
Need more opinions to make a good decision, but currently I'm leaning
heavily towards keeping it how it is.
On 2/4/2012 1:57 PM, Manu wrote:

First criticism I expect is for many to insist on a class-style vector
library, which I personally think has no place as a low level, portable API.
Everyone has a different idea of what the perfect vector lib should look
like, and it tends to change significantly with respect to its application.

I think it would be useful, especially to newcomers who are
unfamiliar with D's lib terrain, to have an officially supported
"utils" library for these higher-level structures.
core // to the metal
std // low-level but useful
util // get the job done

Looks good to me so far ;-)

I think it would be useful, especially to newcomers who are unfamiliar
with D's lib terrain, to have an officially supported "utils" library for
these higher-level structures.
core // to the metal
std // low-level but useful
util // get the job done

Precisely my thoughts too. Something like 'util' may produce comprehensive,
very generic, standard constructs, but makes no guarantees that they are
efficient, or the best possible implementation for your application/context.
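A sketch of the suggested layering, with hypothetical names: flat add/mul functions play the role of the std level, and a small util-level Float4 struct adds operator syntax on top without promising anything about performance.

```d
// "std" layer: flat free functions on a plain 4-lane value.
float[4] add(float[4] a, float[4] b)
{
    float[4] r;
    r[] = a[] + b[];
    return r;
}

float[4] mul(float[4] a, float[4] b)
{
    float[4] r;
    r[] = a[] * b[];
    return r;
}

// "util" layer: convenient and generic, built on the flat functions.
struct Float4
{
    float[4] v;

    Float4 opBinary(string op)(Float4 rhs) const
        if (op == "+" || op == "*")
    {
        static if (op == "+")
            return Float4(add(v, rhs.v));
        else
            return Float4(mul(v, rhs.v));
    }
}
```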

First criticism I expect is for many to insist on a class-style vector
library, which I personally think has no place as a low level, portable API.
Everyone has a different idea of what the perfect vector lib should look
like, and it tends to change significantly with respect to its application.
I feel this flat API is easier to implement, maintain, and understand, and
I expect the most common use of this lib will be in the back end of people's
own vector/matrix/linear algebra libs that suit their apps.
My key concern is with my function names... should I be worried about name
collisions in such a low level lib? I already shadow a lot of standard
float functions...
I prefer them abbreviated in this (fairly standard) way; it keeps lines of
code short and compact. It should be particularly familiar to anyone who
has written shaders and such.

I prefer the flat API and short names too.

Opinions? Shall I continue as planned?

Looks nice. Please do continue :)
You have only run this on a 32-bit machine, right? Because I tried
to compile this simple example and got some errors about
converting ulong to int:
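import core.simd : float4; // built-in SIMD vector type
import std.simd;            // the proposed module from the fork linked above

// Swap each pair of lanes: (x, y, z, w) -> (y, x, w, z).
auto testfun(float4 a, float4 b)
{
    return swizzle!("yxwz")(a);
}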
It compiles if I make these changes:
566c566
< foreach(i; 0..N)
---
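The diff above is cut off, so this is not the actual fix, only a hypothetical sketch of the kind of problem described. On 64-bit targets size_t is ulong, which does not convert implicitly to int; on 32-bit targets size_t is uint, which does, so the same code can compile on one and fail on the other. lane and sumLanes are illustrative names.

```d
int lane(int i)
{
    return i * 2;
}

int sumLanes(size_t N)()
{
    int total = 0;
    // With foreach (i; 0 .. N), i would be a size_t (ulong on 64-bit), and
    // passing it to lane(int) is the kind of narrowing reported above.
    // Declaring the index as int avoids the implicit conversion.
    foreach (int i; 0 .. cast(int) N)
        total += lane(i);
    return total;
}
```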

On Saturday, 4 February 2012 at 23:15:17 UTC, Manu wrote:
First criticism I expect is for many to insist on a class-style vector
library, which I personally think has no place as a low level, portable API.
Everyone has a different idea of what the perfect vector lib should look
like, and it tends to change significantly with respect to its application.
I feel this flat API is easier to implement, maintain, and understand, and
I expect the most common use of this lib will be in the back end of people's
own vector/matrix/linear algebra libs that suit their apps.
My key concern is with my function names... should I be worried about name
collisions in such a low level lib? I already shadow a lot of standard
float functions...
I prefer them abbreviated in this (fairly standard) way; it keeps lines of
code short and compact. It should be particularly familiar to anyone who
has written shaders and such.

I prefer the flat API and short names too.
Opinions? Shall I continue as planned?

Looks nice. Please do continue :)
You have only run this on a 32-bit machine, right? Because I tried to
compile this simple example and got some errors about converting ulong to
int:

True, I have only been working in x86 GDC so far, but I just wanted to get
feedback about my approach and API design at this point.
It seems there are no serious objections, I'll continue as is. I have an
ARM compiler too now, so I'll be implementing/testing against that as
reference also.

True, I have only been working in x86 GDC so far, but I just wanted to get
feedback about my approach and API design at this point.
It seems there are no serious objections, I'll continue as is.

I have one proposal about API design of matrix operations. Maybe
there could be functions that would take row vectors as
parameters in addition to those that take matrix structs. That
way one could call matrix functions on data that isn't stored as
matrix structures without copying. So for example for the
transpose function there would also be a function that would be
used like this (a* are inputs and r* are outputs):
transpose(aX, aY, aZ, aW, rX, rY, rZ, rW);
Maybe those functions could be used to implement the functions
that take and return structs.
I also think that interleave and deinterleave operations would be
useful. For four element float vectors those can be implemented
with only one instruction at least for SSE (using unpcklps,
unpckhps and shufps) and NEON (using vuzp and vzip).
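A scalar emulation of the four operations, as a sketch. The names follow the proposal in this thread rather than any existing library, and the lane order is an assumption matching SSE: interleaveLow/High behave like unpcklps/unpckhps, and deinterleaveLow/High gather the even and odd lanes of the pair, as shufps or vuzp would.

```d
alias V4 = float[4];

// (a0, b0, a1, b1), like unpcklps.
V4 interleaveLow(V4 a, V4 b) { V4 r = [a[0], b[0], a[1], b[1]]; return r; }

// (a2, b2, a3, b3), like unpckhps.
V4 interleaveHigh(V4 a, V4 b) { V4 r = [a[2], b[2], a[3], b[3]]; return r; }

// Even lanes of a:b, (a0, a2, b0, b2).
V4 deinterleaveLow(V4 a, V4 b) { V4 r = [a[0], a[2], b[0], b[2]]; return r; }

// Odd lanes of a:b, (a1, a3, b1, b3).
V4 deinterleaveHigh(V4 a, V4 b) { V4 r = [a[1], a[3], b[1], b[3]]; return r; }
```

Interleaving the two deinterleaved halves restores the original inputs, so the pairs are exact inverses.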

I have an ARM compiler too now, so I'll be implementing/testing against
that as reference also.

True, I have only been working in x86 GDC so far, but I just wanted to get
feedback about my approach and API design at this point.
It seems there are no serious objections, I'll continue as is.

I have one proposal about API design of matrix operations. Maybe there
could be functions that would take row vectors as parameters in addition to
those that take matrix structs. That way one could call matrix functions on
data that isn't stored as matrix structures without copying. So for example
for the transpose function there would also be a function that would be
used like this (a* are inputs and r* are outputs):
transpose(aX, aY, aZ, aW, rX, rY, rZ, rW);

... the problem is, without multiple return values (come on, D should have
multiple return values!), how do you return the result? :)

Maybe those functions could be used to implement the functions that take
and return structs.

Yes... I've been pondering how to do this properly for ages actually.
That's the main reason I haven't fleshed out any matrix functions yet; I'm
still not at all sold on how to represent the matrices.
Ideally, there should not be any memory access. But even if they pass by
ref/pointer, as soon as the function is inlined, the memory access will
disappear, and it'll effectively generate the same code...
So the problem is not so much with respect to THIS API, but with respect to
the matrix calling convention in general...
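A sketch of one answer to the return-value question: D's out parameters let the row-vector transpose write four results, and the struct-returning form can be built on top of it, as suggested above. Mat4 and these signatures are hypothetical; as noted above, once inlined, the indirection through the out parameters would be expected to disappear.

```d
alias V4 = float[4];

// Row-vector form: a* are input rows, r* receive the transposed rows.
void transpose(V4 aX, V4 aY, V4 aZ, V4 aW,
               out V4 rX, out V4 rY, out V4 rZ, out V4 rW)
{
    rX = [aX[0], aY[0], aZ[0], aW[0]];
    rY = [aX[1], aY[1], aZ[1], aW[1]];
    rZ = [aX[2], aY[2], aZ[2], aW[2]];
    rW = [aX[3], aY[3], aZ[3], aW[3]];
}

struct Mat4
{
    V4 x, y, z, w; // rows
}

// Struct form implemented on top of the row-vector form.
Mat4 transpose(Mat4 m)
{
    Mat4 r;
    transpose(m.x, m.y, m.z, m.w, r.x, r.y, r.z, r.w);
    return r;
}
```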
I also think that interleave and deinterleave operations would be useful.
For four element float vectors those can be implemented with only one
instruction at least for SSE (using unpcklps, unpckhps and shufps) and
NEON (using vuzp and vzip).

Sure. I wasn't sure how useful they were in practice... I didn't want to
load it with countless silly permutation routines so I figured I'd add
them by request, or as they are proven useful in real world apps.
What would you typically do with the interleave functions at a high level?
Sure you don't just use it as a component behind a few actually useful
functions which should be exposed instead?
I have an ARM compiler too now, so I'll be implementing/testing against
that as reference also.

Could you please tell me how you got the ARM compiler to work?

I did not. It was the work of another fine chap in the gdc newsgroup ;)

Sure. I wasn't sure how useful they were in practice... I didn't want to
load it with countless silly permutation routines so I figured I'd add
them by request, or as they are proven useful in real world apps.
What would you typically do with the interleave functions at a high level?
Sure you don't just use it as a component behind a few actually useful
functions which should be exposed instead?

I think they would be useful when you work with arrays of structs
with two elements such as complex numbers. For example to
calculate a square of a complex array you could do:
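// a: float4[] holding interleaved (re, im) pairs, two complex numbers per
// float4; a.length is assumed to be even. The (de)interleave functions are
// the ones proposed above.
for (size_t i = 0; i < a.length; i += 2)
{
    float4 first = a[i];
    float4 second = a[i + 1];
    // Split four complex numbers into their real and imaginary parts.
    float4 re = deinterleaveLow(first, second);
    float4 im = deinterleaveHigh(first, second);
    // (re + i*im)^2 = (re^2 - im^2) + i*(2*re*im)
    float4 re2 = re * re - im * im;
    float4 im2 = re * im;
    im2 += im2;
    // Re-interleave and store back in place.
    a[i] = interleaveLow(re2, im2);
    a[i + 1] = interleaveHigh(re2, im2);
}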
Interleave and deinterleave can also be useful when you want to
shuffle data in some custom way. You can't cover all possible
permutations of elements over multiple vectors in a library
(unless you do something like A* search at compile time and
generate code based on that - but that would probably be way too
slow), but you can expose at least the capabilities that are common
to most platforms, such as interleave and deinterleave.