Some trivial binary operations have never had an expression in C/C++; I'd
love consideration for an operator or some sort of intrinsic for these.
*Roll/Rotate:* I'm loving the '>>>' operator, but I could often really do
with a rotate operator too; it's useful in many situations... '>>|'
perhaps... something like that?
This is ugly: a = (a << x) | ((unsigned)a >> (sizeof(a)*8 - x)); ... and
I have yet to see a compiler that will interpret that correctly.
Additionally, if a vector type is ever added, a rotate operator will
become even more useful.
*Count leading/trailing zeroes:* I don't know of any even slightly recent
architecture that doesn't have opcodes to count leading/trailing zeroes,
although such architectures do exist, so perhaps this is a little dubious.
I'm sure this could be emulated for such architectures, but it might be
unreasonably slow if used...
*Min/Max operators:* GCC has the lovely <? and >? operators... a <? b ==
min(a, b) .. Why this hasn't been adopted by all C compilers is beyond me.
Surely this couldn't be much trouble to add? Again, super useful in
vector/maths heavy code too.
*Predicated selection:* Float, vector, and often enough even int math can
really benefit from using hardware select opcodes to avoid loads/stores. In
C there is no way to express this short of vendor specific intrinsics again.
'a > b ? a : b' seems like a simple enough expression for the compiler to
detect potential for a predicated select opcode (but in my experience, it
NEVER does), however, when considering vector types, the logic isn't so
clear in that format. Since hardware vectors implement component-wise
selection, the logical nature of the ?: operator doesn't really make sense.
This could easily be considered an expansion of min/max... 'a <? b', 'a >?
b', 'a ==? b', 'a !=? b', etc. seems pretty natural if you're happy to
accept GCC's '<?' operators, and give the code generator the opportunity to
implement these things using hardware support.
C is terrible at expressing these concepts, resulting in
architecture/compiler specific intrinsics for each of them. Every time I've
ever written a maths library, or even just optimised some maths heavy
routines, these things come up, and I end up with code full of
architecture/platform/compiler ifdef's. I'd like to think they should be
standardised intrinsic features of the language (not implemented in the
standard library), so the code generator/back end has the most information
to generate proper code...
Cheers guys
- Manu

Manu:
> *Roll/Rotate:* I'm loving the '>>>' operator, but I could often really do
> with a rotate operator useful in many situations... '>>|' perhaps...
> something like that?
> This is ugly: a = (a << x) | ((unsigned)a >> (sizeof(a)*8 - x));
I have asked for a rotate intrinsic in Phobos, but Walter has added a rewrite rule instead, which turns such D code into a rotate instruction.
Personal experience has shown me that it's easy to write the operation in a slightly different way (like with signed instead of unsigned values) that causes a missed optimization. So I still prefer something specific, like a Phobos intrinsic, to explicitly request this operation from every present and future D compiler, with no risk of mistakes.
> *Min/Max operators:* GCC has the lovely <? and >? operators... a <? b ==
> min(a, b) .. Why this hasn't been adopted by all C compilers is beyond me.
> Surely this couldn't be much trouble to add? Again, super useful in
> vector/maths heavy code too.
This is cute. Surely max/min is a common operation to do, but often I have to find a max or min of a collection, where I think this operator can't be used. I don't think this operator is necessary, and it makes D code a bit less readable for people that don't know D.
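For comparison, a minimal sketch of how library functions can cover both the two-value case and the collection case, assuming std.algorithm's min, max and reduce. The names smallest and largest are hypothetical helpers for illustration only, and they expect a non-empty array:

```d
import std.algorithm : max, min, reduce;

// Hypothetical helper: smallest element of a non-empty array.
int smallest(int[] values)
{
    return reduce!min(values);
}

// Hypothetical helper: largest element of a non-empty array.
int largest(int[] values)
{
    return reduce!max(values);
}

unittest
{
    assert(min(3, 7) == 3);            // what 'a <? b' would express
    assert(max(3, 7) == 7);            // what 'a >? b' would express
    assert(smallest([4, 1, 3]) == 1);  // collection case
    assert(largest([4, 1, 3]) == 4);
}
```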
> *Predicated selection:* Float, vector, and often enough even int math can
> really benefit from using hardware select opcodes to avoid loads/stores. In
> C there is no way to express this short of vendor specific intrinsics again.
I don't understand what you are asking here. Please show an example.
There is an enhancement request that asks to support vector operations like this too (some CPUs support something like this in hardware):
int[] a = [1,2,3,4];
int[] b = [4,3,2,1];
auto c = a[] > b[];
assert(c == [false, false, true, true]);
Are operations like this what you are asking for here?
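Since the a[] > b[] form above is only a requested syntax, here is a rough sketch of the same element-wise comparison using existing library pieces, assuming std.range's zip, std.algorithm's map and std.array's array. The function name greaterThan is hypothetical:

```d
import std.algorithm : map;
import std.array : array;
import std.range : zip;

// Hypothetical helper: element-wise a[i] > b[i], one bool per pair.
bool[] greaterThan(int[] a, int[] b)
{
    return zip(a, b).map!(t => t[0] > t[1]).array;
}

unittest
{
    int[] a = [1, 2, 3, 4];
    int[] b = [4, 3, 2, 1];
    assert(greaterThan(a, b) == [false, false, true, true]);
}
```

This says nothing about whether a compiler would map it to hardware vector compares; it only shows the intended result.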
Bye,
bearophile

On Mon, 17 Oct 2011 16:53:42 -0400, Manu <turkeyman@gmail.com> wrote:
[snip]
> *Count leading/trailing zeroes:* I don't know of any even slightly recent
> architecture that doesn't have opcodes to count leading/trailing zeroes,
> although they do exist, so perhaps this is a little dubious. I'm sure this
> could be emulated for such architectures, but it might be unreasonably slow
> if used...
D has this: check out std.intrinsic's bsr and bsl.

On 10/17/2011 4:45 PM, bearophile wrote:
> Manu:
>
>> *Roll/Rotate:* I'm loving the '>>>' operator, but I could often really do
>> with a rotate operator useful in many situations... '>>|' perhaps...
>> something like that? This is ugly: a = (a << x) | ((unsigned)a >>
>> (sizeof(a)*8 - x));
>
> I have asked for a rotate intrinsic in Phobos, but Walter has added a rewrite
> rule instead, that turns D code to a rot. Personal experience has shown me
> that it's easy to write the operation in a slightly different way (like with
> signed instead of unsigned values) that causes a missed optimization. So I
> prefer still something specific, like a Phobos intrinsic, to explicitly ask
> for this operation to every present and future D compiler, with no risk of
> mistakes.
There's no need for a compiler intrinsic. Just write a function in a form that
does get the optimization, and call it.
The signed versions "don't work" because a signed right shift is not the same
thing as an unsigned right shift.
For reference:
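void test236()
{
    uint a = 7;
    int shift = 1;
    uint r;    // unsigned, so the right shift is a logical shift

    // rotate right by 'shift' bits
    r = (a >> shift) | (a << (int.sizeof * 8 - shift));
    assert(r == 0x8000_0003);

    // rotate left by 'shift' bits
    r = (a << shift) | (a >> (int.sizeof * 8 - shift));
    assert(r == 14);
    assert(a == 7);    // the operand itself is unchanged
}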
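As a sketch of the "write a function and call it" approach, the two patterns above could be wrapped as follows. The names rotateLeft and rotateRight are hypothetical, not an existing Phobos API, and whether a given compiler turns them into a single rotate instruction is an assumption to check in the generated code:

```d
// Hypothetical helper: rotate a 32-bit value left by 'count' bits.
uint rotateLeft(uint value, uint count)
{
    count &= 31;    // keep the shift amount in the range 0 .. 31
    return (value << count) | (value >> ((32 - count) & 31));
}

// Hypothetical helper: rotate a 32-bit value right by 'count' bits.
uint rotateRight(uint value, uint count)
{
    count &= 31;
    return (value >> count) | (value << ((32 - count) & 31));
}

unittest
{
    assert(rotateRight(7, 1) == 0x8000_0003);
    assert(rotateLeft(7, 1) == 14);
    assert(rotateLeft(0x8000_0003, 1) == 7);
    assert(rotateLeft(7, 0) == 7);    // a zero count leaves the value alone
}
```

Taking only unsigned operands sidesteps the signed right shift problem mentioned above, and masking the count avoids shifting by the full width of the type.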

Walter Bright:
> There's no need for a compiler intrinsic. Just write a function that does do the
> optimization, and call it.
Right. Two functions like this are worth putting somewhere in Phobos.
> The signed versions "don't work" because a signed right shift is not the same
> thing as an unsigned right shift.
It was a mistake in my code.
Thank you, bye,
bearophile

On 18.10.2011 06:25, Robert Jacques wrote:
> On Mon, 17 Oct 2011 16:53:42 -0400, Manu <turkeyman@gmail.com> wrote:
> [snip]
>> *Count leading/trailing zeroes:* I don't know of any even slightly recent
>> architecture that doesn't have opcodes to count leading/trailing zeroes,
>> although they do exist, so perhaps this is a little dubious. I'm sure
>> this could be emulated for such architectures, but it might be
>> unreasonably slow if used...
>
> D has this: check out std.intrinsic's bsr and bsl.
You mean bsr and bsf.
Unfortunately, there are some big problems with them. What is bsr(0) ?

On 18 October 2011 12:12, Don <nospam@nospam.com> wrote:
> You mean bsr and bsf.
> Unfortunately, there are some big problems with them. What is bsr(0) ?
>
True ;) .. but that's why the API needs to be defined and standardised.
On PowerPC the count for a zero input is 32 (or 64), and the x86 version
effectively returns 2 values: the position, and also a bool telling you
whether the input was zero or not (useful for loop termination).
I think all hardware that I've seen is easy to factor into the Win32
intrinsic API.
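To make the bsr(0) question concrete, here is a minimal sketch of wrappers with a defined result for zero, assuming core.bitop's bsr and bsf (the module that now holds what this thread calls std.intrinsic). The names countLeadingZeros and countTrailingZeros, and the choice of returning 32 for a zero input (the PowerPC-style answer), are assumptions for illustration:

```d
import core.bitop : bsf, bsr;

// Hypothetical wrapper: number of zero bits above the highest set bit.
// Defined to return 32 when v == 0, where bsr alone is undefined.
int countLeadingZeros(uint v)
{
    return v == 0 ? 32 : 31 - bsr(v);
}

// Hypothetical wrapper: number of zero bits below the lowest set bit.
// Defined to return 32 when v == 0, where bsf alone is undefined.
int countTrailingZeros(uint v)
{
    return v == 0 ? 32 : bsf(v);
}

unittest
{
    assert(countLeadingZeros(1) == 31);
    assert(countLeadingZeros(0x8000_0000) == 0);
    assert(countLeadingZeros(0) == 32);
    assert(countTrailingZeros(8) == 3);
    assert(countTrailingZeros(0) == 32);
}
```

The explicit zero check costs a compare on x86, which is exactly the kind of detail a standardised API would have to settle.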

On 18 October 2011 05:11, kennytm <kennytm@gmail.com> wrote:
> FYI, g++ has deprecated these operators long time ago (since 4.0).
>
> http://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/Deprecated-Features.html
>
Nooo! .. Removed in favour of the STL instead... well I for one thought they
were a great idea, but apparently trumped by the standards mob.
Doesn't mean they couldn't be considered for D though :)

On 18 October 2011 02:45, bearophile <bearophileHUGS@lycos.com> wrote:
> I have asked for a rotate intrinsic in Phobos, but Walter has added a
> rewrite rule instead, that turns D code to a rot.
> Personal experience has shown me that it's easy to write the operation in a
> slightly different way (like with signed instead of unsigned values) that
> causes a missed optimization. So I prefer still something specific, like a
> Phobos intrinsic, to explicitly ask for this operation to every present and
> future D compiler, with no risk of mistakes.
>
I agree, an intrinsic that guarantees compiler support, or even an
operator... ;)
> > *Predicated selection:* Float, vector, and often enough even int math can
> > really benefit from using hardware select opcodes to avoid loads/stores.
> > In C there is no way to express this short of vendor specific intrinsics
> > again.
>
> I don't understand what you are asking here. Please show an example.
>
> There is an enhancement request that asks to support vector operations like
> this too (some CPUs support something like this in hardware):
> int[] a = [1,2,3,4];
> int[] b = [4,3,2,1];
> auto c = a[] > b[];
> assert(c == [false, false, true, true]);
>
> Are operations like this what you are asking for here?
>
By predicated selection, I mean code that will select from 2 values based
on some predicate... code that looks like this: float c = (some comparison)
? x : z; .. This has hardware support on many modern architectures to
perform it branch free, which is particularly important on PowerPC and
other RISC chips.
The vector equivalent depends on generating mask vectors from various
comparisons (essentially the same as the scalar versions, but it would be
nice to standardise that detail with a strict api).
Working something like this:
    a = {1,2,3,4}
    b = {4,3,2,1}
    m = maskLessThan(a, b);  -> m == { true, true, false, false }
                                (usually expressed as integer -1 or 0)
    c = select(m, a, b);     -> c == {1, 2, 2, 1}
Now this is effectively the SIMD equivalent of: float c = a < b ? a : b;
but there's no nice expression in the language to do this. The details are
occasionally slightly different on different architectures, hence I'd like
to see a standard predicated selection API of some form, which will allow
use of hardware opcodes for float/int, and also mapping to SIMD cleanly.
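A scalar emulation of the maskLessThan/select example above, as a rough sketch only. It uses plain int[4] arrays as a stand-in for a hardware vector type, which is an assumption since no SIMD type is defined here, and the function bodies are one possible reading of the semantics rather than any existing API:

```d
// Hypothetical: per-lane mask, -1 (all bits set) where a < b, else 0.
int[4] maskLessThan(int[4] a, int[4] b)
{
    int[4] m;
    foreach (i; 0 .. 4)
        m[i] = a[i] < b[i] ? -1 : 0;
    return m;
}

// Hypothetical: per-lane select, takes a where the mask is set, else b.
// The bitwise form mirrors how hardware select opcodes are usually described.
int[4] select(int[4] m, int[4] a, int[4] b)
{
    int[4] c;
    foreach (i; 0 .. 4)
        c[i] = (a[i] & m[i]) | (b[i] & ~m[i]);
    return c;
}

unittest
{
    int[4] a = [1, 2, 3, 4];
    int[4] b = [4, 3, 2, 1];
    auto m = maskLessThan(a, b);
    assert(m == [-1, -1, 0, 0]);
    assert(select(m, a, b) == [1, 2, 2, 1]);    // component-wise min
}
```

Here the select step is branch free per lane; only the mask construction uses a conditional, which is the part a hardware compare would replace.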
This might possibly branch off into another topic about SIMD support in D,
which appears to be basically non-existent.
One of the real problems is lack of definition of SIMD types and behaviours.
Also, this construct requires the concept of a mask vector (in essence a
SIMD bool), which should be a concept factored into the SIMD design...
On a side note, I've seen murmurings of support for syntax like you
illustrate a few times (interpreting D arrays as candidates for hardware
SIMD usage). While that MIGHT be a nice optimisation in isolated cases, I
have very serious concerns about standardising that as the language mechanic
for dealing with SIMD data types.
I wrote a couple of emails about that in the past though.