-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Benoit Jacob wrote:
> Yes I agree, I too prefer the 1st solution: own specialization of
> complex<double>. Actually I hadn't thought of that possibility.
>
> Unfortunately, at least here in gcc 4.3, <complex> does specialize
> complex for float and double. Not just as an external, but the actual
> specialization is in the header. Which prevents us from doing our own.
Yes, <complex> is meant to be treated specially by the compiler (though
most, perhaps all, compilers aren't doing that "yet"). It's a bit similar
to <valarray>, whose arrays are guaranteed to be unaliased...
The newest C standard (C99 IIRC) is a bit clearer though. It comes with
the _Complex and restrict keywords (complex is a macro from <complex.h>)
and doesn't rely on a "naming convention".
> Moreover, comments refer to what seem to be sections in the C++ spec,
> so this is probably standard.
>
> So at this point I want to say: OK so the c++ standard expects the std
> lib implementors to fully take care of optimizing complex for float
> and double, fine! So the next thing to do is to check the assembly
> with some test program: enable SSE2 and check that the compiler really
> produces vectorized code. An addition of complex<double> should be
> vectorized for example. Looking at the <complex> header, there doesn't
> seem to be any explicit vectorization, but perhaps they know that the
> compiler does a good job here, or perhaps I don't understand what
> these __real__, __imag__ pseudo keywords mean.
>
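A test program for that check could be as small as the sketch below. The
function name add_arrays is my own invention, nothing from <complex> or
from our code. Compile it with something like "g++ -O3 -msse2 -S" and look
for packed instructions such as addpd in the loop; whether they show up
depends on the compiler version and flags, so treat it as an experiment,
not as a promise.

```cpp
#include <complex>
#include <cstddef>

// Hypothetical test kernel: element-wise addition of two arrays of
// complex<double>. One complex<double> is two doubles, i.e. exactly one
// 128-bit SSE2 register, so each addition could in principle become a
// single packed add.
void add_arrays(const std::complex<double>* a,
                const std::complex<double>* b,
                std::complex<double>* out,
                std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```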
Apart from the <complex> header of the standard library, you could try
the C99 complex type. It's not in the C++ Standard yet, but people are
arguing for it to keep C and C++ compatible. Anyway, some C++ compilers
(GCC, for one) already accept it as an extension...
If it comes to our own implementation of complex numbers, it might be an
idea to separate the real and imaginary parts, i.e. to avoid storing them
in the same SSE register. SSE was for a long time very bad at
"horizontal" instructions, i.e. instructions that combine two elements of
the same register, which is what a complex multiplication needs.
For an FFT I'm *guessing* that using one vector for all real parts and
another vector for all imaginary parts can be faster.
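To make the split layout concrete, here is a rough sketch. SplitComplex
and multiply are names I made up for illustration, and I haven't measured
anything, so this only shows the idea: every operation in the loop works
on the same index of separate arrays, so no horizontal instruction is
needed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical split layout: all real parts in one array, all imaginary
// parts in another, instead of interleaved (re, im) pairs.
struct SplitComplex {
    std::vector<double> re;
    std::vector<double> im;
};

// Element-wise product out[i] = a[i] * b[i], using
// (ar + ai*i) * (br + bi*i) = (ar*br - ai*bi) + (ar*bi + ai*br)*i.
// Assumes a and b hold the same number of elements.
void multiply(const SplitComplex& a, const SplitComplex& b, SplitComplex& out)
{
    const std::size_t n = a.re.size();
    out.re.resize(n);
    out.im.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Load everything first so that out may alias a or b.
        const double ar = a.re[i], ai = a.im[i];
        const double br = b.re[i], bi = b.im[i];
        out.re[i] = ar * br - ai * bi;
        out.im[i] = ar * bi + ai * br;
    }
}
```

With interleaved storage the same product needs the real and imaginary
part of one register combined with each other, which is exactly the
horizontal case mentioned above.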
CU,
Christian
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
iEYEAREIAAYFAkk1cVQACgkQoWM1JLkHou05AwCgkk/SzEzoEJ02EQ6F2gxK3gJN
MPgAnjfnuOmNkaNkTPlLQLpOyfVyXEFa
=Rt8a
-----END PGP SIGNATURE-----
---