Vector dependencies

I'm trying to squeeze the last bit of optimization out of a program using Intel C++ 10.1 (because with later versions I'm getting slower code - I'll look into that later).

When looking at the vectorization reports, I noticed 2 things I hadn't expected, and I wonder if they can be solved (without rewriting lots of code - total code base is over 2 MB and I'm working on it alone). I've tried to google them but didn't find any useful answers.

I know that there's an _mm_max_ SIMD instruction, so the problem might be my definition of max. I'm using:

#define max(a,b) (((a)>(b)) ? (a) : (b))

The compiler might treat this as an if statement if it's unable to optimize everything out. Is there a better definition of max that doesn't cause the compiler to see dependencies where there are none?
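One commonly suggested alternative to a max macro is an inline function, which evaluates each argument exactly once instead of textually repeating it. The following is only a minimal sketch, assuming float data; max_of, accumulate_peak and their parameters are hypothetical names, not taken from the original program. Whether this changes the vectorizer's dependence analysis depends on the compiler and version.

```cpp
#include <cstddef>

// Hypothetical replacement for the max macro.
// Unlike the macro, each argument expression is evaluated exactly once.
template <typename T>
inline T max_of(T a, T b) {
    return (a > b) ? a : b;
}

// Hypothetical loop: keep a running peak per element.
// peak, input, strength and n are assumed names for illustration only.
inline void accumulate_peak(float* peak, const float* input,
                            float strength, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        peak[i] = max_of(peak[i], strength * input[i]);
    }
}
```

Calling accumulate_peak(peak, input, 2.0f, n) overwrites each peak[i] with the larger of its old value and 2.0f * input[i].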

>>...I'm trying to squeeze the last bit of optimization out of a program...

In your code:

fft_abs_sse2[2*cc] = max( fft_abs_sse2[2*cc], strength * m ); // (A)

there is no need for the max macro: a single if statement is enough, instead of the if-else statement "hidden" in the max macro. Look at the disassembly to see what the max macro compiles to, unless it has already been optimized away by a C++ compiler such as Intel or Watcom.
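The single-if form of line (A) could look roughly like the sketch below. It assumes that m is an array indexed by cc and that the data is float; the function name, the parameter n and those types are assumptions, since the surrounding loop is not shown in the thread. It produces the same result as the max version but only stores when the value actually changes. Whether a conditional store vectorizes better or worse than an unconditional max is compiler-dependent.

```cpp
#include <cstddef>

// Sketch of line (A) with one if instead of the ternary inside max.
// fft_abs_sse2, strength, m and cc mirror the names in (A);
// update_peaks_if, n and the float types are assumptions.
inline void update_peaks_if(float* fft_abs_sse2, const float* m,
                            float strength, std::size_t n) {
    for (std::size_t cc = 0; cc < n; ++cc) {
        const float candidate = strength * m[cc];  // computed once per iteration
        if (candidate > fft_abs_sse2[2 * cc]) {
            fft_abs_sse2[2 * cc] = candidate;      // store only when larger
        }
    }
}
```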

Sorry for the delay in responding - I've been releasing a new software version and I've been extremely busy.

My goal wasn't to unroll a loop here (I would have expected the compiler to vectorize it); I'm processing audio that is stored interleaved as left/right/left/right/... It's definitely possible to rewrite the code and remove what looks like unrolling, but the current code better reflects the intention. I had expected the compiler to figure this out (accessing 2*c and 2*c+1 with c increasing by 1 every step isn't rocket science), but apparently it can't.
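To illustrate the two shapes being discussed, here is a small sketch assuming that both channels receive exactly the same operation. In that case the per-frame loop over 2*c and 2*c+1 touches the same elements as a flat loop over all 2*frames samples, which presents the compiler with a plain unit-stride access. All function and parameter names are hypothetical, and the real code may treat the channels differently.

```cpp
#include <algorithm>
#include <cstddef>

// Form that mirrors the intent: one iteration per stereo frame (left, right).
inline void peak_hold_paired(float* peak, const float* samples,
                             float strength, std::size_t frames) {
    for (std::size_t c = 0; c < frames; ++c) {
        peak[2 * c]     = std::max(peak[2 * c],     strength * samples[2 * c]);      // left
        peak[2 * c + 1] = std::max(peak[2 * c + 1], strength * samples[2 * c + 1]);  // right
    }
}

// Flattened form: same result when both channels use the same operation,
// written as a single unit-stride loop over the interleaved buffer.
inline void peak_hold_flat(float* peak, const float* samples,
                           float strength, std::size_t frames) {
    const std::size_t total = 2 * frames;
    for (std::size_t i = 0; i < total; ++i) {
        peak[i] = std::max(peak[i], strength * samples[i]);
    }
}
```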

About #define max vs. std::max (I didn't even know that existed... oops): no difference in behavior. But I did notice something else: if I put *only* something like a = max(a, b) in a loop, it does vectorize. However, if I put more code around it, it seems to get too complex for the compiler and it starts to complain about dependencies (a vs. a).
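Given that observation, one thing that might be worth trying is splitting the loop: do the surrounding work in a first loop that writes only to a temporary buffer, then run a second loop that contains nothing but a = max(a, b). This is only a sketch under assumptions; the "more code" part is represented here by a made-up magnitude computation, and every name is hypothetical. The extra pass over memory may also cost more than the vectorization gains, so it would need measuring.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical example of loop splitting (loop fission).
// re, im, strength and peak are assumed names, not from the original code.
inline void peak_hold_split(float* peak, const float* re, const float* im,
                            float strength, std::size_t n) {
    std::vector<float> candidate(n);  // temporary buffer for the first loop

    // Loop 1: the "more code" part; it reads inputs and writes only the temporary.
    for (std::size_t i = 0; i < n; ++i) {
        candidate[i] = strength * (re[i] * re[i] + im[i] * im[i]);
    }

    // Loop 2: nothing but a = max(a, b), the form reported to vectorize.
    for (std::size_t i = 0; i < n; ++i) {
        peak[i] = std::max(peak[i], candidate[i]);
    }
}
```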

Oh, and /Qansi-alias is set. It had quite a big effect in the past, when I was still using my own hand-optimized SSE2 FFT implementation, but since I switched to IPP it makes no difference anymore. So the only place where it had a noticeable effect on performance was my FFT code.