sin/cos are required to meet the ULP requirements for OpenCL floating-point accuracy. The native_* versions have no such constraints and compile down to a single hardware instruction. In most cases the accurate trig functions are not single-instruction functions.

Where did you get those numbers? I tried your code in SKA and got 178 instruction clauses and 11 GPRs. With the native version it is 11 clauses and 3 GPRs. The number of reads is 2 and the number of writes is 1 in both cases, as expected.

Maybe you're compiling for a different GPU? I also see 62 GPRs and 10 fetches on a 5870, 63 GPRs and 8 fetches on a 6970.

Your CPU may be doing something similar when you ask it to compute a power; it's just not so blatantly obvious. On an Intel Core Duo, the two instructions that do the bulk of the work inside pow() take 165 clock ticks.

Originally posted by: himanshu.gautam So it appears we are getting a similar number of instruction clauses, but the GPR usage varies. I checked with the internal SDK and get values similar to what I reported before.

I'm 100% sure that I'm using SDK 2.3 (downloaded from amd.com around the end of January), and I don't see how I could be doing anything wrong, since all I do is paste that code into Stream Kernel Analyzer. If I ask SKA to check for updates, it tells me that my version is up to date. If you download the SDK from the official web site, you'll see the same thing. Maybe it was fixed after the last release.