14.9 and higher OpenCL performance

I've tried dozens of tweaks to my Groestl implementation, and it refuses to get anywhere near as fast as the pre-14.9 code. The pre-14.9 code is just butchered on 14.9 - a rewrite makes it better, but the performance is still dismal. Are the changes in 14.9+ permanent, meaning I should avoid upgrading for the foreseeable future?

Thanks for sharing the link. However, only the kernel code is available there, and the host code that calls the kernel is needed to run and analyze it. I'm not familiar with this code. Can you please provide the corresponding host code? Please also mention your setup details.

Sorry, that code is for SGMiner 5 - an altcoin mining application. The host code is on GitHub, here: https://github.com/sgminer-dev/sgminer - the file shown should replace kernel/groestlcoin.cl in the source tree.

For the exact command line and settings, I'd need to know which GPU it's going to run on.

As for my setup, I have almost every GCN card - well, all the chips, at least. 2x270X, 3x7950, 3x280X, 1x285, and 3x290X. Same performance drop on every one.

EDIT: If you want settings for all the cards I have, just let me know.

No, not in the same way. Litecoin's algorithm uses a scratchpad that cannot fit in LDS, so it goes in global memory. Groestl (at least these implementations) uses rather large lookup tables, but they do fit in LDS.

I finally have some free time, so I'd like to try optimizing this algorithm in assembly myself.

Maybe I can beat OpenCL; if not, I'll at least learn something new.

But I need some help to start: would you please send me a complete test case for your kernel?

- the kernel source

- the global/local dimensions of the kernel

- all the kernel input parameters. It must be a case where it actually finds a Groestlcoin hash.

I need your help because I'm too lame/lazy to set up a working C environment, and SGMiner is a complicated codebase as well. I only want to fiddle with the kernel itself, which is why I need a good test case.