This is my optimized OpenCL kernel for Groestlcoin, Diamond and similar coins (the groestl + groestl algorithm, not myriad-groestl, which is groestl + sha; see the top of this post for the latter).
It is based on the sph version originally available in sph-sgminer, but it has now been totally rewritten.
It should be compatible with all sph-sgminer versions and derivatives.

- Stop the miner
- Replace groestlcoin.cl, diamond.cl and/or the kernel you want to use with this one (it's inside the "kernel" folder)
- Remove all the .bin files (in the main folder)
- Set worksize to 256 only (-w 256)
- Run and enjoy!
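
The file steps above (replace the kernel, remove the cached .bin files) can be sketched as a small script. This is only a sketch under assumptions: the function name install_kernel and its parameters are made up for illustration, the miner is assumed to be stopped already, and the layout is assumed to be a "kernel" folder inside the main miner folder as described above.

```python
import shutil
from pathlib import Path


def install_kernel(new_kernel, miner_dir, target_names=("groestlcoin.cl", "diamond.cl")):
    """Overwrite existing kernels with new_kernel and delete cached .bin files.

    new_kernel and miner_dir are hypothetical paths supplied by the caller.
    Returns (replaced, removed): lists of the file names that were touched.
    """
    new_kernel = Path(new_kernel)
    miner_dir = Path(miner_dir)
    kernel_dir = miner_dir / "kernel"

    replaced = []
    for name in target_names:
        target = kernel_dir / name
        if target.exists():  # only replace kernels that are actually present
            shutil.copyfile(new_kernel, target)
            replaced.append(name)

    removed = []
    for cached in sorted(miner_dir.glob("*.bin")):  # main folder only, not recursive
        cached.unlink()  # forces the miner to rebuild the binary on next start
        removed.append(cached.name)

    return replaced, removed
```

Removing the .bin files matters because the miner otherwise keeps loading the binary compiled from the old kernel.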

TWEAKING

Set intensity from 20 to 22. Thread concurrency and all the other parameters have no effect.
This kernel doesn't use GPU RAM, so set the RAM clock to THE MINIMUM POSSIBLE VALUE, for example 150 MHz for the R9 290.
Now play with the core clock until you find the highest stable value (probably between 1100 and 1200 MHz for the R9 290).
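
To compare the three intensity values, it can help to generate the argument lists once and try them one by one. A minimal sketch: the helper name miner_args is made up, -w is the worksize flag mentioned above, and -I is assumed to be the intensity flag of your sgminer build (check its --help output). Pool and kernel options are left out on purpose.

```python
def miner_args(intensity, worksize=256):
    """Return the tuning arguments for one run as a list of strings.

    Assumes -I selects the intensity; -w 256 is the only worksize this kernel supports.
    """
    return ["-w", str(worksize), "-I", str(intensity)]


def intensity_sweep(low=20, high=22):
    """Return one argument list per intensity in the suggested range."""
    return [miner_args(i) for i in range(low, high + 1)]
```

Run the miner with each list for a few minutes at the same clocks and keep the intensity that gives the best stable hashrate.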

COMPATIBILITY

Tested and stable on R9 290, 280X and 7950. It should work on any recent AMD GPU, but performance is not guaranteed to be optimal.
It doesn't work with the cryptohunger optimized pool: use the conventional port or another pool. Also, do not replace the optimized kernel of grs-sgminer, only the normal one.

TROUBLESHOOTING

Try the following:
- Are you sure you set worksize to 256?
- Replace the generated .bin file with this one (64 bit, R9 280(X) and 290(X) only): LINK EXPIRED (diamondHawaiiw256l8.bin), see below for a newer binary file
- Lower the intensity
- Lower the core speed (are you sure you set the RAM clock to the lowest possible value?)
- Since it uses more power, it could be a cooling issue too: check the GPU temperature

DONATIONS

This work took me months of coding and testing and many sleepless nights; please show your appreciation (you are making more money by using it!) by donating to:
BTC: 1H7qC5uHuGX2d5s9Kuw3k7Wm7xMQzL16SN

The first GPU miner for groestlcoin and similar coins was sph-sgminer by phm. Optimizing the original implementation was trivial (almost 3x the speed could be achieved!), so there are probably tens of optimized versions around, many of which have been kept private: mining groestlcoin and similar coins was always unfair for most people, at least for non-devs.
Hopefully this kernel will end this, and it should also level the field between AMD and NVIDIA.
I believe my version is faster than many of the other kernels because of the time I dedicated to it and the thousands of tests I did.

FINAL ADVICE

I suggest keeping "good binaries": make a backup of the fastest .bin files you have, so you can recover them in case of driver problems.
This will also let you get 1 or 2 percent more hashrate because of compiler variance (try removing the bin and running 3 or 4 times to see the variance in action).
Or use the provided bin file (see the OP), which should be a good one.
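
Keeping "good binaries" can be done with a couple of small helpers. This is a sketch under assumptions: the names backup_bins and restore_bins, the backup folder and the label are all made up for illustration, and the .bin files are assumed to sit in the main miner folder.

```python
import shutil
from pathlib import Path


def backup_bins(miner_dir, backup_root, label):
    """Copy every .bin from miner_dir into backup_root/label; return the copied names.

    label is a free-form tag chosen by you, e.g. the driver version or the hashrate.
    """
    dest = Path(backup_root) / label
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for binary in sorted(Path(miner_dir).glob("*.bin")):
        shutil.copy2(binary, dest / binary.name)  # copy2 also keeps the timestamps
        copied.append(binary.name)
    return copied


def restore_bins(backup_dir, miner_dir):
    """Copy the saved .bin files back into miner_dir; return the restored names."""
    restored = []
    for binary in sorted(Path(backup_dir).glob("*.bin")):
        shutil.copy2(binary, Path(miner_dir) / binary.name)
        restored.append(binary.name)
    return restored
```

Stop the miner before restoring, and give each backup a label that tells you which run it came from.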

I've experienced lower power usage with Catalyst 14.9 compared to 14.6 beta (a bit less compared to 13 but still better). Speaking of optimization, this should be kept in mind: buy a power meter for your miner(s)!
But it looks like the compiler included in the 14.9 drivers produces binaries which run considerably slower than those from older releases. If you are on 14.9, use the provided bin file instead of the kernel source in .cl format.

Have you tried other groestl algorithms? Like byte slicing or bit slicing?

Did you also have strange situations where better looking code actually ran slower?

PS: I wish you had opened the code earlier, Groestlcoin even had a bounty for an optimized miner.

Thanks
srcxxx

I did a lot of optimizations which looked smart but led to slower code. You can't imagine how many. Some of them I thought of while in bed, and they kept me awake until I could try them. Then the disappointment :-D
I believe it's because of the optimizations the compiler does, but most of all because of local memory and cache access.
If you optimize some instructions but end up with less parallelism in memory access, you'll get slower code.
That's typical for groestl because the speed is limited mostly by memory (otherwise it would be as fast as keccak, blake etc.).
I know there was a bounty: my code was unfinished and, at the time, it was the only way I could mine without losing money.

I know. I actually think that the compiler is not that clever, and that's why worse code sometimes runs faster.
Also, I looked at the ASM and some of the stuff there is just plain not optimal. Perhaps it'll be improved in future versions of the AMD drivers.

Also, most of the ASM code only uses .xy from a register. I tried making it work on ulong2 or ulong8, but that was only slower.