Seems good, but it brings no gain for me... weird. I will look into this and perhaps re-use your idea, if I may. I can only think of a good compiler optimization...

It is not obvious that this will gain anything. On the contrary, unless the compiler optimizes it away, it makes the second and third instructions dependent on the first. This is bad on a GPU that issues 4 or 5 instructions in parallel every clock cycle.
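To make the trade-off concrete, here is a minimal sketch in plain C (standing in for the OpenCL kernel code). The names term, round_shared and round_independent are made up for illustration and are not taken from phatk; the only assumption is that the change under discussion stores a shared term in a temporary instead of writing it out twice.

```c
#include <stdint.h>

/* Hypothetical stand-in for a term that both updates need; not from phatk. */
static inline uint32_t term(uint32_t e, uint32_t f, uint32_t g, uint32_t k)
{
    return k + (g ^ (e & (f ^ g)));
}

/* Variant A: compute the term once. Both updates must wait until t is ready. */
void round_shared(uint32_t v[2], uint32_t e, uint32_t f, uint32_t g, uint32_t k)
{
    uint32_t t = term(e, f, g, k);
    v[0] += t;
    v[1] = t + (e ^ f);
}

/* Variant B: write the term out twice. As written, neither update waits on a
   temporary shared with the other, although a compiler is free to merge the
   two identical expressions again. */
void round_independent(uint32_t v[2], uint32_t e, uint32_t f, uint32_t g, uint32_t k)
{
    v[0] += term(e, f, g, k);
    v[1] = term(e, f, g, k) + (e ^ f);
}
```

Both variants produce the same values; which one is faster can only be settled by measuring on the card in question.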

Well, it worked for me; that might be because I only have a slow 4670. However, the same change in sharound2 decreases performance.

Another thing that seems to run a little bit faster on cards without BFI_INT:

I wish I could use the AMD APP KernelAnalyzer, but it currently won't work, so I rely on the pure MHash/sec numbers that Phoenix gives me :-/. Anyway, perhaps I will try to follow your hint. If it's faster to do a calculation twice so that the results are independent of each other, then okay.

My work is not over.

Thanks to the 3 people who sent a donation so far! It's a nice motivation. Did anyone repost this in the Mining Software forum? I am not allowed to post there ^^.

Dia

The newbie rules are pretty hard here. You have to spend more than 4 hours on this forum before you are able to post anywhere.

Updated first post with changelog and performance info. This one should be a bit faster on 69XX cards than the original phatk, faster than all other phatk versions I did on 58XX and faster on non BFI_INT cards because of a change user 1MLyg5WVFSMifFjkrZiyGW2nw suggested!

Try, have fun, comment and donate.

Thanks, Dia

Thanks, best version yet. I still have not reached the 40 MHash/sec the wiki says my card could do.

Did you notice that Ma(x, y, z) is defined exactly the same now whether BFI_INT is enabled or not? Seems more elegant to me if moved out of the #ifdef. Also I tried to replace some #defines with functions, guessing that it would make it easier for a somewhat smart compiler to find repeatedly used terms and put them into registers. No performance improvement, but it didn't hurt either.
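A minimal sketch of both suggestions in plain C (standing in for OpenCL C, with uint32_t instead of uint). The non-BFI_INT Ch and the Ma body are the usual SHA-256 choose and majority identities. The BFI_INT branch here is only a placeholder that computes the same result, because the real one relies on the patched instruction; check_ma_ch and Ma_ref are hypothetical helpers added for the cross-check.

```c
#include <stdint.h>

#ifdef BFI_INT
/* Placeholder for the BFI_INT form: take bits of y where x is 1, else bits of z. */
#define Ch(x, y, z) (((x) & (y)) | (~(x) & (z)))
#else
#define Ch(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#endif

/* Ma is the same in both cases, so it sits outside the #ifdef.
   Written as a function instead of a #define, each argument is evaluated once. */
static inline uint32_t Ma(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & z) | (y & (x | z));
}

/* Textbook majority, only here to cross-check Ma. */
static inline uint32_t Ma_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (x & z) ^ (y & z);
}

/* Returns 1 if Ma agrees with Ma_ref and Ch selects bits as expected. */
int check_ma_ch(uint32_t x, uint32_t y, uint32_t z)
{
    return Ma(x, y, z) == Ma_ref(x, y, z)
        && Ch(x, y, z) == ((x & y) | (~x & z));
}
```

Whether the function form helps or hurts register allocation depends on the compiler, so it is worth timing both.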

Thank YOU, another nice hint; even if it does not boost performance, the code gets cleaner. I'm not sure about the #defines as functions. Could you post an example? My problem is that most variables are defined and declared inside the kernel function. So a function for sharound(), for example, needs Vals[] and others as passed parameters (copy or pointer).
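For illustration, one possible shape for such a function, as a plain C sketch rather than the actual phatk code: sharound_fn is a hypothetical name, Vals is passed by pointer, and the message word w and round constant k are passed by value. The body is a standard SHA-256 round with rotating indices instead of shifting the eight working variables.

```c
#include <stdint.h>

/* Rotate left by y bits, 0 < y < 32. */
static inline uint32_t rot(uint32_t x, unsigned y)
{
    return (x << y) | (x >> (32u - y));
}

static inline uint32_t Ch(uint32_t x, uint32_t y, uint32_t z)
{
    return z ^ (x & (y ^ z));
}

static inline uint32_t Ma(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & z) | (y & (x | z));
}

/* One SHA-256 round on the eight working variables in Vals.
   The indices rotate with the round number n (assumed n <= 128). */
void sharound_fn(uint32_t *Vals, uint32_t w, uint32_t k, unsigned n)
{
    uint32_t a = Vals[(0 + 128 - n) % 8];
    uint32_t b = Vals[(1 + 128 - n) % 8];
    uint32_t c = Vals[(2 + 128 - n) % 8];
    uint32_t e = Vals[(4 + 128 - n) % 8];
    uint32_t f = Vals[(5 + 128 - n) % 8];
    uint32_t g = Vals[(6 + 128 - n) % 8];
    uint32_t h = Vals[(7 + 128 - n) % 8];

    uint32_t s1 = rot(e, 26) ^ rot(e, 21) ^ rot(e, 7);
    uint32_t s0 = rot(a, 30) ^ rot(a, 19) ^ rot(a, 10);
    uint32_t t1 = h + s1 + Ch(e, f, g) + k + w;

    Vals[(3 + 128 - n) % 8] += t1;                    /* d + t1 is the next e */
    Vals[(7 + 128 - n) % 8] = t1 + s0 + Ma(a, b, c);  /* this slot is the next a */
}
```

Inside the kernel it would be called once per round, for example sharound_fn(Vals, W[n], K[n], n), assuming W and K hold the message schedule and round constants. Whether the compiler inlines this as well as the macro version would have to be measured.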
