You can do that by running export GPU_DUMP_DEVICE_KERNEL=3 prior to running hashkill (you need to be in a writable directory, e.g. /tmp).

Then, after say 30 seconds, stop execution (Ctrl-C) and look for a file named bitcoin_<GPUmodel>.isa (e.g. bitcoin_Cypress.isa). Please paste the contents of that file so that I can have a look at it.
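If you would rather script those steps, here is a rough Python sketch of the same procedure. The function name dump_isa and its parameters are made up for illustration; you pass in whatever hashkill command line you normally use. It only assumes that the dump lands in the working directory as bitcoin_<GPUmodel>.isa, as described above.

```python
import glob
import os
import signal
import subprocess


def dump_isa(miner_cmd, workdir="/tmp", seconds=30):
    """Run miner_cmd with GPU_DUMP_DEVICE_KERNEL=3 set, stop it after
    `seconds`, and return any bitcoin_*.isa files found in workdir.

    miner_cmd is a list such as ["hashkill", ...]; the arguments are
    whatever you normally pass (not filled in here on purpose).
    """
    # Same effect as `export GPU_DUMP_DEVICE_KERNEL=3` in the shell.
    env = dict(os.environ, GPU_DUMP_DEVICE_KERNEL="3")
    # workdir must be writable, otherwise no dump can be written.
    proc = subprocess.Popen(miner_cmd, cwd=workdir, env=env)
    try:
        proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGINT)  # equivalent of pressing Ctrl-C
        proc.wait()
    return sorted(glob.glob(os.path.join(workdir, "bitcoin_*.isa")))
```

The returned list should contain something like /tmp/bitcoin_Cypress.isa, which is the file to paste.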

P.S. It takes about 5-10 seconds for the speed to peak; it usually starts lower and gradually increases. As for switches, you might try -G 3 and/or -D and see if that improves performance.

P.S. 2: Also, please do not run the 32-bit version on a 64-bit system; it tends to be way slower. And (again) use SDK 2.3 or newer.

Yes, that's one of the bugs I have collected thanks to the people who tested the alpha (it is related to an integer overflow). Another one is a missing deinitialization of certain curl handles, which causes big problems after some time spent mining. Another problem was an improper BFI_INT replacement on 69xx cards (fixed now). Finally, the 69xx codepath is not optimal, and I am currently working on a separate VLIW4 codepath that is better optimized for 69xx devices. Sorry for those, but your input was very helpful in identifying and fixing these issues. A new testing release with those problems resolved will be ready soon.

Another thing is that we're walking on the edge with those uint4 vectors... on my 6870 I'm currently getting 41 GPR usage. If that rises to 42 for some reason, performance degrades disastrously as the number of wavefronts/CU drops. I still need to find a way to reduce the GPR usage, because on some other cards the compiler is unable to generate code that stays within 41 GPRs and so produces slow code. Since I am doing that by carefully reordering stuff, it's a bit wacky and not reliable at the moment... still need some work on that.
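To illustrate why one extra GPR hurts so much, here is a rough back-of-the-envelope sketch in Python. It assumes 248 GPRs are available per thread (256 minus a few reserved as clause temporaries), which is the figure I recall from AMD's OpenCL programming guide for Evergreen-class parts; treat that number, and the function name wavefronts_per_cu, as assumptions rather than something measured on hashkill. Real occupancy can also be capped by other limits, which this ignores.

```python
def wavefronts_per_cu(gprs_per_thread, available_gprs=248):
    """Estimate resident wavefronts per compute unit from GPR usage alone.

    available_gprs=248 is an ASSUMED budget (256 minus clause temporaries);
    other limits on occupancy are not modelled here.
    """
    if gprs_per_thread < 1:
        raise ValueError("gprs_per_thread must be at least 1")
    return available_gprs // gprs_per_thread


for gprs in (40, 41, 42):
    print(gprs, "GPRs ->", wavefronts_per_cu(gprs), "wavefronts/CU")
```

Under that assumption, 41 GPRs still fits 6 wavefronts (6 x 41 = 246), while 42 GPRs fits only 5 (6 x 42 = 252 is over budget), so there is one wavefront fewer to hide latency with.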

Works great @gat3way, thank you for this. It improved my performance greatly, from 220 Mhash on Diablo to 267 Mhash on your code! I did try your code from a few days ago (same version number, though), and after a few hours it would lose the connection and just keep retrying; a restart solved the problem. Let's see if this new one lasts longer.

PS: on a 6870, just overclocked to 950 and now at 286 Mhash... Great stuff!

Just wait, people, there are still lots of bugs I am working on. A new release will be out in a couple of days, hopefully fixing them all. The reconnect issue is due to the missing deinitialization of a curl handle, and this will definitely be resolved. We still have problems with the 6990, and this afternoon I had to rewrite the whole kernel (replacing uint4 with interlaced uint2+uint) to get that GPR thing working reliably on all VLIW5 cards.

* Progress indicator finally fixed
* Kernel reworked - there are separate codepaths, one for VLIW5 (interlaced uint2+uint to get best utilization) and another for VLIW4 architectures. Additional optimizations implemented.
* Added -D command-line option. This tends to increase speed at the cost of reduced desktop responsiveness (kinda like the Phoenix AGGRESSION parameter)
* Additional marginal speedup can be achieved by using the -G 3 option at the command line (or even -G 4) - but that requires more memory and a faster, multicore CPU
* The curl handles leak was fixed - no more "connection failed after half an hour of work" issues.

The code changes are confirmed to be incompatible with ATI Stream SDK 2.1 and 2.2. Please _DO NOT_ use versions older than 2.3.

Tried it out overnight on deepbit, and while it was faster, it also ended up with roughly 10% stale shares. Not sure the long polling is working properly. Is it supposed to give any indication when it gets a new block notification?