I'm pretty sure the GPU does not report back as soon as it finds a share. If it did, it would have to restart from that point each time it found a share.

From my understanding of the process, a nonce range is set up and run, and then the GPU's OpenCL code returns the list of shares found (re: the --worksize parameter).
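
A minimal sketch of that loop, assuming the miner walks the 32-bit nonce space in fixed-size batches and hands one batch at a time to the GPU. `nonce_batches` is a hypothetical helper, and the batch size here means nonces per kernel launch (2^20 is the figure quoted later in the thread), not the OpenCL --worksize value.

```python
NONCE_SPACE = 2 ** 32  # the header nonce is a 32-bit field

def nonce_batches(batch_size: int, space: int = NONCE_SPACE):
    """Yield half-open (start, end) nonce ranges that cover [0, space).

    Hypothetical helper: each range stands for one GPU kernel launch whose
    shares only come back to the host once the whole range is finished.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = 0
    while start < space:
        end = min(start + batch_size, space)
        yield start, end
        start = end

if __name__ == "__main__":
    batches = list(nonce_batches(2 ** 20))
    print(len(batches), batches[0], batches[-1])
    # prints: 4096 (0, 1048576) (4293918720, 4294967296)
```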

When that job is complete, the miner program runs another nonce range, and so on until the full nonce range is complete. Then it processes the next queued work in the same way.

So while the CPU is waiting for the result of its nonce range request, it can receive an LP. After this happens, the current nonce range process will still return its result (of course, this will happen after almost every LP, since there will always be a nonce range in the middle of being processed). This result (if it has shares) would be invalid/stale for a normal Bitcoin LP. However, with merged mining it will also effectively be deemed invalid/stale each time an NMC LP appears, even though it would be valid without the extra requirement of merged mining.
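
The timing problem can be illustrated with a toy timeline, assuming (as described above) that a batch's shares only reach the host when the whole batch ends. The function name and the times used are hypothetical.

```python
from typing import List

def found_before_lp_reported_after(batch_start: float, batch_len: float,
                                   share_times: List[float],
                                   lp_time: float) -> List[float]:
    """Shares found before the LP but only reported after it.

    Assumes no partial read-back: every share in the batch reaches the
    host at batch_start + batch_len, so a share found before the LP is
    still reported after it and may be judged stale by the pool.
    """
    report_time = batch_start + batch_len
    if not (batch_start <= lp_time < report_time):
        return []  # the LP did not arrive while this batch was running
    return [t for t in share_times if batch_start <= t < lp_time]

# A 5 s batch with shares found at 1.0 s, 3.5 s and 4.2 s; an LP arrives at 4.0 s.
print(found_before_lp_reported_after(0.0, 5.0, [1.0, 3.5, 4.2], 4.0))
# prints: [1.0, 3.5]
```

The share at 4.2 s was found on old work after the LP, so it is stale either way; the two earlier ones are the shares that would only be rejected because of when they were reported.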

Pool: https://kano.is
Here on Bitcointalk: Forum
BTC: 1KanoPb8cKYqNrswjaA8cRDk4FAS9eDMLU
FreeNode IRC: irc.freenode.net channel #kano.is
Majority developer of the ckpool code
Help keep Bitcoin secure by mining on pools with full block verification on all blocks - and NO empty blocks!

I can't think of a scenario where a longpoll should make a miner wait for work.

Are you telling me a pool can universally throw out enough work for every single miner working for it in the microsecond after sending out a longpoll at a rate fast enough to guarantee their GPUs don't go idle? Let's close the discussion before I get more annoyed.

Well, I think pools should continue to accept the old work until they've finished sending out the new, whenever possible. Eligius continues to accept shares against old work so long as they're possible to submit to the Bitcoin chain (i.e., same prevblock), but the transactions that were added in the longpolled work will be delayed by miners who don't update in a timely manner.
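
The two pool policies being contrasted can be sketched as a single check. This is a simplification, assuming a pool compares only the prevblock and a merged-mining commitment; `Work`, `aux_hash` and `accept_share` are hypothetical names, not Eligius code. The Eligius-style rule corresponds to `strict=False`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Work:
    prevblock: str  # the Bitcoin block this work builds on
    aux_hash: str   # hypothetical merged-mining commitment (e.g. for NMC)

def accept_share(share_work: Work, current: Work, strict: bool) -> bool:
    """Hypothetical pool-side check for a share submitted against share_work."""
    # A different prevblock means the share can never become a Bitcoin block.
    if share_work.prevblock != current.prevblock:
        return False
    # A "strict" pool also rejects work whose merged-mining part is outdated;
    # an Eligius-style pool keeps accepting it while the prevblock matches.
    if strict and share_work.aux_hash != current.aux_hash:
        return False
    return True

old = Work("btc_A", "nmc_1")
after_nmc_lp = Work("btc_A", "nmc_2")  # only the merged-mined chain changed
after_btc_lp = Work("btc_B", "nmc_2")  # a new Bitcoin block
print(accept_share(old, after_nmc_lp, strict=False),
      accept_share(old, after_nmc_lp, strict=True),
      accept_share(old, after_btc_lp, strict=False))
# prints: True False False
```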

I think responding to LongPolls makes sense given most everything I've read about pools on this thread and others, especially shad's point about the alternative (non-new-block, non-merged-mining) reasons for a new LongPoll. That said, I am curious about Kano's last point, combined with something I believe I know about cgminer and DeathAndTaxes's point that stales don't hurt while good shares help. I believe cgminer already discards work once it reaches a certain age, without being told to, in order to help lower stales, and I believe pools re-request unsolved work after some time period (which is why cgminer discards the aged work). Assuming GPU mining works the way Kano described, wouldn't it be possible to submit that last round of work from the GPU even while discarding everything else in the queue for that pool? And assuming the point about stale shares not mattering is valid, wouldn't it make sense to do the same with the last solution from work being discarded for age, if that isn't being done already? Sure, the stale % would go up, but would the number of good shares go down? And would the GPU cycles and packet transfer matter, when the CPU presumably has cycles to spare while waiting on the GPU and the effect of that additional packet on any modern miner's network is presumably minuscule? I had wondered this before, but never saw anything posted that made me think it was worth asking until this discussion.

However, the actual issue is what happens to any incomplete (but started) work occurring at the time of the LP. Discarding it is actually discarding work that is valid in all cases except on a Bitcoin new-block LP. Submitting it and then getting a 'stale' response is just as bad. This is work you have done that would be valid if not for merged mining, i.e. back to what I said about it earlier on ...

There is no such thing as incomplete but started work. You don't make progress in mining.

Either you have a valid share or you don't. If you don't, then nothing has been lost. If you do, then submit it.

There is no concept of progress. Each hash is completely independent and on average takes 1/300,000,000th of a second to complete.
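
The independence claim can be checked numerically. This sketch assumes each hash meets a difficulty-1 share target with probability of roughly 2^-32; the chance of finding a share in the next k hashes comes out the same no matter how many unsuccessful hashes came before, which is what "no concept of progress" means. Function names are illustrative.

```python
import math

P_SHARE = 2.0 ** -32  # approximate chance that one hash is a difficulty-1 share

def p_share_within(k: int, p: float = P_SHARE) -> float:
    """Probability of at least one share in k independent hashes."""
    # 1 - (1 - p)**k, computed with log1p/expm1 to avoid losing precision
    return -math.expm1(k * math.log1p(-p))

def p_share_within_after_misses(k: int, misses: int, p: float = P_SHARE) -> float:
    """Same probability, given that the previous `misses` hashes found nothing."""
    log_none = math.log1p(-p)  # log P(a single hash is not a share)
    # 1 - P(no share in misses + k) / P(no share in misses)
    return -math.expm1((misses + k) * log_none - misses * log_none)

fresh = p_share_within(2 ** 20)
after_a_billion_misses = p_share_within_after_misses(2 ** 20, 10 ** 9)
print(math.isclose(fresh, after_a_billion_misses, rel_tol=1e-9))
# prints: True
```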

Again, as I have ALREADY said above, each hash is NOT independent. The GPU does a set of hashes and returns the results for that set. That set can contain one or more shares, and those shares could be deemed invalid/stale by a pool under the circumstances I described above, yet not invalid/stale by the same pool if it were not merged mining. You seem to have missed the point of how a GPU miner program actually works.

OK, so presumably a GPU has space for N hashes to be done concurrently in any one iteration. How many iterations are done before results are returned? If it's only one, which equates to N hashes, the time is minimal and the loss, if any, is probably below the threshold of reasonable measurement. If you're saying it does many iterations of these N hashes in one batch, then presumably there is a way to cancel them partway through if new data needs to be worked on? If there's a way to cancel them, there should be a way to interrupt them and get whatever results have been accumulated in that time. If there's no way to interrupt and get results-so-far, then I'd respectfully suggest the code is broken.
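
The suggestion here (split the work into pieces and stop between them) can be emulated on the host. The sketch below is plain Python on the CPU, not OpenCL: it assumes the miner launches several smaller sub-batches instead of one big one and checks a hypothetical `lp_arrived` callback between them, so shares found so far are kept. The all-zero header and the very easy target are toy values chosen so that shares actually appear.

```python
import hashlib
import struct
from typing import Callable, List, Tuple

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def scan(header76: bytes, start: int, end: int, target: int) -> List[int]:
    """Nonces in [start, end) whose double-SHA-256 hash, read as a
    little-endian integer, is at or below target."""
    found = []
    for nonce in range(start, end):
        digest = double_sha256(header76 + struct.pack("<I", nonce))
        if int.from_bytes(digest, "little") <= target:
            found.append(nonce)
    return found

def scan_interruptible(header76: bytes, start: int, end: int, target: int,
                       sub_size: int,
                       lp_arrived: Callable[[], bool]) -> Tuple[List[int], int]:
    """Scan in sub-batches, checking lp_arrived() before each one.

    Returns (shares found so far, first nonce not yet scanned).
    """
    found, pos = [], start
    while pos < end and not lp_arrived():
        stop = min(pos + sub_size, end)
        found.extend(scan(header76, pos, stop, target))
        pos = stop
    return found, pos

header = bytes(76)        # dummy all-zero header, for illustration only
easy_target = 2 ** 252    # toy target: roughly 1 in 16 hashes qualifies
checks = iter([False, False, True])  # the "LP" shows up before the 3rd sub-batch
shares, next_nonce = scan_interruptible(header, 0, 256, easy_target, 64,
                                        lambda: next(checks))
print(next_nonce)  # prints: 128
```

Smaller sub-batches mean more launches per nonce range, each with its own overhead.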

Again, as I have ALREADY said above, each hash is NOT independent. The GPU does a set of hashes and returns the results for that set. That set can contain one or more shares, and those shares could be deemed invalid/stale by a pool under the circumstances I described above, yet not invalid/stale by the same pool if it were not merged mining. You seem to have missed the point of how a GPU miner program actually works.

No, you are just wrong. While there can be more than one share per nonce range, the miner submits shares as it discovers them, so at any point in time there is no such thing as incomplete work. You are 100% wrong if you think a miner holds onto shares before submitting them. Even without merged mining that would be a flawed implementation, because a block could be found at any point and then the shares not yet submitted would be stale.

A miner doesn't hold onto shares, as there is no value in doing so. At any point in time there is no such thing as incomplete work. The next hash is completely independent of prior work.

In just reading Kano's posts, I tend to expect them to be wrong, and keep my mouth shut because I don't want to instigate a flame war or argue about them. However, the original post about work being "held" does make sense. If it is true, the work isn't being held by the miner; it is being held by the GPU until the work submitted to it (as a group of whatever) is complete. If the GPU holds it and the miner doesn't know it's there, the miner can't submit it. You have an option on worksize, and it is more efficient to run a different worksize on one GPU vs. another, presumably because it consumes cycles to deliver work to the GPU and accept completion from the GPU. You could call this broken, but if you can't get or cancel the half-completed block of work from the GPU, that doesn't imply a coding problem, as hardware can also have constraints and GPUs weren't designed specifically to mine. Anyway, suppose for a minute that you actually can't get or cancel the half-completed block of work. Perhaps you could use a worksize of one to resolve this problem, but your efficiency would drop so badly from all the extra cycles that you're far better off getting a group of work back that ends up stale every now and again than losing those cycles on each piece of work, so why would you want to do that? I don't have any clue how all of this stuff actually works and have never really dealt with code, but this is one of the few arguments I have seen that makes any sense. (Although it doesn't matter: CGMiner supporting merged mining, or properly supporting longpolling [whatever you want to call the behavior], has absolutely nothing to do with how a pool behaves. Any pool that completely rejects a share that is still valid for the current blockchain is certainly broken, and CGMiner SHOULD still accept the longpoll.)
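
The trade-off described here can be put into a toy model. It assumes each kernel launch costs a fixed `overhead_s` of idle GPU time and that, on average, one in-flight batch is lost per long poll; both the model and the numbers below are illustrative assumptions, not measurements.

```python
def net_efficiency(batch_s: float, overhead_s: float, lp_interval_s: float) -> float:
    """Fraction of peak hashrate that ends up as usable shares (toy model)."""
    busy = batch_s / (batch_s + overhead_s)       # launch overhead eats the rest
    stale = min(batch_s / lp_interval_s, 1.0)     # ~one batch lost per long poll
    return busy * (1.0 - stale)

def best_batch(candidates, overhead_s: float, lp_interval_s: float) -> float:
    """Pick the candidate batch duration with the highest net efficiency."""
    return max(candidates, key=lambda t: net_efficiency(t, overhead_s, lp_interval_s))

# Hypothetical numbers: 2 ms overhead per launch, one long poll per minute.
candidates = [0.001, 0.01, 0.1, 1.0, 10.0]
for t in candidates:
    print(f"{t:>6} s batch: {net_efficiency(t, 0.002, 60.0):.4f}")
print("best:", best_batch(candidates, 0.002, 60.0))  # prints: best: 1.0
```

Very small batches lose to launch overhead and very large ones lose to stales. In this model the optimum sits roughly at sqrt(overhead_s × lp_interval_s) when the overhead is much shorter than a batch and a batch is much shorter than the LP interval.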

While that is true, the number of hashes performed before the GPU returns results is relatively tiny. It is based on the aggression value (or intensity for cgminer). Even at max aggression this is a fraction of a second, and at most a tiny fraction of a share (in expected value). Combined with the fact that long polls occur relatively infrequently, this is a rounding error in performance.

So if that is what he means, he is "correct", but the real-world performance impact is minimal. I also need to check the kernel source code: I believe (but need to verify) that even while the GPU continues to run the nonce range, it returns shares as they are found via callback. If that is true, then there is no performance loss, not even a negligible amount.

At high intensity levels, the time spent stuck in GPU code is on the order of SECONDS, not microseconds. The faster the GPU, the shorter it is, but at intensity 9 or 10 a 200 Mhash/s card could actually be working for more than 5 seconds on each iteration into the GPU. It takes less time when there is only one thread per GPU, but then the hash rate drops off slightly. And NO, there is NOT a way to interrupt a GPU once it has started working on the OpenCL code. While the worksize is somewhere between 64 and 256, the actual requested work every time the GPU is loaded is up to 2^20 iterations (at intensity 10). There is no way to interrupt it; it is not like doing something on a CPU. Faster cards won't take long to return even at high intensity levels, but basically, any shares discovered during this time in the GPU do NOT get returned until the GPU has finished its 2^20 iterations. That's just the way OpenCL kernel code works: the GPU takes its work, runs off and does it independently of anything else going on in your PC, and only returns answers once it's done. So work is "wasted" here if a run starts just before a longpoll, goes out for, say, 5 seconds and finds a share in that time. The share then has to be discarded, since cgminer now says that work is no longer valid for the current block, unless you enable the --submit-stale option.
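
The host-side choice described here can be sketched as follows. This is not cgminer's code; it assumes each result carries a hypothetical `block_id` recording which block the work was issued for, and the `submit_stale` flag stands in for the --submit-stale option.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KernelResult:
    nonce: int
    block_id: int  # hypothetical: which block the work was issued for

def triage(results, current_block_id: int, submit_stale: bool):
    """Split the results of one finished kernel run into (submit, discard)."""
    submit, discard = [], []
    for r in results:
        if r.block_id == current_block_id or submit_stale:
            submit.append(r)
        else:
            discard.append(r)  # work predates the latest block change
    return submit, discard

batch = [KernelResult(0x1234, block_id=7), KernelResult(0x9ABC, block_id=8)]
kept, dropped = triage(batch, current_block_id=8, submit_stale=False)
print([hex(r.nonce) for r in kept], [hex(r.nonce) for r in dropped])
# prints: ['0x9abc'] ['0x1234']
```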

2^20 iterations is 1,048,576 hashes, right? If we take the lower and upper bounds of modern GPUs to be 100 MH/s and 500 MH/s, that is 0.01 s down to 0.002 s.
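
The arithmetic in this reply, as a quick check. It assumes one kernel run is exactly 2^20 hashes, the figure quoted above, and ignores launch overhead; the function name is illustrative.

```python
import math

def kernel_seconds(hashes: int, hashrate_hps: float) -> float:
    """Wall time for one kernel run of `hashes` hashes at a given hashrate."""
    return hashes / hashrate_hps

for mhs in (100, 200, 500):
    print(f"2^20 hashes at {mhs} MH/s: {kernel_seconds(2 ** 20, mhs * 1e6):.4f} s")
# prints 0.0105 s, 0.0052 s and 0.0021 s

# Conversely, a 5 s run at 200 MH/s would need about 2^29.9 hashes.
print(round(math.log2(5 * 200e6), 1))  # prints: 29.9
```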

Not sure how a GPU can take full seconds to finish. Wouldn't that cause system instabilities? I mean the GPU is unusable for other tasks while OpenCL kernel is running.

2^20 / 2^32 ≈ 0.024%. Thus each interrupted "cycle" reduces EV (expected value) by about 0.00024 shares.*

*Granted, each individual iteration will either lose 1 share or 0 shares, but the EV is still a fractional share.
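
The expected-value arithmetic, with the per-GPU-thread caveat raised later in the thread folded in as a multiplier. This assumes each thread has one kernel run of 2^20 hashes in flight and about 2^32 hashes per difficulty-1 share; the function name is illustrative.

```python
SHARE_HASHES = 2 ** 32  # approximate expected hashes per difficulty-1 share

def ev_shares_in_flight(hashes_per_run: int, gpu_threads: int = 1) -> float:
    """Expected shares riding on the kernel runs in flight when an LP arrives.

    Assumes each GPU thread has exactly one run of `hashes_per_run` hashes
    in flight, and treats every such run as lost.
    """
    return gpu_threads * hashes_per_run / SHARE_HASHES

print(ev_shares_in_flight(2 ** 20))                 # prints: 0.000244140625
print(ev_shares_in_flight(2 ** 20, gpu_threads=2))  # prints: 0.00048828125
```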

OK, that covers the obvious problems (none of which are there). The next step would be: ./cgminer .. options to connect to a pool ... --verbose --text-only --shares 1 > debug.log 2>&1, then pastebin it once it completes and wait for someone with more depth to figure out what's going on.

2^20 / 2^32 ≈ 0.024%. Thus each interrupted "cycle" reduces EV (expected value) by about 0.00024 shares.*

*Granted, each individual iteration will either lose 1 share or 0 shares, but the EV is still a fractional share.

Per GPU 'thread' ...

I can understand people telling me I'm wrong ... what would I know

But you gotta realise that if you are talking to the person who wrote the program and you get a different answer - you've made a mistake.
