My version is faster but has some issues when solo mining. If you mine on a pool you get 10-15% more coins with 1.5.74 / 1.5.78 in most algos. My private build is up to 10% faster than the public one (quark, lyra2v2).

I submit stale shares by default. If you have a slow connection you might get rejects, but my private farm is at 99.5% accepted. The nicehashminer is using my version 1.5.74 on Maxwell cards because it is the fastest.

But your version is missing the most profitable algos, Decred and Ethereum, so it's pointless. When will you add those?

I only get 445 MH/s per 750 Ti. What am I missing? Do I need a certain driver or CUDA toolkit or something?

Edit: apparently I get close to 500 if I only use one card, but with 6 cards and a dual-core Pentium G3240 I only get about 445 per card. I presume that's because the first round of blake is done on the CPU, as sp_ said.

It could be the CPU that is the bottleneck (you could check usage), but try increasing your pagefile size, assuming you're on Windows.

^ It may also be worth checking whether multiple ccminer instances help. You could launch instance 1 with -d 0,1,2 and instance 2 with -d 3,4,5. In addition, set CPU affinity so that specific CPU cores are reserved for instance 1 and other cores for instance 2.
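To make the split concrete, here is a minimal Python sketch that only plans the two instances: it divides the GPU indices into `-d` lists and builds one affinity bitmask per instance. The function names `chunk` and `plan_instances` are hypothetical, the sketch does not launch ccminer, and the rest of the ccminer command line (algo, pool, credentials) is deliberately left out.

```python
def chunk(items, parts):
    """Split items into `parts` contiguous groups of near-equal size."""
    size, extra = divmod(len(items), parts)
    groups, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        groups.append(items[start:end])
        start = end
    return groups


def plan_instances(device_ids, cpu_cores, instances):
    """Pair each group of GPUs with its own group of CPU cores.

    Returns one dict per instance holding the `-d` argument from the post
    and a hex bitmask in which bit N set means "may run on core N".
    """
    plans = []
    gpu_groups = chunk(device_ids, instances)
    core_groups = chunk(cpu_cores, instances)
    for gpus, cores in zip(gpu_groups, core_groups):
        mask = sum(1 << core for core in cores)
        plans.append({
            "device_arg": "-d " + ",".join(str(g) for g in gpus),
            "affinity_mask": hex(mask),
        })
    return plans


if __name__ == "__main__":
    # Assumed rig from the thread: 6 GPUs, dual-core CPU, 2 instances.
    for plan in plan_instances(list(range(6)), [0, 1], 2):
        print(plan["device_arg"], "affinity", plan["affinity_mask"])
```

The bitmask is the kind of value that tools such as `start /affinity` on Windows or `taskset` on Linux take; check each tool's documentation for the exact format it expects before using it.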

I had to think about this a bit. ccminer should already create multiple CPU threads spread over all cores. This can be confirmed by using the -D option to enable debug output. Even so, with three threads per core it could introduce scheduling latency. That, combined with the small cache, could easily cause a 10% degradation in performance.

The algo probably factors into it as well. Does the degradation occur with other algos?

I have seen some odd performance differences while testing cryptonight on cpuminer that I still don't understand. At first I thought it was due to some affinity tricks, but I haven't found anything in the code to explain it. In short, cryptonight performs radically differently on different CPUs/OSs. On a 6700K running Linux I get the best performance CPU mining with 4 threads. More threads cause the total hashrate to drop to as low as half the 4-thread rate. Most other algos perform much better with more threads. The CPUs also run pretty cool on cryptonight, suggesting they are often stalled waiting for data (i.e. memory bound).

There shouldn't be any scheduling delays, because the number of running threads is less than the number of available virtual cores. Any thread contention would occur during execution and be mitigated by hyperthreading.

That leaves cache performance as the most likely cause of both issues. If the total memory requirement of all threads exceeds the available cache, it will significantly affect cache performance. It's a step function as each cache level overflows.

Seems like going too cheap with a CPU for a mining rig isn't a good idea.

@joblo, Cryptonight on CPU is a particular case. There's a 2 MB scratchpad per thread (or something else whose proper name I don't recall). For whatever CPU you have, the ideal number of threads will always be the cache size in MB divided by 2. Most i7s have 8 MB of cache, so optimal threads = 4.

As far as the rest of the details that you posted, way over my head. /searching <nearest exit>
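The cache-size rule of thumb above can be written as a small Python sketch. It assumes the 2 MB-per-thread figure from the post and a single cache size in MB; the function names `cryptonight_threads` and `fits_in_cache` are hypothetical, and the cap at the logical core count is an added assumption rather than something stated in the thread.

```python
def cryptonight_threads(cache_mb, logical_cores, scratchpad_mb=2):
    """Rule of thumb: one thread per scratchpad that fits in cache.

    The result is capped at the logical core count (an assumption added
    here) and never drops below one thread.
    """
    by_cache = cache_mb // scratchpad_mb
    return max(1, min(by_cache, logical_cores))


def fits_in_cache(threads, cache_mb, scratchpad_mb=2):
    """True while the combined scratchpads still fit in the cache."""
    return threads * scratchpad_mb <= cache_mb


if __name__ == "__main__":
    # Assumed example from the post: an i7 with 8 MB of cache and 8 logical cores.
    print("suggested threads:", cryptonight_threads(8, 8))
    for threads in range(1, 9):
        print(threads, "threads ->", threads * 2, "MB, fits:",
              fits_in_cache(threads, 8))
```

With these inputs the suggestion is 4 threads, and the working set stops fitting from the fifth thread onward, which matches the drop in total hashrate described for the 6700K.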

I don't agree with it being over your head. Your comment was spot on and perfectly illustrates what I was saying. As you add more threads the memory requirements increase, and when they become larger than the cache size, performance drops noticeably in spite of any benefits provided by hyperthreading.

Perhaps something similar is occurring on a small CPU when GPU mining ETH with many threads.