Another volunteer for the Hall of Infamy:
http://setiathome.berkeley.edu/show_user.php?userid=9995210

One of his crunchers has a problem with its GPU and is returning "invalid" results by the bucket load.
PM has been sent.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

About a third of your returns are invalid, and still you stomp on without a care in the world. Why don't you turn your random number generator off and save on your fuel bill?
Bob Smith

I just sent Fred a private message, because he's wasting work by the hundreds: he's using an AMD GPU with Catalyst 11.11, where 11.12 is the minimum required. So sad that people never look at their systems to check what they're doing.
Jord

According to Giorgo of the Ancient Astronaut Theorists I do not help with tech questions via private message. He's right: please use the forums for that.

This has probably been discussed before, but I couldn't find the explanation for it. So, could someone please direct me to the right answer?
What I see is that one of my PCs has started accumulating invalids, more than 30 so far. So I looked at the WUs and found that many look like the following:

http://setiathome.berkeley.edu/workunit.php?wuid=1816027316

Only two results have been returned; mine is labeled invalid while the other one says inconclusive.
How can this be possible? I thought at least two matching results were needed to validate the calculation. So, until they receive the third result, shouldn't both of the currently returned results be labeled inconclusive?

There's special code in the SaHv7 validator which instantly invalidates a result which ran full length but does not have a best_autocorr signal at the end. That is to be sure nobody is trying to process v7 tasks with a v6 application under anonymous platform.
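To make that rule concrete, here is a minimal sketch of such a pre-check. It is not the actual validator code: the function name `precheck`, the status strings, and the assumption that the signal shows up in the result file as an element named `best_autocorr` are all illustrative guesses.

```python
import re

def precheck(result_xml: str, ran_full_length: bool) -> str:
    """Hypothetical pre-check applied before results are compared.

    Returns "invalid" when a full-length run carries no best_autocorr
    signal, otherwise "needs_comparison" (the usual quorum comparison
    against other results would follow).
    """
    # Assumption: the signal appears as an element named best_autocorr.
    has_best_autocorr = re.search(r"<best_autocorr\b", result_xml) is not None
    if ran_full_length and not has_best_autocorr:
        return "invalid"
    return "needs_comparison"

print(precheck("<result><best_spike/></result>", ran_full_length=True))
```

Under a rule like this, a single result can be marked invalid on its own without waiting for a third result, while its wingman's result stays inconclusive.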

Your task details for those invalidated cases indicate normal CUDA50 processing, so the suspicion is that somehow the uploaded result files have been truncated or corrupted. I've asked Eric Korpela to take a look at those if he has time.

[Edit:] reply meant in response to message after Joe's, rather than to Joe's

That's a strange one; I'll be watching the emails on it. Differential diagnostic techniques might apply. Could you post screenshots of DPC Latency Checker runs at idle and at full crunch with shorties? That just eliminates a swathe of possible system issues that the application can't really know about.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

Eric did check some of those results. The surprise is that there was no truncation or corruption, the result files were clean but simply didn't have the required best_autocorr (which should have been produced the very first time the autocorr search was performed, and typically updated by later searches). Given the millions of tasks which have been done by x41zc without previously showing that issue AFAIK, it's a rare puzzle.

Since jason_gee is the primary authority on those CUDA apps, I'll probably not contribute anything further.

Sorry for the delay; I was away from home.
I started DPC Latency Checker while all the calculations were suspended.
After 30 seconds, I started GPU calculations and kept them running for over one minute. GPU usage was between 95 and 99% according to the nVidia software.
Here's what it said:
Test Interval: 1000
Current Latency: 139
Absolute Maximum: 166
I hope I used the software right (I'm not sure what I measured...)
Thanks.

Yep, looks fine, as long as there were no big spikes (which would have shown in the maximum, if they had happened)

[Edit] In addition to the items below, I just noticed the systems are hyperthreaded. It's a long shot, but simply freeing a (virtual) CPU core (if not already done; sorry if I missed it) and/or raising the process priority of the application might influence those symptoms. Sing out if you need directions on that. If it helps, it would indicate pretty high system pressure IMO. [/Edit]

That [DPC latency check] just eliminates a whole bunch of possibilities to do with system drivers of all sorts. It shows that PCIe transactions don't see any particularly bad delays, and that there are no weird power-saving things going on that could make the CPU miss events (e.g. Windows 7 default processor power management throttling the CPU speed down to 5%, hidden in the advanced settings ::S ).

All that means is that whatever's causing that data to be missing isn't one of the many basics, and more exploration is needed.

Is the storage drive Boinc crunches on an SSD, and if so, what brand/model? Also, I see they're Intel CPUs; are you running Intel chipset RAID or AHCI mode?

These are just more things that, combined with some boinc/api nuances, could show something out of the ordinary. Some other basic probing along various lines could range from reseating cards in the PCIe slot and the power connectors to checking temperatures, frequencies and voltages. Even though I see no direct evidence of an issue there, something minor could be affecting another part of the system, so the easy checks don't hurt.

Failing that, with the easy stuff out of the way, does using the special commode build available at http://www.jgopt.org/download.html change the symptoms? Many here could help with putting that in if you need more details.

Jason, thank you for taking time.
The SSD I'm using is a Corsair CSSD-F120GB2, and it is the only storage device on the system.
By virtual CPU, do you mean a 4-core processor running 8 threads?

Going back to my original question: is the reason some of my returned WUs are labeled Invalid that part of the data is missing from them? If so, I didn't know an internal mechanism like that existed.
Thanks again.

Yeah, Intel Hyperthreading, 4 physical cores running 8 threads (etc). Someone will have to chime in for correct options by Boinc client version (as I use an old client), but setting % of processors to use out of the 8 threads should enable you to free one or two.
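As a rough illustration of the arithmetic behind that "% of processors" setting, the sketch below works out the percentage that leaves a given number of threads free. The helper name `percent_of_processors` is made up for this example, and how a particular Boinc client version rounds the value is not covered here.

```python
def percent_of_processors(total_threads: int, threads_to_free: int) -> float:
    """Percentage of logical processors left for CPU crunching.

    Hypothetical helper: plain arithmetic only, not Boinc's own logic.
    """
    if not 0 <= threads_to_free < total_threads:
        raise ValueError("threads_to_free must be >= 0 and < total_threads")
    return 100.0 * (total_threads - threads_to_free) / total_threads

# 4 physical cores with Hyperthreading = 8 threads
print(percent_of_processors(8, 1))  # free one thread  -> 87.5
print(percent_of_processors(8, 2))  # free two threads -> 75.0
```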

For the Cuda application process priority, there is a file called ***mbcuda.cfg (it is named slightly differently depending on the stock application you process with), which can be edited with Notepad or a similar plain-text editor. There is an example line to uncomment for processpriority, and a suitable setting could be 'normal'.

Yes, I was in on Joe's email correspondence with Project staff, which indicated the results were missing a chunk of information, though oddly the result file had its proper closing tags. That amounts to a rare and interesting mystery, as opposed to the known application faults and Boinc mechanism quirks, because it is something that should have happened before if it were application related.

Should freeing a (virtual) core or two to feed the GPU, raising the process priority a little to make sure there are no weird delays going on, and/or switching to special builds all fail to change the symptoms in any way, then we have a genuine headscratcher :)