OK, I think the statute of limitations has run out on this one - let's let the cat out of the bag. Eric told me that David had seen the problems starting to build up, late in the evening of Saturday 3 November. In response, he deliberately turned off 'resend lost results', thinking this would reduce the load on Synergy and allow it to function normally again. Turned out slightly differently....

I think that just shows that programmers and sysops are different animals: you shouldn't expect either to be able to do the other's job.

You didn't explicitly say. Did someone turn it back on? I think we all assumed so, but...

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

I did receive a resend this morning. So as of 6:55 AM US Eastern Standard Time it was on.

SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!

I guess those are the times at which the packets of the body were actually sent... Could they have taken some time because they had to wait until the pipes had "space" for them?

"some time"? You can say that again.

Wireshark was timing to the microsecond. And on a gigabit network port, it would expect to see about 100 bytes per microsecond. Two whole minutes feels like a lifetime, at networking speeds. Nothing is that busy.
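As a rough check of those numbers, here is a small sketch. It assumes the nominal 1 Gbit/s line rate and ignores Ethernet framing and protocol overhead, which is why it comes out a little above the "about 100" figure; the function names are purely illustrative.

```python
GIGABIT_BITS_PER_SECOND = 1_000_000_000  # nominal line rate (assumption: no framing overhead)


def bytes_per_microsecond(bits_per_second: int) -> float:
    """Convert a line rate in bits per second to bytes per microsecond."""
    return bits_per_second / 8 / 1_000_000


def bytes_in_gap(bits_per_second: int, gap_seconds: float) -> float:
    """Bytes the link could have carried during a silent gap of gap_seconds."""
    return bits_per_second / 8 * gap_seconds


rate = bytes_per_microsecond(GIGABIT_BITS_PER_SECOND)
idle = bytes_in_gap(GIGABIT_BITS_PER_SECOND, 120)  # the two-minute gap
print(f"{rate:.0f} bytes per microsecond")
print(f"{idle / 1e9:.0f} GB could have crossed the port in two minutes")
```

At the raw line rate that is 125 bytes per microsecond, so a two-minute silence corresponds to roughly 15 GB of unused capacity on the port.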

Well, I was just asking, but a wait of a minute between two packets on a specific connection, packets that are not consecutive in their numbers, makes me think that during that time it was sending other packets to other connections... or else that the system was busy doing something with a higher priority than network I/O, which delayed it?
And again, I'm just asking. I have only basic knowledge of how these things work, and maybe I'm missing something about why you think that's so weird or unexpected.

All this is beginning to sound more like a failing router than anything substantial. (Last time people had to use proxies to get work.) We may just have to wait this one out. Crunch for another project until it gets sorted out.

Overnight I left my systems running without the proxy.
There were still a few Scheduler timeouts, but not many. Scheduler responses were mostly occurring within 1 minute. Some within 30 seconds, a few others back up around the 2 minute mark.

EDIT- naturally, as soon as I posted this I had a couple of Scheduler timeouts, but since then it's been getting responses within a minute or so.

Once again I noticed the Master Database queries were still around 800/s. Also, the amount of work in progress has dropped below the amount of work awaiting validation.

Grant
Darwin NT

You didn't explicitly say. Did someone turn it back on? I think we all assumed so, but...

Yes. When I quoted Eric's note on the day it all blew up (message 1302257), I redacted the bit about David turning resends off.

Wireshark was timing to the microsecond. And on a gigabit network port, it would expect to see about 100 bytes per microsecond. Two whole minutes feels like a lifetime, at networking speeds. Nothing is that busy.

BTW- would any of these issues possibly explain why the Scheduler is randomly declaring 200 WUs at a time abandoned?

I've had it happen once, Claggy just had it occur & Khangollo has had it occur at least twice & knows of others it's occurred to.

Grant
Darwin NT

Possibly. Missing complete scheduler contacts, so that

Number of times client has contacted server 35345

(shown on the website)

is no longer compatible with

<rpc_seqno>35346</rpc_seqno>

(from local client_state.xml)

can trigger BOINC's anti-cheating mechanisms - it looks like somebody is trying to use the same HostID on more than one computer at once, to inflate the host's RAC.

The usual defensive response is to generate a new HostID. Did either (any) of you have a new host, with the same hardware as the one which 'abandoned' tasks, but a high, recent, ID number and no credit, appear on their accounts recently?
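If you want to compare the two numbers yourself, something like the following sketch would do it. It only reads the `<rpc_seqno>` value out of client_state.xml text and subtracts the count shown on the website; it does not reproduce whatever check the server actually runs. The function names are made up for illustration, the sample text is a cut-down stand-in for a real file, and taking the first `<rpc_seqno>` found is a simplification for a host attached to a single project.

```python
import re


def read_rpc_seqno(client_state_text: str) -> int:
    """Return the first <rpc_seqno> value found in client_state.xml text."""
    match = re.search(r"<rpc_seqno>(\d+)</rpc_seqno>", client_state_text)
    if match is None:
        raise ValueError("no <rpc_seqno> element found")
    return int(match.group(1))


def contacts_missed(server_count: int, client_seqno: int) -> int:
    """How far the client's sequence number is ahead of the website's count."""
    return client_seqno - server_count


# Cut-down stand-in for a real client_state.xml (assumption: one project only).
sample = "<project>\n    <rpc_seqno>35346</rpc_seqno>\n</project>"

client_seqno = read_rpc_seqno(sample)
gap = contacts_missed(35345, client_seqno)  # 35345 is the figure from the website
print(f"client is {gap} contact(s) ahead of the server's count")
```

Whether a difference of one already counts as out of step depends on when each side increments its counter, which I haven't checked, so treat the result as a hint rather than a verdict.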

Just had a look at my account page, the only hosts there (active in the last 30 days) are my present ones.
Showing all hosts just brings up my old (and long deceased) AMD systems.

EDIT- the odd thing is that my Abandoned tasks occurred when I was using the proxy; when I was using the proxy I was getting responses within 30 seconds, sometimes within 15 seconds.

Grant
Darwin NT

I got a similar reaction when I added a new host to the project yesterday, instant 64 abandoned workunits. No duplicate host.

Things are seriously, weirdly screwed.
In the last 12 hours only about 4 requests for work have resulted in work. Everything else is mostly a timeout or (for something different) a "couldn't connect to server" error.
One machine with NNT set has just had the Scheduler respond twice in a row (4 min apart) within 7 seconds; 3 minutes later it took 3 min to get a response.
The other system during the same period timed out while trying to report & request more work. Setting it to NNT made no difference; it still timed out on the next update. Tried again straight away, response within 5 seconds.

Grant
Darwin NT