Yay! Back from normal Tuesday time-out. (btw, people in lab are really morning people...)

Let's see what comes next... Cricket on top now, AP splitting disabled... Let's see and hope for better...

Yes, they took it down just before 6am California time. Doesn't look any better to me. Had timeouts on work requests, so I went NNT and some of those timed out, but I finally reported my completions. First work request generated some new ghosts and I haven't got them yet. Down to 1.5 hrs of gpu work.
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

Yes, they took it down just before 6am California time. Doesn't look any better to me. Had timeouts on work requests, so I went NNT and some of those timed out, but I finally reported my completions. First work request generated some new ghosts and I haven't got them yet. Down to 1.5 hrs of gpu work.

My computer's log says that they took it down before 5:25; that's when the first "project in maintenance" message came. I don't like to wake up that early...

Disappointed that the Scheduler assigned some more work before I got the ghosts, so I have more ghosts now. Not a lot, and the limits will contain it. But that's what got us the limits: the scheduler should handle ghosts first, and it just showed me that this is not fixed. I got the first 20 resends and they weren't all shorties, so that's a help.

Yay! Back from normal Tuesday time-out. (btw, people in lab are really morning people...)

Yes, they took it down just before 6am California time.

That's unusual. Normally, they get in at 8am and start the maintenance some time between 8:30 and 9:00. Looks to me like it didn't run as late as it usually does, but the total time was more than normal.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

I think we're going to need to at least temporarily go back
to restricting workunits in progress on a per host basis and per RPC
basis, regardless of what complaints we get about people being unable
to keep their hosts busy.

The splitters are already showing red/orange on the server status page, and 'ready to send' is as near zero as makes no difference (there'll always be a few errors and timeouts to resend). So I'm going to turn off NNT and see what happens - let's see if we can help get this beast back under control.

Just a repeat of the message from Eric (S@h admin)... ;-)

(...)
I think we're going to need to at least temporarily go back
to restricting workunits in progress on a per host basis and per RPC
basis, regardless of what complaints we get about people being unable
to keep their hosts busy.

I was just talking about one of the rigs that recently got CPU units. You still had around 1500 GPU units for the 3 GPUs. At 500 seconds per GPU unit, that's nearly 3 days' worth left. Even if you get down to 100 per GPU, that's still half a day's worth. What did you normally run your queue as? 10 days.
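The arithmetic above can be sketched quickly. All the figures (1500 tasks, 500 seconds per task, 3 GPUs, the 100-per-GPU limit) come from the post itself and are rough estimates, not measurements:

```python
# Rough cache-duration math for the 3-GPU rig described above.
tasks = 1500          # GPU tasks on hand
secs_per_task = 500   # ~500 s per GPU task (poster's estimate)
gpus = 3              # the GPUs drain the queue in parallel

days = tasks * secs_per_task / gpus / 86400
print(f"{days:.1f} days")          # ~2.9 days, i.e. "nearly 3 days' worth"

limited = 100 * gpus               # cache under the 100-per-GPU limit
days_limited = limited * secs_per_task / gpus / 86400
print(f"{days_limited:.2f} days")  # ~0.58 days, i.e. about half a day
```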

It doesn't make a difference in bandwidth usage in the long run once the whole seti@home ecosystem hits steady state; it'll just mean that when a super cruncher's nVidia card goes off the rails, they can only shaft at most 100 wingmen per GPU as opposed to thousands. (Please check your results daily, not directed at you msattler, just nVidia users in general, to catch when your system starts producing mostly inconclusive/error/invalid GPU results.)

Each 690 crunches a WU in less than 7 min, running 3 WUs at a time on each GPU (it has 2), about 48 per hour or more, so on a big cruncher (3x690) a 100 WU cache is simply ridiculous; it won't last an hour. I have 2x690s sitting on a bed waiting for them to raise the limits. With the current limits it's a waste of time/resources to put them to work; they simply will not receive the WUs they need to stay busy.
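The throughput claim above checks out with a quick sketch using the poster's own figures (3 concurrent tasks per GPU, ~7 minutes per task, 2 GPUs per 690). The "cache drains in under an hour" line assumes, for illustration, that the 100-task limit applies per host rather than per GPU:

```python
# Per-card throughput for a GTX 690 (dual-GPU card), poster's figures.
concurrent = 3        # tasks run simultaneously on each GPU
minutes_per_wu = 7    # ~7 minutes per task
gpus_per_card = 2

wu_per_hour_card = gpus_per_card * concurrent * 60 / minutes_per_wu
print(round(wu_per_hour_card))      # ~51, matching "about 48 per hour or more"

# For a 3x690 rig, if the 100-task limit were applied per host
# (an assumption for illustration), the cache drains in under an hour:
rig_rate = 3 * wu_per_hour_card
print(round(100 / rig_rate * 60))   # ~39 minutes
```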

That's true. But every 5 minutes it asks for more to top it back off. It's not 100 per day or per hour, but 100 per GPU, isn't it? Is it seti@home's fault that someone clever discovered you could run multiple GPU units at the same time per GPU and shared it with others? Is it s@h's fault that the campus IT department only has a 100Mb line going out to their shack, thus capping new units transmitted at something around 80-100,000 per hour in theory?
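The "80-100,000 per hour" ceiling is easy to sanity-check. The workunit download size used here (~366 KB) is an assumption for illustration; the thread doesn't state it:

```python
# Back-of-envelope check of the "80-100,000 units per hour" figure.
link_bits_per_sec = 100e6   # the 100Mb line out of the lab
wu_bytes = 366_000          # assumed multibeam workunit download size

wu_per_hour = link_bits_per_sec * 3600 / (wu_bytes * 8)
print(round(wu_per_hour))   # ~123,000 theoretical ceiling
# Protocol overhead, scheduler traffic, and uploads sharing the same
# link would plausibly bring that down into the 80-100k range quoted.
```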

[rant]

A project that started out as "Mister, could you spare a few cycles for a good cause?" has turned into yet another professional-amateur "sport", where some people have gone nuts building dedicated crunching servers for thousands of dollars but then let them churn out endless bad results, because the hardware isn't entirely stable and they only check in on it if their precious RAC starts to drop. Then some super crunchers turn around and blame s@h for bad unit generation, or blame people like me, the tiny guys who let their $500 home computers run 24/7, for "stealing" the units that are "rightfully" theirs to process, because they spent all that money simply to brag about having one of the top 10 daily RACs. They blame s@h for running out of units, or for having server infrastructure that isn't as robust as, say, Amazon's.

Well, I am sorry that you now have to fret that your vast array of super crunchers has a chance to run dry. That you can't sit on 20+ days' worth of units because God forbid your precious array of machines runs dry for even a moment. Welcome, all you crunching gods, to the land of mere mortals.

[/rant]
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

If you think there are problems with the scheduler at the moment, wait till the GTX7xx cards become widespread... a GTX780 is about 40-50% faster than a GTX680, or roughly equal to 0.75 of a GTX690 card... personally I think these cards are going to hum...

I got up early and found I ran out of gpu tasks overnight. I was at limits last night. Had a couple of hung downloads - finally got them and they were short shorties that ran two minutes each. Reported my stack of results on NNT and then generated a new batch of ghosts. Trying to get them but it's mostly timeouts. Low gpu limit hurts the project, not me. I'll probably give up on intervention and run another project like the others.

In looking back at what SETI was doing with my computers just before the current problems began, I discovered that, while everything was deteriorating, SETI was sending all four of my machines enough units for a -month- of processing each! Even after squeezing every last cycle I can into processing, I still have around ten days worth of GPU units and quite a few days of CPU units left to process on most of my hosts.

While I did have my cache set to 10 days to give me a cushion for emergencies, I have -never- been sent too many units before.

If many of us were getting the same overload by SETI, it would most likely explain many of the symptoms we were seeing. The throughput demand to crank thousands of us up to caches of that size would most certainly run the SETI servers into the ground.

Perhaps this will help shed light on the problem, and also on why the SETI staff has (temporarily, I hope) limited us to 100/CPU and 100/GPU.

I am getting "scheduler request: timeout was reached" again. All of the last 5 requests since 08:32 UTC.

I'm still getting the odd one here & there, but mostly I'm getting a response within a minute or so.

And overnight I ran out of GPU work on both of my systems because I got nothing but Timeout errors on every Scheduler request.

I set NNTs; one system managed to report, the other is still getting timeouts.
After clearing the backlog on one system I set it to get new work. Nothing but Scheduler timeouts, and the other system still hasn't been able to report its work. I expect to run out of CPU work in the next couple of hours on one system, and on the other later today, only because it is so slow.

I think they need to keep AP offline till they work out what the problem with the Scheduler is; limiting the number of tasks hasn't fixed the problem. It's barely even had an effect on it.
They really do need to address the problem; the workaround (limiting tasks) has done nothing except result in people running out of work.
____________
Grant
Darwin NT.