The new year is unfolding nicely, more or less. Wow - 2013. Every new year now sounds like a science fiction year. I don't really have anything major to report, but here's another update anyway.

We were supposed to have some more lab-wide power repairs last weekend. This got postponed to a later date which has yet to be settled upon.

As I've been mentioning for years, the BOINC server backend (everything pertaining to creating workunits, sending them out, receiving results, and processing them) runs across a set of constantly changing servers of disparate make, model, and power, and thus some problems involve so many moving targets that they're almost impossible to diagnose. I tend to refer to these times when performance is lower than expected as "server malaise." It also doesn't help that we're dealing with an almost constant malaise, given that we're pretty much maxed out on our network connection to the world 24 hours a day. This is like running a retail business with a line out the door 24 hours a day - no quiet time to clean the place up, restock the shelves, etc.

Usually when we see some queue backing up, or network traffic drop, the procedure is somewhat like this:

1. Check whether a server or important service (httpd, informix, mysql) isn't running - these are easy to find and hopefully easy to fix.

2. Check whether some BOINC mechanism (validation, assimilation, etc.) is stuck on something - these are relatively easy to find (by scanning logs and process tables) and sometimes easy to fix, but not always.

3. Check whether everything is kind of working, just slowly. If so, we tend to write it off as "server malaise" and wait to see if it improves on its own - the functional equivalent of "take two aspirin and call me in the morning."

Usually we find things improve on their own over time, or if not, more obvious clues as to the actual problems make themselves clear. We simply don't find it an efficient use of our very limited time to understand and solve every problem perfectly.
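As a rough illustration, that triage order could be sketched like this. The service and daemon names come from the post itself, but the checks and return strings are invented here - the real diagnosis is done by hand against logs and process tables, not by a script like this:

```python
# Hypothetical sketch of the three-step triage described above.
# Inputs are assumed to be gathered elsewhere (monitoring, log scans).

def triage(services_up, stuck_daemons, queues_draining):
    """Classify a backend problem in the order the post describes.

    services_up     -- dict of {name: bool} for httpd, informix, mysql, ...
    stuck_daemons   -- BOINC mechanisms (validator, assimilator, ...) that
                       appear wedged according to logs/process tables
    queues_draining -- True if work is still flowing, just slowly
    """
    # Step 1: a dead server or service is easy to find and (hopefully) fix.
    down = [name for name, up in services_up.items() if not up]
    if down:
        return "restart: " + ", ".join(sorted(down))
    # Step 2: a stuck BOINC mechanism, found by scanning logs/process tables.
    if stuck_daemons:
        return "unstick: " + ", ".join(sorted(stuck_daemons))
    # Step 3: everything sort of works, just slowly -- "server malaise".
    if queues_draining:
        return "malaise: wait and see"
    return "investigate further"

print(triage({"httpd": True, "informix": True, "mysql": True}, [], True))
```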

I mention all this as we certainly had a few malaises over the past few weeks. The one last week was due to a cronjob failing to run, which didn't update some statistics, which led to some splitters running too much and generating too much work, which led to a bloated database and bloated filesystem, which led to slow backend processing. It took about 4 days to clear out, but it eventually did without any effort on our part. During that time general upload/download bandwidth was constrained a tad, but we survived.

Otherwise, things are well. The recent (or relatively recent) server upgrades have been a major blessing, and more are planned. During the outage on Tuesday I actually moved some servers around such that *all* the SETI related servers are now in the closet (as opposed to our auxiliary lab). This is a first, I think. Outside of our desktops, all SETI machines are in the racks.

Of course, this is just in time for the closet a/c to be in need of repair. This surgery happens on Monday and may take a couple of days, during which the projects will all be down (with limited servers left up to keep the web site alive with a warning on the front page and status updates). We hope to be back up Tuesday afternoon. There is a chance the repairs won't work. We have a plan B (and C) if this happens, but let's just be positive and cross that bridge if/when we get there.

Oh yeah, one random note. Yesterday I had some fun with a database weirdness. Somewhere along the line, perhaps during one of many sudden power outages, a small set (i.e. about 10 out of 3,000,000,000) of the spikes in the database were cloned, becoming two entries in the database with the same id #s. This is "impossible," as id #s are primary keys and supposed to be unique. Which of the clones you were seeing depended on how you selected these spikes - selecting by id or by some other field, you'd get one clone or the other. This wasn't apparent at all until I tried to update values in these spikes: when selecting them afterwards I'd get the unupdated clone, and it looked like the update wasn't working. Long story short, I finally figured this out and got rid of the clones. But yeah, databases sure can be funny sometimes.
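The clone hunt can be sketched in a throwaway SQLite table. The real database is Informix and the table/column names here are made up; this just shows how rows sharing an "id" can be found and pruned once the uniqueness guarantee has somehow been violated (the toy table deliberately omits the PRIMARY KEY constraint to mimic the corruption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No PRIMARY KEY constraint, so clones are possible -- mimicking corruption.
conn.execute("CREATE TABLE spikes (id INTEGER, peak_power REAL)")
conn.executemany("INSERT INTO spikes VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (2, 20.5), (3, 30.0)])

# Find ids that appear more than once.
dupes = conn.execute(
    "SELECT id FROM spikes GROUP BY id HAVING COUNT(*) > 1").fetchall()
print(dupes)  # -> [(2,)]

# Keep one physical row per id (the lowest rowid) and delete the clones.
conn.execute("""DELETE FROM spikes
                WHERE rowid NOT IN
                      (SELECT MIN(rowid) FROM spikes GROUP BY id)""")
print(conn.execute("SELECT COUNT(*) FROM spikes").fetchone()[0])  # -> 3
```

Note the symptom matches the post: depending on whether a query reaches the row via the id or via another access path, either physical clone can be returned.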

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Thanks so much Matt. I appreciate the overview of your quick check and diagnosis methods. It makes a great deal of sense given your shortage of staffing.

When things go wrong on my six computers here, it is often an internet connectivity issue like Comcast being down or my modem or router getting turned off accidentally. I check that first. If it is common to all my machines, I often just let it go for a while and let the issue rest. As I run out of work, I gradually switch machines over to backup projects. My favorites are GPU Grid for my Nvidia cards and Rosetta at Home for my CPUs. They respond very fast and soon all machines are running flat out on them.

The main thing for me has been to learn to relax and take it easy with all this. I do what I can without running around too much from machine to machine. Sometimes I get a bad update from Nvidia and have to roll it back to the last widely accepted version. Other times, I've had a section of memory go bad or a hard drive fry. Am slowly learning to take things in stride. Like with you, but to a much smaller degree, I often have something not quite working right, and every so often a major repair is involved.

It was very kind of you to share with us. It puts me much more at ease because I know now that you attack the problems and find and tweak many issues. I love your idea of computer malaise or network malaise. A vague type of discomfort from an unspecified source describes it well. Generally my malaise turns into specific types of issues as I am with SETI@home longer and longer. That's headed in the right direction.

Brother Frank

So I think everyone knows you've been getting new hardware (which is great!!!).

Is there a plan somewhere to upgrade the internet connection sometime? I know this bottleneck has been discussed many, many times in the past, but I've not heard of any plans for actually resolving the issue.


Thanks!

AFAIK the problem with the bandwidth is intractable absent a large infusion of cash. The problem is political in nature: the beast is underfed and is demanding a full meal to pass.

The GPUUG has set up a page which lists the current hardware (and monetary) donations.
Take a look at this thread: http://setiathome.berkeley.edu/forum_thread.php?id=70511
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

A small problem in your code, Matt.
WU 1029297452 (an AP WU) has had three valid results since mid-August 2012, yet it is still not validated. The problem is that the 3rd task was possibly released shortly before the 2nd was returned from the field, probably just before its deadline.
____________


Just before the deadline of which task? If it is the second, this is highly unlikely, as not being returned by the deadline is the trigger that generates the third task.
____________
BOINC WIKI


Indeed, task 2525261366 had a report deadline of 8 Aug 2012, 22:42:42 UTC. The third task was Created 8 Aug 2012, 22:42:45 UTC and Sent 8 Aug 2012, 22:42:47 UTC.

The third result may or may not be valid, but it is certainly a problem that validation didn't take place when it was reported as a success. There are other Astropulse WUs in the same state for the same reason; that sequence of events seems to always lead to a zombie-like state for Astropulse tasks (but not for SETI@home Enhanced).
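The race window in the timestamps quoted above can be checked arithmetically. This only illustrates the timing; the actual transitioner/validator logic lives in server-side BOINC code not reproduced here:

```python
from datetime import datetime, timezone

fmt = "%d %b %Y, %H:%M:%S"
parse = lambda s: datetime.strptime(s, fmt).replace(tzinfo=timezone.utc)

deadline = parse("8 Aug 2012, 22:42:42")  # task 2525261366's report deadline
created  = parse("8 Aug 2012, 22:42:45")  # third task created
sent     = parse("8 Aug 2012, 22:42:47")  # third task sent

# The third task was created only 3 seconds after the second task's
# deadline passed...
print((created - deadline).total_seconds())  # -> 3.0
# ...so if the second result arrived in roughly that window, the workunit
# could end up with more successes than the transition logic expects.
print((sent - created).total_seconds())      # -> 2.0
```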


Joe

But SETI@home Enhanced does have its own problem with old stuck tasks, such as this one, Workunit 638353788, and I know of several others with tasks caught in the same limbo dating back to around the same time.

But regardless of whether they be AP (of which I have 2 stuck) or MB, these old tasks would have to be producing some effect on the performance of the database.

Ladies and Gentlemen
The place to discuss the current downloads is in "number crunching", where I am sure you will find you are not alone.
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?