Nothing like coming back from a long holiday weekend and having one of your main production servers croak as soon as you arrive. It's a sunny day outside and I'm stuck wearing my fleece jacket and fingerless gloves inside a well air-conditioned server closet.

So what happened? Not sure exactly, but bruno (the upload server, as well as the main boincadm administrative server) was all hung up as soon as we started the normal Tuesday outage. I had to reboot it, and that was that - it wouldn't come up properly again.

It seems to be a multi-part problem. There was a disk failure, and the 3ware card in this system has always given us trouble. What kind of trouble? Well, if you reboot the system (without a full power cycle) random drives go missing. That's kind of a problem, no? I don't think this is a single broken card - a labmate has similar problems with the same model in his system (I forget the model #, but it's 24-channel). Anyway, the big RAID10 holding all the results was tagged as degraded and is rebuilding now.

That's fine, except the OS (which is on separate partitions and not under the jurisdiction of the 3ware card) isn't booting either. Jeez! The good news is I can boot off a Fedora live CD and see both the root and upload storage drives, so there's no data loss. It just won't boot!

The other good news is that, if we need it, we have a backup system already: synergy! It might be getting pulled into prime time sooner than expected. It doesn't have nearly as many disk spindles as bruno, but this might not be an issue - there's still plenty of disk space on it, and a lot of memory for potential file system caching. It's still undecided if we're going to make synergy the new bruno, but I'm at least copying everything there now just to be safe.

I might still be able to get bruno up this afternoon, but if not, looks like we're down for the evening (it'll take that long to copy everything over to synergy).

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Well. I guess that means Hello Synergy!! If it's half as good as it appears, it should be able to handle the new assignment with no problems. After checking my tasks on my BOINC Manager I see I can easily handle a couple of days' downtime.

Even if I don't make it until you get back up I have some new toys coming in that I can install during the down time. I'll be here ready to go whenever you get back.

By the way, no dice on bruno. I tried a bunch of things, it still won't boot - even though I can see all the drives/data when I boot from CD. And the RAID card refuses to NOT tag the RAID as degraded.
...
- Matt

Matt, chin up. You'll figure out what gremlin is bugging ya and squash it flat in no time once ya do figure it all out. Me, I just have a learning curve with Vista Business x64 and with RealTemp 3.58. It isn't pretty, as I had to start it up with Task Scheduler instead of the Startup folder; it should start when the PC does now. I may upgrade to 7 Pro sooner than I thought. At least I have the DVD already. Good luck.
Pluto is still a planet
Beep! Beep!

Thanks for the heads up, Matt. That explains why all my uploads have been backed off 3 hours. I kind of figured it was just an extended outage on the upload server. Oh well, thank goodness you have another server to throw in there.
Traveling through space at ~67,000mph!