I'll write today's message early as this week is a short holiday week so we're kinda busy.

First and foremost, carolyn is now the *only* MySQL replica - I just turned the other replica (the troublesome server mork) off, perhaps for good. Yay! That's one of the two new servers more or less ready for prime time, though we still hope to make carolyn the master (and jocelyn the replica) today or tomorrow.

We're still far from getting the whole project back on line - we have the other new server, oscar, installed and ready to roll, but still need to (a) install and configure informix on it, (b) clean up the science database on thumper, and then (c) transfer all the data from thumper to oscar. This may take a while - the spike merge (which was the last major part of the "clean up") did finally complete last week (after running about 2-3 months), but there was still a discrepancy of about a million missing spikes, which Jeff is tracking down. So there are a few extra merges to do yet. We probably won't really dig into getting oscar on line until after Thanksgiving.

Of course, what's a weekend without an unexpected server crash or two? On Saturday afternoon a major lightning storm swept through the Bay Area. Other projects in the lab (located in the other building) had major power outages. Luckily we were spared a full outage, but apparently a couple of our servers got hung up around this time, perhaps due to some kind of non-zero power fluctuation. The servers were thumper and marvin - each located in different rooms, and on different breakers. It is funny that these two machines are our current two informix servers (thumper holds the SETI@home scientific data, and marvin holds Astropulse). So there was some cleanup to deal with this morning (database/filesystem recovery, hung mounts, etc.) but really no big shakes and we're back to normal (whatever normal is these days). Both systems were on surge protectors so I'm not sure why they were so sensitive - maybe the crashes were random and the timing was coincidental with the storm.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Thanks for the update, Matt. Take your time; we will be here when you're ready to turn it back on. Maybe soon these good news/bad news messages will turn into only good news for a long time to come.
____________

I agree that you guys are doing a superb job, Matt. Having fun with the new toys. :-) And thank you for the update. We are all patiently waiting for the new systems to go live. I'll just continue chewing on Einstein and LHC w/u's here until then.

It might be a good idea to acquire some backup power units rather than simple surge protectors. Modern ones will allow the servers to gracefully shut down from battery power when the mains go out, and let the batteries take the hits from surges.
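The graceful-shutdown idea boils down to a simple decision rule. Here's a minimal sketch; the thresholds are illustrative assumptions, and a real deployment would poll a UPS daemon (e.g. NUT's `upsc`) for the live values rather than hard-coding them:

```python
# Sketch of "gracefully shut down from battery power" as a decision rule.
# Thresholds are illustrative assumptions, not anyone's actual config; in a
# real setup the inputs would come from a UPS daemon such as NUT's upsd.

def should_shutdown(on_battery: bool, charge_pct: float,
                    runtime_s: float, min_charge: float = 30.0,
                    min_runtime: float = 120.0) -> bool:
    """Decide whether to start an orderly shutdown while still on battery."""
    if not on_battery:
        return False
    # Shut down early enough that the OS can sync disks and stop databases.
    return charge_pct <= min_charge or runtime_s <= min_runtime

# Mains present: keep running regardless of charge.
print(should_shutdown(False, 10.0, 60.0))   # → False
# On battery with only 25% charge left: time to shut down cleanly.
print(should_shutdown(True, 25.0, 600.0))   # → True
```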

Matt has described over the years that all of the servers are each on heavy-duty UPS backup systems.

But any surge protectors are sacrificial as they age.

True server grade online UPS systems can be thousands of dollars.....
Not the $100.00 APC rigs that some might buy hoping to shore up their living room PC.
I have a couple of 1500 VA units that, due to their age, are probably only still good for surge suppression and voltage regulation, because their battery packs are long past their prime.
The lead-acid gel cells used in most backups have a standby life of about 5 years. If you don't replace them at that point, their capacity is much diminished. And they are not real cheap to replace.
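To see what "much diminished" capacity means in practice, here is a back-of-the-envelope runtime calculation. All the figures (pack size, load, inverter efficiency, the 30% aged-capacity guess) are illustrative assumptions, not measurements of any particular unit:

```python
# Rough runtime arithmetic for a lead-acid UPS pack, illustrating why an aged
# pack may still suppress surges yet can no longer carry a load for long.
# All numbers here are illustrative assumptions.

def runtime_minutes(capacity_ah: float, volts: float, load_w: float,
                    inverter_eff: float = 0.85, usable_frac: float = 0.8) -> float:
    """Approximate minutes of backup: usable stored energy divided by load."""
    usable_wh = capacity_ah * volts * usable_frac * inverter_eff
    return usable_wh / load_w * 60.0

new_pack = runtime_minutes(capacity_ah=18.0, volts=24.0, load_w=300.0)
# Suppose an old pack holds only ~30% of its rated capacity:
aged_pack = runtime_minutes(capacity_ah=18.0 * 0.3, volts=24.0, load_w=300.0)
print(round(new_pack, 1), round(aged_pack, 1))  # → 58.8 17.6
```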

The best protection is a true online UPS.....
They convert the AC mains to DC, keep the batteries charged, and continuously convert the DC back to AC to feed the computers. The connected rigs never touch the mains directly. They are a bit less efficient to operate, due to conversion losses, but they are the best at protecting the connected equipment.
And rather expensive.
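The "conversion losses" trade-off can be estimated with simple arithmetic. The load, efficiency, and electricity price below are assumptions for illustration, not quoted figures:

```python
# Back-of-the-envelope cost of double-conversion losses for an online UPS.
# Efficiency, load, and tariff are illustrative assumptions.

def yearly_overhead_usd(load_w: float, efficiency: float,
                        usd_per_kwh: float = 0.15) -> float:
    """Cost of the extra energy drawn (beyond the load itself) over one year."""
    input_w = load_w / efficiency          # the UPS draws more than it delivers
    loss_w = input_w - load_w
    kwh_per_year = loss_w * 24 * 365 / 1000.0
    return kwh_per_year * usd_per_kwh

# A 1 kW server load on a 90%-efficient online UPS:
print(round(yearly_overhead_usd(1000.0, 0.90), 2))  # → 146.0
```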
____________
*********************************************
Behold the power of kitty!!

... Both systems were on surge protectors so I'm not sure why they were so sensitive - maybe the crashes were random and the timing was coincidental with the storm.

- Matt

First Matt... thank you for taking time to issue these updates. You can't imagine how important they are to the community. Personally, I hardly ever respond, but believe me that's no indication of their value.

What struck me about your post was the closing supposition... One crash coinciding with a storm might be random, but not two.

Others here have observed that suppression is sometimes sacrificial. I have found this to be true.

I don't know if you regularly do any EMC testing of suppression integrity there, but I encourage your group to do so. From your description, I'd begin with the facility grounding system.

I bought a UPS last summer, for 79 euros, to protect my SUN workstation from summer blackouts caused by air conditioners, and it worked well. I remember one summer day at the Area Research Park in Trieste when the UPSs shut down because of poor air conditioning in their closet, and all the Area computers stopped, including that of Nobelist Carlo Rubbia, who was building the Elettra synchrotron radiation machine. He was rather upset.
Tullio
____________

I have seen in the past that a UPS's transfer ("throw") time combined with a power supply's hold-up time can come very close to being truly uninterrupted. But if the wrong conditions line up, you can still end up with a brown-out on the DC side of the power supply. Most times the system will just shut off, but sometimes it will freeze because the CPU/RAM/chipset forget what they were doing under reduced power, however brief.
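That race between transfer time and hold-up time can be put in rough numbers. The figures below are illustrative assumptions (standby-UPS transfer is typically a few milliseconds; ATX-style supplies are commonly specced for roughly 16 ms of hold-up at full load):

```python
# The transfer-time vs. hold-up-time race, sketched with assumed numbers.
# If the UPS takes longer to switch than the PSU's bulk capacitors can
# carry the load, the DC rails brown out even though a UPS is present.

def rides_through(transfer_ms: float, holdup_ms_full_load: float,
                  load_frac: float) -> bool:
    """True if the PSU's stored energy outlasts the UPS transfer gap."""
    # Hold-up scales roughly inversely with load on the same capacitors.
    holdup_ms = holdup_ms_full_load / load_frac
    return holdup_ms > transfer_ms

# Lightly loaded PSU, fast transfer: ample margin.
print(rides_through(transfer_ms=8.0, holdup_ms_full_load=16.0, load_frac=0.5))   # → True
# Heavily loaded PSU, slow transfer: brown-out on the DC side.
print(rides_through(transfer_ms=25.0, holdup_ms_full_load=16.0, load_frac=0.9))  # → False
```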

UPS battery packs do in fact become effectively useless after a few years, though I have heard on numerous occasions that discharging the batteries to at least 50% once per month can in some cases double the life of them.

Once your batteries do become useless, depending on the price of a new equivalent unit, it is very cost-effective to replace the batteries, often several times, before it becomes time to just buy a new unit. I replaced the batteries in my 1500 about three years ago for US$120, when a new 1500 like it was well over $500. Then I brought home two 1400 carcasses from work and got batteries for them for less than $200 total. Batteries are often inexpensive by comparison.
____________
Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Thanks, KittyMan. Sometimes I find it frustrating wanting to help with their rebuild in a field where I have expertise. I'm trying hard not to be an armchair quarterback, since I don't know all the ins and outs of their current situation. However, when I saw the photos of their server rack I was more than a little shocked. It was hard to believe they were supporting so many clients in the real world on that setup. I understand that financial limitations make it hard for the SETI guys to have the latest and greatest hardware, but a lot can be done with just some common sense and a shoestring budget.

The power issues are a great concern to me. If I were SETI, I would consider co-locating the servers in a Tier 4 data center. A cage big enough to house their equipment would cost very little, and all access can be done remotely (unless hardware changes are required). In our setup, my team and I manage over 10K Windows servers remotely in our two Tier 4 data centers. We have two people on site to handle any hardware changes that are required, and at least one person on site per 8-hour shift in the command center in the event of an emergency. (My team is myself, 3 other Sr. Engineers, 15 system engineers in India, and 4 interns.)

I bet with a little work SETI could get the cage donated, and their costs would be practically zero. I would think their highest MRC (monthly recurring cost) would be bandwidth charges. (Hell, if I were given the ability to speak as a duly authorized agent on their behalf, I could probably find them the co-location facility and get a cage donated.)

Again, I apologize; I am not trying to attack anyone's work ethic, but there are times I want to help the project so badly that not being able to lend my expertise is quite frustrating.

One thing I will recommend: go to a company like upsforless.com and purchase a few online double-conversion UPSs. (Make sure to get the double-conversion units; they are the best and most secure type of UPS available.) I have purchased two of their Liebert UPSs and they are great. (One for my home theater, one for the computers in my office.) They are refurbished units but come with a full warranty and are a hell of a bargain. (I have nothing to do with the company, just pointing out a good value.)

I bet with a little work SETI could get the cage donated, and their costs would be practically zero. I would think their highest MRC (monthly recurring cost) would be bandwidth charges. (Hell, if I were given the ability to speak as a duly authorized agent on their behalf, I could probably find them the co-location facility and get a cage donated.)

Okay, let's assume that for $0, SETI could get space in a nice data center.

They'll still need to pay for bandwidth between the servers (the data center) and the users.

Then we have the "tapes" from Arecibo, which are shipped from Puerto Rico, and have to be mounted and copied to the servers to be split.

That's bandwidth from Campus to the Data Center, probably equal to what they currently have (and have to pay for) -- and you need that bandwidth to bring the completed work back.

Doubling the monthly bandwidth expense may not turn out to be "help" -- and that's why a data center may not be as good an idea as it might seem.
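The doubling argument reduces to simple arithmetic. The link capacities below are placeholders, not the project's actual figures:

```python
# The "doubling" argument above in numbers: co-locating moves the user-facing
# traffic to the data center, but the raw tape data still has to travel from
# campus to the data center, and completed work back. Figures are placeholders.

campus_to_users_mbps = 100.0   # what the lab pays for today (assumed)
colo_to_users_mbps = 100.0     # the same user traffic, now served from the colo
campus_to_colo_mbps = 100.0    # tape data out, completed work back (assumed equal)

today = campus_to_users_mbps
with_colo = colo_to_users_mbps + campus_to_colo_mbps
print(with_colo / today)  # → 2.0
```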

...though I have heard on numerous occasions that discharging the batteries to at least 50% once per month can in some cases double the life of them.

Nope.
Heat tends to be the biggest killer of Lead Acid batteries.
Here in Darwin, if you get 2 years out of a car battery, that's pretty good going. When I lived down south (much further down south), 10 years wasn't unusual.

When a lead-acid battery's voltage drops to 10V, it's as good as dead. Deep-cycle batteries can handle such a deep state of discharge, but not often or regularly.
____________
Grant
Darwin NT.