> As the only sysadmin for a 260-node cluster, I'm extremely curious what
> jobs those 2 people were supposed to be doing. I have an operations
> staff to rely on for some environmental stuff and for handling service
> calls with vendors (I report the problem to them and do the hw
> replacement, they just take care of the phone call). However, even with
> 260 nodes I still find a lot of my time spent in trying to improve the
> cluster as opposed to just keeping it running.
Well, this is probably an apples-to-oranges comparison. I've worked in
environments where I was the only systems administrator and ran 500
servers on my own. It's rather trivial to administer a real cluster,
where there are only one or two functions for the entire thing. It's
far more work to maintain good process (consistency, configuration
management, version control, patch management, and general overall
health) in a non-cluster environment where you might have 100 servers
in groups of 2 or 4 per function, and maybe even several one-off
servers.
This is my first foray into building a single-function cluster in several
years, and I'm trying to determine whether tried & true enterprise management
techniques are an asset or a detriment in a Beowulf environment, or at
least figure out which concepts carry over, which are superfluous, and
which just aren't applicable.
-------------------
BitPusher, LLC
http://www.bitpusher.com/
1.888.9PUSHER
(415) 724.7998 - Mobile