I've spent an incredible amount of time dealing with a few hundred machines - and by dealing, I mean clustering, administering and (simultaneously) QA'ing software on them, and so on.

I've really ramped up my own experience in this space over the last few years too, especially since I picked up the Python hat and started doing everything in, well, Python.

Some of the harder things to do when you're interacting with more than 5 machines simultaneously include flow control, handling exceptions properly and managing return data.

In a perfect world, you'd just make your own client and server - but why do that if you have something as ubiquitous as SSH?

So, I do all my work through SSH - this is a blessing and a curse in a lot of respects. The first good thing is that, well, SSH runs everywhere. As long as I do the initial work of getting SSH keyring information set up properly everywhere, I don't need to deal with user authentication and can strictly scope myself to command execution.

And here is where the curse comes in.

When you're writing applications that use subprocess or popen() to spawn processes on the local system, it's trivial to track the pid, exit status and other information of the spawned process from the local machine. Generally speaking, you don't have to worry about hang conditions or not getting a "rational" exit status.
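To make the local case concrete, here is a minimal sketch using the modern subprocess module. The child command is just a stand-in (the Python interpreter itself, exiting with status 3), not anything from my own tooling.

```python
import subprocess
import sys

# Spawn a local child process. sys.executable is a stand-in for any local command.
proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; print('hello'); sys.exit(3)"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)

# communicate() drains both pipes and waits for the child to exit.
# The timeout guards against a hung child (raises subprocess.TimeoutExpired).
out, err = proc.communicate(timeout=10)

# The pid and exit status are directly available on the Popen object.
print("pid:", proc.pid)
print("exit status:", proc.returncode)
print("stdout:", out.strip())
```

Everything you need is sitting right there on the Popen object, which is exactly what gets harder once SSH is in the middle.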

SSH doesn't prevent that - not at all. However, it does make life a little more interesting. You have to worry about hang conditions a lot more, and you have to worry about how you pipe data back to the client. You have to be sure to close things like stdin (otherwise the SSH process never returns - as long as a stdin pipe exists, SSH just chills out), etc. Then, when you throw in the fact that you have to manage this across a few hundred machines, the situation gets even messier.
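The stdin problem can be sketched like this. The helper names (build_ssh_command, run_without_stdin) and the host name are hypothetical. The ssh options used (-n, BatchMode, ConnectTimeout) are standard OpenSSH ones, but treat the exact combination as an assumption rather than a recipe. The demo at the bottom runs a local command instead of ssh so it works without a remote host.

```python
import subprocess
import sys

def build_ssh_command(host, remote_command, connect_timeout=10):
    """Build an argv list for a non-interactive ssh call (hypothetical helper)."""
    return [
        "ssh",
        "-n",                                        # redirect ssh's stdin from /dev/null
        "-o", "BatchMode=yes",                       # fail instead of prompting for a password
        "-o", f"ConnectTimeout={connect_timeout}",   # give up on unreachable hosts
        host,
        remote_command,
    ]

def run_without_stdin(argv, timeout=30):
    """Run argv with no stdin pipe and a hard timeout.

    Returns (exit_status, stdout, stderr). Raises subprocess.TimeoutExpired
    if the child is still running when the timeout expires.
    """
    completed = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,   # the child sees EOF immediately on stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        timeout=timeout,
    )
    return completed.returncode, completed.stdout, completed.stderr

# What the ssh invocation would look like ("node01" is a made-up host name).
print(build_ssh_command("node01", "uptime"))

# Local stand-in: a child that reads all of stdin returns at once,
# because there is no open pipe for it to wait on.
status, out, err = run_without_stdin(
    [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
)
print(status, out.strip())
```

Passing both -n and stdin=subprocess.DEVNULL is belt and braces. Either one alone should keep ssh from waiting on input.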

It's not an impossible situation. Use select() to manage the pipe from the spawned SSH process. Make sure you're capturing the exit status of the command you're running through ssh as well as the exit status of the ssh command itself (with OpenSSH, a process that exits 2 on the remote host comes back as an exit 2, while a failure in ssh itself comes back as an exit 255 - both are failures, but you have to tell them apart). Make sure you handle simultaneous execution through threads, make sure those threads have timers on them and that they can't bring down the parent process, etc, etc.
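Pulling those pieces together, here is a rough sketch, not the code I actually run. It assumes a Unix-like system (select() on pipes doesn't work on Windows) and OpenSSH's convention of exiting 255 when ssh itself fails. All function names are hypothetical, and fake_argv substitutes local commands for real ssh calls so the sketch runs without a cluster.

```python
import os
import select
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor

SSH_FAILURE = 255  # OpenSSH's exit status when ssh itself fails (assumed convention)

def run_with_deadline(argv, timeout=30.0):
    """Run argv, draining its output via select(). Returns (status, output).

    status is None if the deadline passed and the child had to be killed.
    """
    proc = subprocess.Popen(argv, stdin=subprocess.DEVNULL,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    fd = proc.stdout.fileno()
    chunks = []
    deadline = time.monotonic() + timeout
    timed_out = False
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            timed_out = True
            break
        readable, _, _ = select.select([fd], [], [], remaining)
        if not readable:
            continue              # nothing arrived; the deadline check fires next
        data = os.read(fd, 4096)
        if not data:              # EOF: the child closed its output
            break
        chunks.append(data)
    if not timed_out:
        try:
            proc.wait(timeout=max(deadline - time.monotonic(), 0))
        except subprocess.TimeoutExpired:
            timed_out = True      # output closed but the process lingers
    if timed_out:
        proc.kill()
        proc.wait()
    proc.stdout.close()
    output = b"".join(chunks).decode(errors="replace")
    return (None if timed_out else proc.returncode), output

def classify(status):
    """Separate ssh-level failures from remote command failures."""
    if status is None:
        return "timeout"
    if status == 0:
        return "ok"
    if status == SSH_FAILURE:     # ambiguous if the remote command itself exits 255
        return "ssh-failed"
    return "command-failed"

def run_on_hosts(hosts, make_argv, timeout=30.0, max_workers=32):
    """Run make_argv(host) for every host in parallel threads."""
    def worker(host):
        try:
            status, output = run_with_deadline(make_argv(host), timeout)
            return host, classify(status), status, output
        except Exception as exc:  # one broken host must not take down the parent
            return host, "error", None, str(exc)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, hosts))

def fake_argv(host):
    # Stand-in for something like ["ssh", "-n", host, "some command"].
    scripts = {
        "good": "print('up')",
        "bad": "import sys; sys.exit(2)",
        "down": "import sys; sys.exit(255)",
        "hung": "import time; time.sleep(60)",
    }
    return [sys.executable, "-c", scripts[host]]

results = run_on_hosts(["good", "bad", "down", "hung"], fake_argv, timeout=3.0)
for host, verdict, status, output in results:
    print(host, verdict, status, output.strip())
```

A thread pool with a per-command deadline is only one way to do this. The point is that every host gets its own timer and its own exception handling, so a hung or broken machine costs you one result and not the whole run.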

Overall, Python and SSH are a godsend for people like me who have to manage, script and test across large farms of machines. Luckily, I don't have to worry about the topology of each machine being different - mine are all clones.

Using Python and SSH together, intelligently, can save you a ton of time and ultimately make you more productive, as your code should be fairly portable (and cross platform). Like I said - SSH is everywhere.

Hopefully in the next few months I'll carve off some projects I can share with the world (time is always precious) that revolve around this. Until then, take a peek at these tools and utilities:

PuSSH - Pythonic, Ubiquitous SSH:
PuSSH is a relative newcomer to the space. It has a lot of good ideas inside the code, and well, it works. I've peeked at the code and gotten my hands dirty with it. Hopefully I can get more runtime with it over the next few weeks to learn more about it.

PSSH:
PSSH is, in my mind, the de facto tool in this space. It has a lot of thought and work put into it, making it incredibly flexible and useful. It does all the correct flow control, and the code is highly readable.

Brent Chun (the creator) also has some other tools he's made (see the page I linked) in this space, so he knows what he's doing. He's also put out a lot of papers on the subject of large scale installations.

The Distribulator:
A type of distributed shell - another tool in the toolbox, but when I first used it, it was on its first release and fairly bug-ridden. I've used it a few times since then, but the XML format of the configuration and a few other nits made me feel like it's a little heavy, and that what it's trying to do could be done in a lighter, more efficient manner.