Posted by michael on Monday March 15, 2004 @05:03AM
from the does-it-correct-typos? dept.

Paul Howe writes "Roaming around my school's computer science webserver, I ran across what struck me as a great (and very prescient) idea: a fault-tolerant scripting language. This makes a lot of sense when programming in environments that are almost fundamentally unstable, i.e. distributed systems. I'm not sure how active this project is, but it's clear that this is an idea whose time has come. Fault Tolerant Shell."

It's a well-meaning idea, but it would cause more problems than it would solve. It would just encourage sloppy code; people would rationalize "I don't need to fix errors because it doesn't matter." Ignoring errors, or even warnings, is a very bad habit to get into when programming.

This will not improve people's skills. In fact, it will make them more prone to mistakes, and more likely to get a result they didn't expect. It's similar to computer spell checkers. Ever since people started relying on these, their spelling has gone way downhill, simply because they don't bother thinking. Computers do all the spelling for them. They don't need a spell checker. They need spelling lessons.

This is even worse. Computers will try to second-guess what the user means, and get it wrong half the time.

A qualified shell scripter will not make these mistakes in the first place. Anyone who thinks they need this shell actually just needs to learn to spell and to type accurately.

While, yes, you manage distributed systems from the center, you don't *push* updates, changes, and modifications, because it doesn't scale. You end up having to write stuff like this fault-tolerant shell, which is frankly backwards thinking.

Instead, you automate everything and *pull* updates, changes, scripts, etc. That way, if a system is up, it just works; if it's down, it'll get updated the next time it's up.

I won't go into details but I'll point you at http://infrastructures.org/
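The pull model described above can be sketched in a few lines of plain sh. The paths here are purely illustrative, and a local cp stands in for the real transport (rsync, scp, etc.) that a cron-driven updater would use:

```shell
#!/bin/sh
# Pull-style update sketch (illustrative paths): each host runs this from
# cron, fetching the current config from a central repository. If the host
# is down, nothing breaks; it simply catches up on its next run.

REPO_DIR="/tmp/demo_repo"     # stand-in for the central repository
LOCAL_DIR="/tmp/demo_local"   # stand-in for this host's local copy

mkdir -p "$REPO_DIR" "$LOCAL_DIR"
echo "v2" > "$REPO_DIR/config"        # pretend the repo holds version 2

# The pull itself; in real life this would go over the wire.
cp "$REPO_DIR/config" "$LOCAL_DIR/config"

echo "host now at $(cat "$LOCAL_DIR/config")"
```

Because each host initiates its own update, a down host misses nothing permanently, which is why the pull approach scales without needing retry logic at the center.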

How many times have you hacked something together in Perl that ended up being relied on for some pretty important stuff, only to find six months down the track that there's some condition you didn't consider (the DB connects fine, but fails halfway through script execution, for example) and the whole thing just collapses in a heap? A nasty heap to recover from, too, because you didn't write much logging code either.

This would REALLY be useful when you're connecting to services external to yourself. Network glitches cause more problems with my code than ANYTHING else, and it's a pain in the arse to write code to deal with them gracefully. I'd really, really like to see a universal "try this for 5 minutes" wrapper which, if it still failed, would leave you only one exit condition to worry about. Hey, what the hell, maybe I'll spend a few days and write one myself.
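A minimal version of that "try this for N seconds" wrapper is easy to sketch in plain sh. The function name and the one-second poll interval are my own choices for illustration, not anything from ftsh:

```shell
#!/bin/sh
# Hypothetical retry wrapper: run a command repeatedly until it succeeds
# or a time budget (in seconds) runs out, leaving exactly one failure
# condition for the caller to handle.
retry_for() {
    deadline=$(( $(date +%s) + $1 ))   # absolute cutoff time
    shift
    until "$@"; do
        [ "$(date +%s)" -ge "$deadline" ] && return 1   # budget exhausted
        sleep 1
    done
    return 0
}

# Usage: keep trying a flaky command for up to 5 seconds.
retry_for 5 true && echo "command succeeded"
```

The caller then deals with a single exit status instead of sprinkling per-attempt error handling everywhere.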

All the programmers who need the environment to compensate for their inadequacies, step on one side. All the programmers who want to learn from their mistakes and become better at their craft, get on the other side.

They are deleting a number of files on a number of different machines, then downloading an updated version. The implication is that the fault tolerance means a failure is not fatal.

So what happens if the files are crucial? Let's use the toy example of kernel modules being updated: the modules get deleted, then the update fails because the remote host is down. Presumably the shell can't roll back the changes a la DBMS, as that would involve either hooks into the FS or every file util ever written.

Now I think it's a nice idea, but it could easily lead to sloppy coding; if your shell automatically retries, backs off, and cleans up, why would people bother doing it the 'correct' way and downloading the new files before removing the old ones?
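The 'correct' order described above, fetch first and replace second, can be sketched like this in plain sh. The paths are illustrative and an echo stands in for the real download:

```shell
#!/bin/sh
# Safe update order: write the new file under a temporary name first,
# and only replace the old file once the fetch has fully succeeded.
OLD="/tmp/demo_module"
NEW="/tmp/demo_module.new"

echo "old version" > "$OLD"           # the file currently in service

# Stand-in for the real download; if it fails, $OLD is left untouched.
if echo "new version" > "$NEW"; then
    mv "$NEW" "$OLD"   # rename is atomic on the same filesystem
else
    rm -f "$NEW"       # discard the partial download
fi

cat "$OLD"
```

Done this way, a failed download leaves the old file in place, so no rollback machinery is ever needed.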

Doesn't that depend on the definition of clustered, though? Clustered systems can be things like Beowulf clusters, but a collection of standalone web servers behind an HTTP load balancer is also commonly referred to as a web cluster or array.

IMHO, as someone who works in a complex web server / database server environment, there are many interdependencies brought by different software, different platforms, and different applications. Whilst 100% uptime on all servers is nice to have, it's a complex goal to achieve, and it requires not just expertise in the operating systems and web/database server software but an in-depth understanding of the applications.

A system such as this fault-tolerant shell is actually quite a neat idea. It allows for flexibility in system performance and availability without requiring complex (and therefore possibly error-prone or difficult-to-maintain) management jobs. An example would be a server that replicates images using rsync. If one of the targets is busy serving web pages or running another application, ftsh would allow for that kind of unforeseen error to be catered for relatively easily.

This is not about catching scripting errors. It does not fix your code. It is about catching errors in the environment that scripts are running in.

Shell scripts should be short and easy to write. I have seen plenty of them fail due to some resource or another being temporarily down. At first people are neat and have the script send an email to notify the admin. When this results in a ton of emails every time some dodo knocks out the DNS, they turn it off and forget about it.

Every scripting language has its own special little niche: BASH for simple things, Perl for heavy text manipulation, PHP for creating HTML output. This scripting language is pretty much like BASH, but it takes failure as a given. The example shows clearly how it works. Instead of ending up with Perl-like scripts to catch all the possible errors, you add two lines and you've got a wonderfully small script, which is what shell scripts should be, that is nonetheless capable of recovering from an error. This script will simply retry when someone knocks out the DNS again.

This new language will not catch your errors. It will catch other people's errors. Sure, a really good programmer can do this himself. A really good programmer can also create his own libraries. Most of us in admin jobs find it easier to use somebody else's code rather than constantly reinvent the wheel.

Seems kind of useless in a totally GUI environment. Of course, maybe it's just me?

Uh.. it's just you. You should, y'know, maybe try using Windows 2000 or XP sometime... Windows has a perfectly good command line. Point at the "Start" menu, click "Run" -> type "cmd", and away you go.

You can turn on command line completion (search for "TweakUI" or "Windows Powertoys", I can't be bothered to link to them). And even pipes work just fine (as they have since the DOS days). For example:

dir *.txt /s /b > textfiles.txt
type textfiles.txt | more

There's a whole build target type that specifies Win32 executables as command line programs - such programs play nicely with command shells and piped input & output. For example, all the Microsoft compiler tools are command line programs that are wrapped by the GUI. You can pull code from a Visual SourceSafe database and rebuild a project all from the command line if you want - such build mechanisms would be a prime target for a fault tolerant shell (although I think SCons has solved the rebuild problem well itself).

Other tools, such as Python and Perl, play nicely with the command line and could also be used with this shell on Windows.

"An example would be a server that replicates images using rsync. If one of the targets is busy serving web pages or running another application, ftsh would allow for that kind of unforeseen error to be catered for relatively easily."

It depends how you organise your systems. If you push to them then yes you need something like ftsh. If you organise them so that they pull updates, pull scripts to execute and arrange those scripts so that they fail safe (as they all should anyway) then you'll have something which is a couple of orders of magnitude more reliable and scalable. It just needs a little more thought to begin with.

Well, that was just an example. If, on the other hand, the system the images were pulled from was very busy, then the same is true. The problem is that you can't architect for a moving target, and the flexibility that rapidly changing environments require is something ftsh would be quite useful for.

.. the shell just got the cool error handling Lisp has always had (condition-case in elisp, for example). From a Lisper's perspective, things will be so much easier now... and I can really try some more scripting..

Of course bash can do it as well, using the proper constructions. That is not the point. Care should be taken not to view bash as a golden hammer when it comes to shell scripting. The same goes for ftsh, of course. It won't try to replace bash for every script out there.

The author merely thinks it would be nice to have a shell in which such fault-tolerant constructions are natural by design. Just to save people headaches when writing simple scripts which are there to get some job done, not to waste time dealing with every single possible failure and time-out by hand.

I think it will still promote bad programming/scripting practices. Many people (including myself) started with scripting before moving on to full-fledged programming. What they learned in scripting they carry forward with them into programming, and trust me, I learned to be very meticulous when it comes to interacting with things outside of my script's control (such as files). Every I/O operation should be tested for success. Trying to open a file? Did it work? OK, try writing to the file... did it work? Open a database connection... did it work? Let the user enter a number... did they enter a valid number? Error handling and input validation are things you just have to learn, like it or not. Something that holds your hand and lets you code while remaining oblivious to the realities of the scripting/programming environment is a bad thing, IMHO.

On a side note for Perl, one thing I always hated was the examples that had something like 'open( FH, "file/path" ) || die "Could not open file!" . $!;'. I mean, come on, you don't want your script to just quit if it encounters an error... how about putting in an example of error handling other than the script throwing up its hands and quitting! LOL.

Please excuse any grammatical/other typos above, I was on 4 hrs sleep when I wrote this. Thank You.

Done correctly, spellcheckers can be the best spelling-learning tool there is.

"Correctly" here means the spell-checkers that give you red underlines when you've finished typing the word and it's wrong. Right-clicking lets you see suggestions, add it to your personal dict, etc.

"Incorrectly" is when you have to run the spell-checker manually at the "end" of typing. That's when people lean on it.

The reason, of course, is feedback; feedback is absolutely vital to learning, and spell-checkers that highlight are the only thing I know of that cuts the feedback loop down to zero seconds. Compared to this, spelling tests in school where the teacher hands back the test three days later are a complete waste of time. (This is one of many places where out-of-the-box thinking with computers would greatly improve the education process, but nobody has the guts to say, "We need to stop 'testing' spelling and start using proper spell-checkers, and come up with some way to encourage kids to use words they don't necessarily know how to spell instead of punishing them." The primary use of computers in education is to cut the feedback loop down to no time at all. But I digress...)

'gaim' is pretty close but it really ticks me off how it always spellchecks a word immediately, so if you're typing along and you're going to send the word "unfortunately", but you've only typed as far as "unfortun", it highlights it as a misspelled word. Bad program! Wait until I've left the word!

I could do more damage with "while true; do ..." in bash. It's not like an infinite loop is very hard. ftsh, on the other hand, defaults to exponential backoff, which isn't exactly what you want if you want to DOS someone.

So what you are saying is that programming should be hard, and people should be expected to do it right, or it promotes bad practices.

Yet we are expected to excuse your grammatical errors and typos. Doesn't that just promote bad practices? Shouldn't we whack you over the head with a baseball bat, just to make sure you won't post when you're not prepared to write flawless posts?

The more work you have to do to check errors, the more likely it is that, however vigilant you might be, errors slip past. If you have to check the return values of 100 commands, that is 100 chances to forget the check, to do the check the wrong way, or to handle the error incorrectly.

In this case, the shell offers a more sensible default handling of errors: if you don't handle them, the shell won't continue executing by "accident" because you didn't catch an error, but will terminate. It also provides an optional feature that lets you easily retry commands that are likely to fail sometimes, where the likely error handling would be to stop processing and retry, without having to write the logic yourself.

Each time you have to write logic to handle exponential backoff and to retry according to specific patterns is one more chance to introduce errors.
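Here is what that hand-written logic looks like every time, sketched in plain sh. The attempt count, base delay, and cap are arbitrary choices for illustration; this is exactly the boilerplate a retrying shell would absorb:

```shell
#!/bin/sh
# Hand-rolled exponential backoff: the delay doubles after each failure,
# up to a cap, for a fixed number of attempts. Writing this by hand for
# every flaky command is one more chance to introduce a bug.
backoff_retry() {
    max_tries=$1; shift
    delay=1                # initial wait in seconds
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        "$@" && return 0   # success: stop retrying
        tries=$((tries + 1))
        [ "$tries" -ge "$max_tries" ] && break   # no sleep after last try
        sleep "$delay"
        delay=$((delay * 2))
        [ "$delay" -gt 8 ] && delay=8   # cap the backoff
    done
    return 1
}

backoff_retry 3 true && echo "succeeded"
```

Every copy of a loop like this carries its own chance of an off-by-one in the counter or a forgotten cap, which is the argument for doing it once, in the shell itself.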

No offense, but I would rather trust a SINGLE implementation that I can hammer the hell out of until I trust it and reuse again and again than trust you (or anyone else) to check the return code of every command and every function they call.

This shell does not remove the responsibility for handling errors. It a) chooses a default behaviour that reduces the chance of catastrophic errors when an unhandled error occurs, and b) provides a mechanism for automatic recovery from a class of errors that occur frequently in a particular type of system (distributed systems, where network problems DO happen on a regular basis), and thereby leaves developers free to spend their time on more sensible things (I'd rather have my team doing testing than writing more code than they need to).

If you could set up a distributed system at a reasonable cost where any program can continue to run without caring about an underlying failure, you would be richer than Bill Gates.

Resources DO become unavailable in most systems. It simply doesn't pay to ensure everything is duplicated and to set up infrastructures that make failures transparent to the end user; there are almost always cheaper ways of meeting your business goals by looking at what level of fault tolerance you actually need.

For most people, hours, sometimes even days, of outages can be tolerable for many of their systems, and minutes are mostly not noticeable if the tools can handle it. The cost difference between a system where unavailability is treated as a normal, acceptable condition within some parameters, and one where failures are made transparent to the user, can be astronomical.

To this date, I have NEVER seen a computer system that would come close to the transparency you are suggesting, simply because for most "normal" uses it doesn't make economic sense.

Spell checkers are *not* a substitute for knowing how words are spelled.

Of course not. But using a spell checker means having time to learn about the homonyms, instead of endlessly playing catch up.

You still predicated your post on "relying" on spell checkers; I'm saying that people learn from good spell checkers. That people can't learn everything from a spell checker is hardly a reason to throw the baby out with the bath water and insist that people use inferior learning techniques anyhow!

A kid who can spell accurately is in a much better position to learn about homonyms than one for whom spelling is still a mysterious, abstract art that involves guessing at things.

It's a well-meaning idea, but it would cause more problems than it would solve. It would just encourage sloppy code; people would rationalize "I don't need to fix errors because it doesn't matter." Ignoring errors, or even warnings, is a very bad habit to get into when programming.

You've got it backwards.

Most shell scripting is quick and dirty; no one checks error codes (mostly because it's a nuisance). FTSH makes it easier to check error codes, in part because the default behavior is to bail on any error. In essence, FTSH adds exception-like behavior to shell programming. FTSH makes it easier to write shell programs that fail gracefully. How can that possibly be a bad thing?

What's with all the people claiming that FTSH will ruin the world because it makes it easier to be a sloppy programmer? Did you freaking read the documentation?

To massively oversimplify, FTSH adds exceptions to shell scripting. Is that really so horrible? Is line after line of "if [ $? -eq 0 ]; then" really an improvement? Welcome to the 1980s; we've discovered that programming languages should try to minimize the amount of time you spend typing the same thing over and over again. Human beings are bad at repetitive behavior, so avoid repetition if you can.
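For comparison, here is the repetitive style next to the exception-like default. `set -e` is plain POSIX shell and only approximates what ftsh does, but it shows the idea:

```shell
#!/bin/sh
# A trivial "step" for demonstration purposes.
step() { echo "ran $1"; }

# The repetitive style: every command followed by its own check.
step one
if [ $? -ne 0 ]; then echo "step one failed" >&2; exit 1; fi
step two
if [ $? -ne 0 ]; then echo "step two failed" >&2; exit 1; fi

# The exception-like default: with 'set -e' the shell bails on the first
# failing command, so no per-command boilerplate is needed.
(
  set -e
  step three
  step four
  echo "all steps ok"
)
```

The second form says the same thing once instead of after every line, which is precisely the repetition argument made above.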

Similarly, FTSH provides looping constructs to simplify the common case of "try until it works, or until some timer or counter runs out." Less programmer time wasted coding Yet Another Loop means fewer opportunities for a stupid slip-up while coding that loop.

If you're so bothered by the possibility of people ignoring return codes, it should please you to know that FTSH forces you to appreciate that return codes are very uncertain things. Did diff return 1 because the files are different, or because the linker failed to find a required library? Ultimately, all you can say is that diff failed.

Christ, did C++ and Java get this sort of reaming early on? "How horrible, exceptions mean that you don't have to check return codes at every single level."