There are reasons I don't want to use robots.txt: the site in question adds new sections/channels on a daily basis, many of which we don't want spidered, some of which we do. The URL structure is such that even using, say, the Google extensions to robots.txt, we'd end up with a very, very large robots.txt file that would be unmanageable to maintain daily. With the meta tag approach we can use the CMS (Content Management System) to set the robots meta tags when the pages are created. Well, it sounds like a plan anyway...
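For what it's worth, the sort of CMS hook I have in mind might look something like this - a rough sketch only, and the section names and the `spider` flag are made up for illustration:

```python
# Sketch: a CMS template hook that picks the robots meta tag per section.
# Section names and the "spider" flag are hypothetical examples.

SECTIONS = {
    "news": {"spider": True},
    "members-only": {"spider": False},
    "archive": {"spider": False},
}

def robots_meta(section: str) -> str:
    """Return the robots meta tag for a page in the given section.
    Unknown sections default to noindex, on the safe side."""
    allowed = SECTIONS.get(section, {}).get("spider", False)
    content = "index,follow" if allowed else "noindex,nofollow"
    return f'<meta name="robots" content="{content}">'

print(robots_meta("news"))          # <meta name="robots" content="index,follow">
print(robots_meta("members-only"))  # <meta name="robots" content="noindex,nofollow">
```

The point is that new sections only need one flag set at creation time, instead of another line (or ten) in an ever-growing robots.txt.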

I see that at some point the W3C discussed putting user agents in the meta tag standard but didn't...

Most well-behaved bots will honor robots.txt, and some bots/SEs (including Googlebot) will honor the "robots" meta tag. I usually use both to control access to specific pages, but I build pages with all SEs in mind.
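To illustrate the belt-and-braces setup I mean (the paths here are made-up examples):

```
# robots.txt - keeps well-behaved bots out of a whole area:
User-agent: *
Disallow: /members/

<!-- ...and, in the <head> of individual pages you also
     want kept out of the index, a robots meta tag: -->
<meta name="robots" content="noindex,nofollow">
```

The robots.txt rule covers bots that only read robots.txt, and the meta tag covers the per-page cases.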

A very interesting effect with Google and AJ (Ask Jeeves, and maybe others) is that they will list a link to a page even if it is disallowed in robots.txt. The link is listed with no title and no page description - as you might expect, since you have told the robot not to fetch the page. However, if Googlebot or AJ finds a link to that page anywhere on the web, they will list the URL in their SERPs if it is sufficiently relevant to the search terms.

The only way I have found to tell Gbot and AJ, "Please don't mention this URL at all," is to not disallow the page in robots.txt, but rather to block it only using the on-page robots meta tag. The bot has to be able to fetch the page in order to see the tag, which is exactly why you can't also disallow it. It's the only way I've found to make "semi-private pages" stay that way.
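In other words, the semi-private page carries something like this in its <head>, with no matching Disallow line in robots.txt:

```
<!-- The page must NOT be disallowed in robots.txt,
     or the bot will never fetch it and see this tag: -->
<meta name="robots" content="noindex,nofollow">
```

Since the bot is allowed to fetch the page, it reads the noindex and drops the URL from its listings entirely, rather than showing a bare URL-only entry.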