Robots.txt question

I hope I'm allowed to ask this question here.
I want to set up a robots.txt file to manage which bots and web crawlers can access which parts of my website.
To do so, you have to specify the user agent for each bot.
Where can I find a list of valid user agent names for the most common bots?
For example, if I want to specify Google, do I specify "google", "googlebot", "google image", etc.?
I really want to exclude all bots other than Google, Yahoo and maybe MSN, but I don't know the entire list of user agents I have to specify to achieve this.
TIA

I really want to exclude all bots other than Google, Yahoo and maybe MSN, but I don't know the entire list of user agents I have to specify to achieve this.

Based on this comment, robots.txt is not appropriate for what you are trying to do, and in fact may have unexpected consequences.

Unlike .htaccess, the robots.txt file is not enforced by the server; it is a voluntary system that web-crawling spiders can choose to read and obey, or ignore, entirely at their own discretion.

The major players will of course read your robots.txt and obey your requests, but minor crawlers, foreign bots, and especially spam-harvesting bots do NOT obey anything you put in your robots.txt.

In fact, many of these bots read your robots.txt specifically to find content to crawl: instead of skipping the areas you ask them not to crawl, they make a beeline straight for those folders!

Incidentally, this is also why you should NEVER list the links to the "admin" areas of your web scripts in robots.txt, as doing so tells every spider, hacker, and anyone else who looks exactly where to go.
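For example, a robots.txt like this (the paths are hypothetical) hands any visitor a map of exactly the areas you wanted hidden, since anyone can fetch the file directly:

```
# robots.txt -- these Disallow lines are requests, not access control.
# Anyone (human or bot) can read this file and see the paths listed.
User-agent: *
Disallow: /admin/        # advertises the admin area to everyone
Disallow: /private-data/ # likewise
```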

If you want to block certain web spiders, I would recommend instead using access directives (e.g. "deny from") in .htaccess, which can easily be written to match against any Apache variable, including the user agent.
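A minimal sketch using Apache 2.2-style directives; the bot names here are placeholders, not a real blocklist:

```
# .htaccess -- deny requests whose User-Agent matches these patterns.
# "BadBot" and "EvilScraper" are illustrative names only.
SetEnvIfNoCase User-Agent "BadBot" bad_bot
SetEnvIfNoCase User-Agent "EvilScraper" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

On Apache 2.4, the same idea uses Require directives instead ("Require all granted" plus "Require not env bad_bot" inside a RequireAll block).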

Another good idea is to block the IP ranges of the known bad crawlers at your firewall; that also frees up server resources, since the server never has to handle those requests at all.
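On a Linux server, that can be as simple as a couple of iptables rules (the ranges below are reserved documentation addresses, used here purely as placeholders):

```
# Drop all traffic from an offending range before it reaches the web server.
# 198.51.100.0/24 and 203.0.113.0/24 are TEST-NET placeholder ranges.
iptables -A INPUT -s 198.51.100.0/24 -j DROP
iptables -A INPUT -s 203.0.113.0/24 -j DROP
```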