Good starting point. Who maintains the db? How recently was the data updated?
– Alberto De Caro, Aug 7 '12 at 8:25

@ADC There are timestamps in at least two locations on that page, although they're not terribly recent. Still, it's one resource to have around. The forum (feedback & additions) seems to still have some people making submissions, even if they may not be getting published to the site.
– Su', Aug 7 '12 at 9:50

It's highly unlikely you're going to find a completely universal list of user agents, in part because they can just be made up. Even setting that aside, maintaining one would be a ridiculous amount of work. You just need to compile a few resources and then do some further searching for anything else you don't recognize. (Surprisingly, I can't find a Wikipedia "List of…" article for this.)

Here's a massive list of nothing but iOS UA strings. If you look at how fast some of those get changed in the date column and take into account the last update to the document was 10 weeks ago, it's quite possibly already missing something.

Apple devices do share common strings such as Apple-iPhone and Apple-iPod. I don't think looking for all bots in raw access logs would work either, especially since there are bound to be a few malicious bots using standard browser user agents. A quick way to get a count would be to search the log files for keywords such as bot, bots, robot, crawler, and spider, which catches the legitimate bots that link to web pages explaining how they crawl. Or search for requests for robots.txt; I'm sure most bots look at it, even malicious ones, to see where you don't want them to crawl.
– Anagio, Aug 7 '12 at 10:54
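
The log search suggested in the comment above could be sketched roughly as follows. This is only an illustration, assuming the access log uses the common Apache/Nginx "combined" format with the user agent as the last quoted field; the function name `count_probable_bots` and the keyword list are made up for this example, and, as noted, bots that spoof a browser user agent will not be caught by the keyword check.

```python
import re
from collections import Counter

# Assumed "combined" log format:
#   ... "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# "bot" also matches "bots" and "robot"; the list is illustrative, not exhaustive.
BOT_KEYWORDS = re.compile(r"bot|crawler|spider", re.IGNORECASE)


def count_probable_bots(lines):
    """Return two Counters keyed by user-agent string:
    requests whose UA contains a bot keyword, and requests for /robots.txt."""
    by_keyword = Counter()
    by_robots_txt = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # line is not in the assumed format; skip it
        ua = match.group("ua")
        if BOT_KEYWORDS.search(ua):
            by_keyword[ua] += 1
        # Ignore any query string when comparing the requested path.
        if match.group("path").split("?", 1)[0] == "/robots.txt":
            by_robots_txt[ua] += 1
    return by_keyword, by_robots_txt
```

To run it over a real file, something like `count_probable_bots(open("access.log", encoding="utf-8", errors="replace"))` should work, where `access.log` is a placeholder path. User agents that show up in the robots.txt counter but not in the keyword counter are the ones worth looking up by hand.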

This week our company (Incapsula) launched Botopedia.org, a community-sourced bot directory. It's 100% free and open to all, and you can use it to find a complete user-agent list for any bot you'll want to look up.

As for identification methods, I want to refer you to this discussion on Security Stack Exchange, which covers different methods of bot identification (e.g. JS challenge, method check, robots.txt access and more).