microformats.org at 5: Two Billion Pages With hCards, 94% of Rich Snippets

The microformats.org community recently celebrated its 5th birthday – five plus years of openly researching, creating, and iterating on web standards to express common semantics designed for humans first, machines second.

hCard was originally brainstormed in September 2004 and rapidly adopted by numerous tools and sites, large and small. According to Yahoo Search Monkey, the number of pages published with one or more hCards crossed the 2 billion mark a few days ago, making it the most popular format for people or organizations on the web.

Search Monkey’s results do tend to fluctuate by a few percentage points, even hour by hour, so you may see different numbers: sometimes lower and, over time, higher and higher. Here are a few recent hCard deployments that no doubt contributed to crossing the two billion mark:

Finally, just before microformats.org’s 5th birthday this past June 20th, the developers of BrightKite informed us that they’ve fully implemented hCard on all of their 5.5 million registered user profiles and 16.5 million venue pages – another 22 million new hCards. Thanks for the birthday present, BrightKite!

All of these deployments come from the powerful combination of: 1. microformats’ ease of authoring (the easiest way to semantically mark up people, venues, etc. in HTML), and 2. the fact that search engines like Yahoo and Google index microformats and make them visible in their user interfaces.
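To give a sense of how little markup an hCard needs, here is a minimal sketch in Python that renders one as an HTML string. The class names `vcard`, `fn`, `org`, and `url` come from the hCard specification; the function name `render_hcard`, its parameters, and the sample person are hypothetical and used only for illustration.

```python
from html import escape


def render_hcard(name, org=None, url=None):
    """Return a minimal hCard as an HTML string.

    Hypothetical helper for illustration only. The class names
    (vcard, fn, org, url) are hCard property names; everything
    else here is an assumption.
    """
    safe_name = escape(name)
    parts = []
    if url:
        # A link can carry both the formatted name (fn) and the url property.
        parts.append(
            f'<a class="fn url" href="{escape(url, quote=True)}">{safe_name}</a>'
        )
    else:
        parts.append(f'<span class="fn">{safe_name}</span>')
    if org:
        parts.append(f'<span class="org">{escape(org)}</span>')
    # The root element of an hCard carries class="vcard".
    return '<div class="vcard">' + " ".join(parts) + "</div>"


# Made-up sample data.
card = render_hcard("Jane Example", org="Example Co", url="https://example.com/jane")
print(card)
```

The point of the sketch is that the semantics ride on ordinary `class` attributes of HTML that a page would likely contain anyway, which is a large part of why authoring is easy.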

In May of 2009 Google launched Rich Snippets with support for microformats and RDFa, with a set of content partners like Yelp who all chose to use microformats to produce rich snippets in Google search results.

For all of these, Google provided side-by-side examples of each snippet type in multiple formats (microformats, RDFa, microdata). That comparison has helped to demonstrate how much simpler and easier microformats are in many respects (and some of the promise that microdata shows for more general extensibility).

As recently reported by ReadWriteWeb, Google themselves stated at the Semantic Technologies conference that when Google finds data for rich snippets on pages, 94% of the time that data is marked up with microformats (40,091 pages vs. 2,514). That figure conservatively assumes none of those pages contain both; if any did, the 94% number would be even higher.

This is no surprise, as The State of Web Development 2010 survey showed a gap of nearly an order of magnitude: about 6x more web developers use microformats in their day-to-day work (34.52% use microformats vs. 5.63% who use RDFa, per the survey).

Given many more web developers are using microformats, it’s not surprising that Google is finding more microformats than alternatives. What is interesting though is that while 6x more developers use microformats, Google is finding 16x more microformats for rich snippets than alternatives.

One could conclude from these two numbers that developers using microformats are 2-3 times more net productive in terms of number of pages produced with rich snippets. This net productivity could be because microformats are easier (take less time) to author, and possibly that microformats are easier to get right, and thus have Google recognize them, as compared to alternatives.
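The arithmetic behind the 94%, 6x, 16x, and "2-3 times" figures can be checked with a short Python sketch. The input numbers are the ones quoted above; the variable names are illustrative, and dividing the page ratio by the developer ratio is only a rough proxy for productivity, since the survey figure covers RDFa alone while the 2,514 pages cover the alternatives Google counted.

```python
# Figures quoted in the post.
MICROFORMATS_PAGES = 40_091      # sampled pages with microformats rich snippet data
ALTERNATIVE_PAGES = 2_514        # sampled pages using the alternatives
SURVEY_MICROFORMATS_PCT = 34.52  # % of surveyed developers using microformats
SURVEY_RDFA_PCT = 5.63           # % of surveyed developers using RDFa

# Share of rich snippet data marked up with microformats (the "94%").
microformats_share = MICROFORMATS_PAGES / (MICROFORMATS_PAGES + ALTERNATIVE_PAGES)

# How many times more pages use microformats (the "16x").
page_ratio = MICROFORMATS_PAGES / ALTERNATIVE_PAGES

# How many times more developers report using microformats (the "6x").
developer_ratio = SURVEY_MICROFORMATS_PCT / SURVEY_RDFA_PCT

# Pages per developer, relative to the alternatives (the "2-3 times").
productivity_ratio = page_ratio / developer_ratio

print(f"microformats share of rich snippet data: {microformats_share:.1%}")
print(f"page ratio: {page_ratio:.1f}x")
print(f"developer ratio: {developer_ratio:.1f}x")
print(f"relative pages per developer: {productivity_ratio:.1f}x")
```

The last ratio comes out between 2 and 3, which is where the "2-3 times more net productive" estimate comes from.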

Still, we can do even better than that. And no, I’m not just talking about going from 94% to 99+%.

The Google presentation slide noted that the results were out of one million web pages sampled from the Internet. Out of that, only ~40,000 had microformats. Given that nearly every web page mentions people, organizations, events, or some other popular microformat, that number should be much higher.

Thus there is much room for us to improve. In particular, based on feedback from Google, Yahoo, and numerous smaller companies and independent web developers, we can and should make microformats even simpler: simpler to write, easier to get right, and ideally, even more micro – less code, less page weight. Starting with a few ideas brainstormed a couple of months ago, there are now a few folks working on a “microformats 2.0” to achieve these goals.

Do you have feedback or ideas about how microformats could be made even simpler and easier for authors?

Thanks to all of the hard work and contributions by everyone in the microformats community for an excellent fifth year of microformats.org. Here’s looking forward to even more microformats accomplishments in our sixth year.

Hi Ricardo, the location class we’re using isn’t in the spec, but that’s because our location field is a very freeform field, so we didn’t feel it matched any of the available options within hCard. If there’s something better that we could use, we’re all ears ;)
