Insider information on Semantic SEO, schema.org, and GoodRelations

Google has just released a new developer recipe for paywalled content.

In essence, the new markup allows publishers to mark up page components that require registration or a subscription for access, identified by CSS class selectors.
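To make this more tangible, here is a minimal sketch in Python that builds such markup as JSON-LD. It assumes the schema.org properties isAccessibleForFree, hasPart, and cssSelector, which is how I understand the pattern. The function name, the headline, and the .paywall class are hypothetical placeholders, so please check Google's documentation before relying on the details.

```python
import json


def paywalled_article_jsonld(headline, paywall_selector):
    """Build JSON-LD for an article whose paywalled part is
    identified by a CSS class selector (illustrative sketch)."""
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        # The article as a whole is not freely accessible.
        "isAccessibleForFree": "False",
        "hasPart": {
            "@type": "WebPageElement",
            "isAccessibleForFree": "False",
            # Class selector of the element holding the paywalled text.
            "cssSelector": paywall_selector,
        },
    }


# Hypothetical headline and class name, for illustration only.
markup = paywalled_article_jsonld("Example headline", ".paywall")
print(json.dumps(markup, indent=2))
```

The point of the design is that the data block only names the restricted element by its class; the HTML of the page itself does not have to carry any extra attributes.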

This is an interesting new usage of schema.org markup. What I find the most remarkable is the following:

1. Google explicitly says that JSON-LD is the only supported syntax. This further supports my prediction that Microdata and RDFa are becoming more and more obsolete, except for the simplest use cases of structured data.

2. This is the first time schema.org markup is used to steer search engine access to Web content. This is a big step, because in the past, it has been important for Google to promote robots.txt as the one and only mechanism for stating what a search engine is allowed to do with your content, basically an all-or-nothing approach: If you were not happy with what Google did with your content, your only choice was using robots.txt to remove it from the index. See here for a few links: http://www.robotstxt.org/faq/legal.html.

My point #2 is admittedly stated a bit too strongly: Google has already taken steps to come to more granular agreements with content providers, namely news sites, on what Google is allowed to do with their content:

Below, I will explain how you can properly model a product detail page with multiple product variants that have different prices in a way that almost certainly triggers Google Rich Snippets for products.

The main challenge is to get a price range shown in a Rich Snippet for a page with multiple product variants, like so:

Before we start, it is important to understand that, to my knowledge, Google has previously shown price ranges in Rich Snippets for Products only in one of the following three cases:

Each entity should be marked up using the relevant schema.org type, such as schema.org/Product for product category pages. Marking up just one category entity from all listed on the page is against our guidelines.

It essentially meant that you could mark up only deep detail pages that describe exactly one single product. Today, we found the first evidence that Google is now properly summarizing multiple product descriptions to generate a summary. If you search for “New Volkswagen Golf Yorkville New York”, you might get the following search result:

If you look at the first result, you see a price range ($21,515.00 to $30,830.00):

While we have not yet seen it in the wild, we expect that Google will also try to create Rich Snippets for single products from multi-item pages depending on the user query. So instead of summarizing a multi-product page into one aggregate Rich Snippet, they might choose a single entity that they think fits your information needs best.

In short: We see that we can now safely mark up multiple products in a page and expect Rich Snippets for price information. Google is able to summarize them. That is good news, in particular for automotive Web sites.
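As a rough illustration of what "multiple products in a page" can look like as data, the following Python sketch describes every listed item as its own schema.org Product with its own Offer and then derives the price range that a consumer of the data could compute. The JSON-LD syntax, the @graph wrapper, the function names, and the product names are my assumptions for this example; only the two outer prices are taken from the search result quoted above. It does not claim to show how Google actually builds the summary.

```python
import json
from decimal import Decimal


def product_with_offer(name, price, currency="USD"):
    """One listed item as a schema.org Product with a single Offer."""
    return {
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
    }


def page_markup(products):
    """Wrap all products of one page in a single JSON-LD graph."""
    return {"@context": "https://schema.org", "@graph": products}


def price_range(markup):
    """Lowest and highest offer price across all products in the graph."""
    prices = [Decimal(p["offers"]["price"]) for p in markup["@graph"]]
    return min(prices), max(prices)


# Hypothetical product names; the middle price is made up.
markup = page_markup([
    product_with_offer("Example car, base trim", "21515.00"),
    product_with_offer("Example car, mid trim", "24995.00"),
    product_with_offer("Example car, top trim", "30830.00"),
])
low, high = price_range(markup)
print(json.dumps(markup, indent=2))
print(f"Price range: {low} to {high}")
```

Every entity on the page is marked up, which is in line with the guideline quoted earlier, and the range falls out of the individual offers rather than being stated separately.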

PS: If you are the person responsible for adding the schema.org markup to http://www.steetpontevolkswagen.net/, I would first like to congratulate you on the pretty advanced and effective use of schema.org. Second, I would recommend that you add the new features for http://schema.org/Car for standardized car features and use http://schema.org/additionalProperty for the other car features. Need help? Contact us!

Many Web developers and SEO experts are interested in schema.org markup because they want to trigger Google Rich Snippets, i.e. the augmentation of search results by stars for ratings, price and inventory level information, and more. This is a valid interest, and while we still lack quantitative evidence, it is pretty obvious that additional information (like the price of a product), in particular positive signals (five-star ratings or an item being in stock), can increase the likelihood of a click.

Note that this will have multiple effects for search engine performance of a site:

First of all, it will lead to more traffic to your site if people are more likely to click on a result with Rich Snippet features than on others. This could either strengthen the appeal of the already prominent results on ranks 1 – 3, or bring the otherwise unfortunate lower ranks in the results back into the game. For instance, if a result on rank 7 is the only one with Rich Snippet features, it could become more attractive for search engine users than a result on that rank without any special appearance.

Second, Rich Snippets could reinforce the already good reputation of high-ranked sites. If Rich Snippets help visitors find what they need faster and direct them to the best match, all signals that measure user satisfaction with Google’s ranking will send positive feedback to the ranking ecosystem.

Third, Rich Snippets can send more qualified traffic to your site, because search engine users make a more informed decision on whether to visit your site.

So far, things are simple and positive. If you add schema.org markup properly and if your remaining SEO strategies are white-hat and fair, chances are high that you get Rich Snippets for a substantial subset of your pages.

But what will happen in the future? Well, be prepared that Rich Snippets will turn from something that you “control” to something that will be highly dynamic. Do not take it for granted that, in the future, Google will show Rich Snippets for all of your deep links to all visitors and queries alike, and that Google would activate all Rich Snippet features (e.g. price, inventory level, ratings, reviews, …) in the same manner.

Instead, I expect that the more structured data we find in Web sites, the more selectively Google will use Rich Snippets and similar future features to highlight small, significant details in search results. As with a highlighter pen, you do not mark every single word even in the best of books, but only those words or paragraphs that matter the most. There is simply no gain for the user in showing fully-fledged Rich Snippets for ten of ten relevant search results.

So what will Google do? Here is my opinion:

First, they will scientifically monitor the ideal (from a user’s perspective) number of results with Rich Snippets (I guess it will be between three and four of ten items in the results).

Second, they will try to understand which pieces of information will benefit most from highlighting with Rich Snippets. For instance, if only two of ten sites indicate availability of an item, that will be more useful to show in a Rich Snippet as compared to a list of results with all or none showing positive inventory.

Third, I suspect that Google will use Rich Snippets to highlight diversity in the search results. Even as of today, it is a known topic in recommender systems research that the controlled inclusion of variety (e.g. a few items that match less well) can improve user satisfaction. A typical example is a query where the search engine is uncertain about the exact meaning of the query (the “query intent”). So we can assume that Google includes a few less relevant, slightly off-topic results in a search even as of now, and Rich Snippets could be used to highlight this variety. Take, for example, the only product in a list with a price far higher than the others (e.g. the only professional camera in a list of amateur cameras). Google could turn on price information just for this one to point you to this more expensive item on rank 7.

I think we should understand that Rich Snippets and structured data markup are just one of many means for improving the human user experience in search, and I bet Google will use Rich Snippets and future visual features very cleverly for improving the interface between human minds and information on the Web.

This does not mean that schema.org or Rich Snippets are overrated. Quite to the contrary, they fundamentally transform how companies communicate with target audiences about their products and services. I just want you to be prepared that an effective online strategy for using structured markup does not end at asking a junior Web developer “to add a few schema.org elements in Microdata syntax.” The whole infrastructure of digital marketing is changing.

Often, developers who are new to schema.org and semantic SEO techniques are confused about the relationship between schema.org and Microdata, Microformats, RDFa, GoodRelations, and other standards.

Here is a quick explanation that I have given so often that I assume it may be useful for others:

When you expose structured data from within Web content by adding extra markup to HTML content, you have essentially two components:

1. A vocabulary (also known as data schema, ontology, data dictionary, depending on the background of the people you speak to): This provides global identifiers for types of things (“Product”, “Car”, “Restaurant” – often called “classes” or “types”) and for properties (e.g. “screen size”, “weight” – often called properties or attributes)

2. A syntax for publishing the data within Web pages in HTML. The syntax is the convention for the actual characters used to publish a piece of data. Relevant syntaxes here are RDFa, Microdata, and recently JSON/JSON-LD.

Popular vocabularies on the Web are schema.org, GoodRelations, FOAF, SIOC, and a few others.

At Web scale, the absolutely dominant vocabulary for mainstream search engines is schema.org. GoodRelations is a special case, since 99% of the GoodRelations vocabulary is now integrated in schema.org, so you do not have to choose between the two. In other words, schema.org is now a new namespace for using GoodRelations. Additional vocabularies may have relevance on the long tail and can typically be used in addition to schema.org with no negative effect. Once they have gained sufficient popularity, search engines may care.

Now, you can use the same vocabulary in multiple syntaxes. For instance, you can publish schema.org in RDFa or Microdata or JSON/JSON-LD. The most appropriate syntax depends on the purpose and on the target applications of your data. In Web content, Microdata and RDFa should be equally well supported by search engines in theory. However, actual support varies.
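The split between vocabulary and syntax is easier to see with the same data rendered twice. Below is a small Python sketch that takes one set of schema.org properties and writes it out once as Microdata and once as JSON-LD. The helper names and the sample product are hypothetical, and the sketch only handles flat text properties, not nested entities.

```python
import json
from html import escape


def to_microdata(item_type, properties):
    """Render flat properties as schema.org Microdata on HTML elements."""
    rows = "\n".join(
        f'  <span itemprop="{escape(name)}">{escape(value)}</span>'
        for name, value in properties.items()
    )
    return (
        f'<div itemscope itemtype="https://schema.org/{escape(item_type)}">\n'
        f"{rows}\n</div>"
    )


def to_jsonld(item_type, properties):
    """Render the same properties as a schema.org JSON-LD document."""
    data = {"@context": "https://schema.org", "@type": item_type, **properties}
    return json.dumps(data, indent=2)


# Hypothetical sample data: same vocabulary terms, two syntaxes.
props = {"name": "Example camera", "description": "A compact camera"}
print(to_microdata("Product", props))
print(to_jsonld("Product", props))
```

In both outputs, the type (Product) and the property names (name, description) come from the vocabulary; only the characters around them change with the syntax.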

As of now, I would recommend the following:

1. Microdata syntax for schema.org. RDFa works, but not all structural variants of the same data will be understood by search engines and you need to be a real expert to find out which ones work and which ones don’t.

2. RDFa for GoodRelations in the original namespace, since for historic reasons, search engines know well how to process it.

Microformats are a special case, since they combine syntax and vocabulary. For very simple data structures, this works well, and Microformats are widely understood by search engines. It is just my personal opinion, and I am sure advocates of Microformats will see things differently, but in the light of schema.org and generic syntaxes like Microdata, RDFa, and JSON-LD, Microformats will be limited to very basic usages, and likely fade out.

So in a nutshell, schema.org in Microdata is currently the most widely understood and recommended variant.

schema.org in RDFa and in particular JSON-LD may become more important in the future, but you will have to monitor closely to which degree search engines can actually process data in those syntaxes.

For quite a while, I have been arguing that RDFa as a syntax for structured data in Web content is problematic when it comes to exposing more granular data than just a few property names. While many advocates of RDFa stressed that reusing the exact same visible elements for structured data markup, as in this example

<body>
<div property="vcard:tel">+49-89-1234-0</div>
</body>

was beneficial because it reduces redundancy, it also raises complexity for developers, since you violate the principle of “separation of concerns” – you have to align a given HTML tree structure with a given data structure, dictated by the vocabulary, like schema.org or GoodRelations.

As a consequence, I once developed and promoted the “RDFa in Snippets Style” approach, where the RDFa markup would reside in blocks of invisible <div> or <span> elements, like this:

Now, one caveat has always been that Google indicated that invisible markup, i.e. RDFa elements that do not reuse visible content, would not be honored. The likely rationale for that guideline was that

Now, in silence, RDFa in “Snippet Style” (and similar patterns in Microdata) has long been accepted by Google, as long as other quality indicators for the site were positive. But there was always a doubt, which was bad, since the development effort for weaving advanced data markup in RDFa or Microdata syntax into HTML templates in a form that combined visible content elements with data markup was, in my experience, 5 – 10 times higher than with RDFa in “Snippet Style”.

Of course, this is just a first signal, but I personally think that in the future, we will see JSON-LD in script elements for all advanced data markup, and RDFa and Microdata only for the very simple use-cases.

That is a good sign towards a broader use of data markup for e-commerce, for sure.
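For readers who have not yet worked with JSON-LD in script elements, here is a minimal Python sketch of the idea: the data is serialized once and placed in a separate block, independent of the visible HTML structure. The function name and the sample offer are hypothetical; the escaping of "</" is a precaution I added so that a string value cannot close the script element early.

```python
import json


def jsonld_script(data):
    """Serialize data as JSON-LD inside an HTML script element."""
    # "\/" is a valid JSON escape for "/", so the parsed data is unchanged,
    # but no value can contain a literal "</script>" any more.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical offer, for illustration only.
offer = {
    "@context": "https://schema.org",
    "@type": "Offer",
    "name": "Example offer",
    "price": "19.99",
    "priceCurrency": "EUR",
}
print(jsonld_script(offer))
```

Because the block does not have to follow the tree structure of the page template, it keeps the separation of concerns discussed above.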

Today, we finally released the complete update of our corporate Web site at http://www.heppresearch.com. While our main line of business is individual consulting with a selected number of clients, we thought it was high time to update our face to the public to the 2012 state of the art of Web development. After all, we claim to know how to do marketing in the next generation of the WWW, so we should show that we easily master the current one.

A key goal of the release was to support responsive design so that the page would work beautifully on any device, from a smartphone to my 30 inch screen in the office, and to migrate to HTML5.

I hope you like our new design as much as I do. Any feedback will be very much appreciated.

About this blog

This is the blog of Hepp Research, the data marketing consulting business founded by Prof. Martin Hepp, the inventor of the GoodRelations vocabulary for e-commerce.
Prof. Hepp and his team use this blog to share the latest news and insights of using GoodRelations, schema.org, RDFa, and Microdata for Semantic SEO and other purposes.