JSON-LD, the Google Knowledge Graph and schema.org SEO

Finally got your head around using RDFa or microdata for marking up HTML documents with schema.org? Prepare to come to terms with a new protocol that will almost certainly become a vital item in the contemporary digital marketer's tool kit: JSON-LD.

Google today announced two types of information that will be integrated into their Knowledge Graph results.

In both cases the route to getting into the Knowledge Graph is by employing existing schema.org item types on official websites: ContactPoint nested within Organization for contact phone numbers, and MusicEvent for band concert dates.
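As a rough illustration of the two item types mentioned above, here is a minimal Python sketch that builds them as plain dictionaries and serializes one to JSON-LD. The helper names, the company, the phone number and the event details are all hypothetical placeholders, and the properties shown are only a small subset of what schema.org defines; Google's help pages remain the authority on which properties it actually requires.

```python
import json

def organization_with_contact(name, url, telephone, contact_type):
    """Build a schema.org Organization with one nested ContactPoint."""
    return {
        "@context": "http://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "contactPoint": [
            {
                "@type": "ContactPoint",
                "telephone": telephone,
                "contactType": contact_type,
            }
        ],
    }

def music_event(name, start_date, venue_name, performer_name):
    """Build a schema.org MusicEvent with a nested Place and MusicGroup."""
    return {
        "@context": "http://schema.org",
        "@type": "MusicEvent",
        "name": name,
        "startDate": start_date,
        "location": {"@type": "Place", "name": venue_name},
        "performer": {"@type": "MusicGroup", "name": performer_name},
    }

# Hypothetical values, for illustration only.
org = organization_with_contact(
    "Example Corp", "http://www.example.com", "+1-555-010-0000", "customer service"
)
event = music_event("Example Band live", "2014-06-01T20:00", "Example Hall", "Example Band")

print(json.dumps(org, indent=2))
```

The nesting in the dictionary mirrors the nesting described above: the ContactPoint lives inside the Organization rather than standing alone.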
These are well-established and well-understood schema.org types. And the appearance of band tour dates in the Knowledge Graph has received extensive coverage in search engine marketing circles – although the business contact numbers have flown a bit lower under the radar, as only the help article referenced above has so far been published (thanks Manu Sporny for the heads-up on that).

The bigger news here is that this is the first time that Google has officially endorsed JSON-LD as a way of providing schema.org information.

It's big news because JSON-LD is a significantly different method of providing structured data to search engines (and other data consumers).

RDFa and microdata – the only two methods of adding schema.org to a website previously sanctioned by Google – are both markup syntaxes. That is, they rely on adding schema.org information directly to the HTML code already present on a page.

JSON-LD (JavaScript Object Notation for Linked Data), by contrast, is an alternative to using HTML markup. JSON-LD is a "JSON-based format to serialize Linked Data," meaning it relies on JSON to provide that same schema.org information to data consumers.

So while RDFa and microdata require HTML, JSON-LD can be provided as islands embedded in HTML, or used directly with data-based web services and in application environments.

Here, for example, is some HTML code containing schema.org authorship information marked up with microdata:

As you can see, the JSON-LD – unlike the microdata – is entirely separate from the HTML code where the schema.org values are found, although at the end of the day the same property/value pairs are provided to Google with both protocols.
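To make the contrast concrete, here is a small Python sketch that renders one set of property/value pairs both ways: once as microdata attributes woven into HTML elements, and once as a JSON-LD island in a script element. The function names and the headline and author values are hypothetical, and the microdata output is deliberately simplistic (flat text properties only, no nested items).

```python
import json
from html import escape

# Hypothetical authorship data; flat text values only, to keep the sketch short.
props = {"headline": "JSON-LD and schema.org", "author": "Jane Doe"}

def as_microdata(item_type, props):
    """Weave the properties into HTML elements as microdata attributes."""
    spans = "\n".join(
        f'  <span itemprop="{escape(key)}">{escape(value)}</span>'
        for key, value in props.items()
    )
    return f'<div itemscope itemtype="http://schema.org/{item_type}">\n{spans}\n</div>'

def as_jsonld_island(item_type, props):
    """Serialize the same properties as a JSON-LD island, separate from the visible HTML."""
    data = {"@context": "http://schema.org", "@type": item_type, **props}
    # Escape "</" so a value can never close the <script> element early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'

print(as_microdata("Article", props))
print(as_jsonld_island("Article", props))
```

Both outputs carry the same two property/value pairs; only the microdata version depends on the visible HTML elements to hold the values.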

This represents both a challenge and an opportunity for SEO. The challenge is keeping the JSON-LD data in sync with what appears on the page, as it is important for the search engines that the data you're providing to them (via JSON-LD) is the same as the data you're providing to humans (via HTML).
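One way to keep an eye on that sync problem is a crude automated check: pull the JSON-LD islands out of a page and confirm that each text value also appears somewhere in the visible text. The Python sketch below does this with the standard library's HTMLParser. The class and function names and the sample page are hypothetical, and the check is naive by design: it would flag URLs, dates in machine format and anything else that legitimately never appears as visible text.

```python
import json
from html.parser import HTMLParser

class PageScanner(HTMLParser):
    """Collect visible text and the raw bodies of JSON-LD script elements."""

    def __init__(self):
        super().__init__()
        self.visible = []
        self.jsonld = []
        self._mode = None  # None (visible), "jsonld", or "skip"

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            is_ld = dict(attrs).get("type") == "application/ld+json"
            self._mode = "jsonld" if is_ld else "skip"
            if is_ld:
                self.jsonld.append("")
        elif tag == "style":
            self._mode = "skip"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._mode = None

    def handle_data(self, data):
        if self._mode == "jsonld":
            self.jsonld[-1] += data
        elif self._mode is None:
            self.visible.append(data)

def string_values(node):
    """Yield every string value, skipping JSON-LD keywords such as @type."""
    if isinstance(node, dict):
        for key, value in node.items():
            if not key.startswith("@"):
                yield from string_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from string_values(item)
    elif isinstance(node, str):
        yield node

def missing_from_page(html):
    """Return JSON-LD string values that do not occur in the page's visible text."""
    scanner = PageScanner()
    scanner.feed(html)
    text = " ".join(" ".join(scanner.visible).split())
    missing = []
    for block in scanner.jsonld:
        for value in string_values(json.loads(block)):
            if " ".join(value.split()) not in text:
                missing.append(value)
    return missing

# Hypothetical page where the venue in the JSON-LD has drifted from the HTML.
page = """<html><body>
<h1>Example Band live</h1>
<p>Venue: Example Hall</p>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "MusicEvent",
 "name": "Example Band live",
 "location": {"@type": "Place", "name": "Old Hall"}}
</script>
</body></html>"""

print(missing_from_page(page))  # ['Old Hall']
```

A check like this will not prove the two are in sync, but it can catch the obvious case where someone updates the page and forgets the island.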

It's an opportunity insofar as SEOs are freed from including structured data within HTML documents. It could conceivably be provided directly in JSON-LD without HTML, or as <script>-encoded islands within documents that might be difficult to mark up (such as AJAX-based web pages).

With this nascent JSON-LD support also come two new structured data markup testing tools, both of which accept, parse and provide feedback for JSON-LD code: one for musical events (the Events Markup Tester), and another for corporate contact information (the Corporate Contacts Markup Tester).

This is what the Events Markup Tester returns when the first block of example JSON-LD music event code from the Google Webmaster Tools help page on music events is run through it (complete with a helpful suggestion to include a ticket price):

For Google, this is a limited foray into JSON-LD support for schema.org. Aside from these two narrow categories of data, Google has not indicated that JSON-LD is a method of providing them with structured data that they'll respect. And in part, they are almost certainly using these initial integrations in order to test how well JSON-LD works in this context, and to what degree webmasters avail themselves of this (relatively recently developed) protocol.

However, it's a pretty clear sign that JSON-LD is going to loom larger on the SEO stage (it has already been embraced in a major way by the semantic web community, and especially among developers in that community).

And – as a concluding aside – the integration of music events and contact phone numbers in the Knowledge Graph demonstrates, again, how being an early adopter of structured data technologies can pay off.

Always appreciate learning about the more technical aspects of the semantic web and how to integrate that into SEO strategies. I’m not a developer so I’m surely wrong, but this seems even more complicated than using schema.org with microdata markup.

Thanks for your comment Brandon, but I’m afraid I don’t know what “that” refers to, as I didn’t outline any procedure I was trying out, nor did I reference rich snippets. Can you be more specific? Thanks.

Well, I guess what I’m trying to figure out is this: if I create schema using JSON-LD and implement the code via Google Tag Manager, will Google read that and place the rich snippets in their SERPs? I did this for one of my clients about 4 weeks ago and nothing has appeared in the SERPs, so I was wondering if I was doing something wrong. I used the Google structured data tool and everything looks great; however, as I mentioned, it’s been a month and nothing has appeared in the SERPs yet.

First, there’s only a limited number of rich types for which JSON-LD will generate rich snippets in Google; they’re outlined here (summarized near the end of the post if you want to skip directly there).

Second, I’m not an expert in Google Tag Manager (or JavaScript), so I don’t know whether or not the Tag Manager-bound code you’re implementing is consumable by Google or not. Fortunately Simo Ahava is such an expert, and has written on this subject here.

In regard to verifying whether or not Google is correctly indexing your JSON-LD, you should be able to see the relevant data in the Webmaster Tools structured data report, as I’ve outlined here. Again, though, as per my first point, that Google acknowledges your JSON-LD-encoded data doesn’t (yet) mean that they’re going to use it to generate rich snippets (and even if a rich snippet type is officially acknowledged by Google to be supported by JSON-LD that doesn’t necessarily mean they’ll produce one, just as proper microdata- or RDFa-encoded schema.org that supports rich snippets doesn’t always actually result in a rich snippet being generated).

Very useful, thanks Aaron. This is so much easier to achieve than integrating semantic markup in the code. The problem with that method has always been small errors creeping in; it is almost impossible to keep it straight on a large site where the CMS had semantic data grafted on after the event. Having the markup in one place makes it easy to check and test. Will get some corporate contacts set up and see how it goes.

I think JSON-LD will be excellent as a way to send structured schema.org data from one application to another via a web service.

Where it falls down for SEO and website performance is in the extra file size it adds to every page.

For example, if you have a 10,000 word article then you have all 10,000 words in the HTML. If you are using RDFa or Microdata with schema.org/Article then you simply put a property attribute of “articleBody” on the div surrounding the 10,000 words of article content (and the other properties on the title, author, etc). However, if you are using JSON-LD for SEO then you would need to repeat all 10,000 words of the article body in the JSON as well as having them in the HTML for visitors to read. 10,000 words is around 60 KB, so when using JSON-LD in this manner your web page would be 60 KB larger and therefore slower to load (especially on mobile devices using non-wifi connections).
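The 60 KB figure is easy to sanity-check. The Python sketch below measures how many bytes an articleBody property adds to a JSON-LD object, using synthetic text of 10,000 five-letter words as a stand-in for a real article; the function name and the headline are hypothetical. It measures uncompressed bytes only, so it says nothing about what compression on the wire would do to the real-world cost.

```python
import json

def jsonld_overhead_bytes(article_text, other_props):
    """Bytes added to a JSON-LD Article object by including articleBody."""
    without_body = {"@context": "http://schema.org", "@type": "Article", **other_props}
    with_body = {**without_body, "articleBody": article_text}

    def size(obj):
        return len(json.dumps(obj).encode("utf-8"))

    return size(with_body) - size(without_body)

# Synthetic stand-in for a 10,000 word article: five letters plus a space per word.
article = " ".join(["words"] * 10_000)

overhead = jsonld_overhead_bytes(article, {"headline": "Example headline"})
print(f"articleBody adds about {overhead / 1000:.0f} KB before compression")
```

With an average of about six bytes per word including the space, the duplication does come out at roughly 60 KB, which matches the estimate above.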

So, for web pages that contain large amounts of text that need to be flagged as a structured data property, JSON-LD can cause significant performance problems on mobile devices due to increased page sizes. For smaller chunks of structured data it would be fine, and as a format in which to send data via a web service it’s brilliant. It’s a great new tool, when used appropriately.

Nothing more clearly brings into view the difference between marking up existing data, and the direct provision of property/value pairs.

I can perhaps see more value to a data consumer with the former than the latter. That is, I think one of the ways Google benefits by having articleBody declared in markup is that it allows it to better understand document structure. In some ways it might be more important to Google to know where that data resides so it can access and parse it rather than the actual data itself, if that makes sense. (And as articleBody is a text type, as “data” it is perhaps less useful to Google than the article body in context, which includes non-textual elements like headings, images and their alt attributes, and – critically – hyperlinks.)

To use an analogy, if there’s a 2MB image on a page, it obviously doesn’t increase the document size by 2MB to declare it with JSON-LD rather than in microdata or RDFa markup, because it’s only a URL – a pointer to the image rather than the image content itself. It might be – at least for an HTML document – more useful for JSON-LD to refer to that page data by an ID, rather than provide it directly in the JSON-LD. How? I’ve no idea, just thinking out loud.

And I am still thinking: I may pose this as a question to some people I know who are more knowledgeable about JSON-LD than I am, and report what they have to say.

I think, if this metadata is used only by search engines, there is no reason to expose it to user clients (e.g. mobile clients). We can use content negotiation to expose schema.org metadata only to search engines.

Interesting analogy Tahir – but I think it is an analogy. A data layer (at least in the Tag Manager context) is basically a JavaScript variable, whereas JSON is the backbone technology for actually transmitting the data objects that consist of key:value pairs (and from that JSON-LD is a way of exchanging data in JSON). So I don’t think it’s only a difference in naming conventions: the data layer of a tag management system and JSON (and JSON-LD) are different animals.

I landed here during my search for a better understanding of what Google might have in mind for a wider implementation of JSON-LD across more schema.org types.

I’m struggling with a large client site that grafts schema.org microdata markup into a web page that has many dynamically generated components, including stuff generated by a CMS-like system. It is difficult to get them to isolate the semantic markup, and particularly to get complex nesting organized properly. I find myself in the awkward position of parsing the markup into JSON to understand and validate it, then translating it back into a skeleton schema.org microdata structure that represents the target of what I want the final rendered page to look like.
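The "parse the markup into JSON" step described above can be roughed out in a few lines of Python. This is only a sketch of the general idea, not a full microdata parser: it assumes well-formed HTML with matching open and close tags, ignores itemref, itemid and multiple types, and uses a hypothetical class name and sample markup.

```python
import json
from html.parser import HTMLParser

VOID = {"meta", "link", "img", "br", "hr", "input"}  # elements with no end tag

class MicrodataToJson(HTMLParser):
    """Turn nested itemscope/itemprop markup into nested dictionaries."""

    def __init__(self):
        super().__init__()
        self.items = []    # top-level items found on the page
        self._frames = []  # one frame per open (non-void) element

    def _current_item(self):
        for frame in reversed(self._frames):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def _attach(self, prop, value):
        parent = self._current_item()
        if parent is None or prop is None:
            if isinstance(value, dict):
                self.items.append(value)  # an item with no parent is top-level
        elif prop in parent:
            existing = parent[prop]       # repeated properties become lists
            parent[prop] = existing + [value] if isinstance(existing, list) else [existing, value]
        else:
            parent[prop] = value

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop, item = a.get("itemprop"), None
        if "itemscope" in a:
            item = {"@type": (a.get("itemtype") or "").rsplit("/", 1)[-1]}
            self._attach(prop, item)
            prop = None  # the property's value is the nested item, not text
        elif prop:
            attr_value = a.get("content") or a.get("href") or a.get("src")
            if attr_value is not None or tag in VOID:
                self._attach(prop, attr_value)
                prop = None
        if tag not in VOID:
            self._frames.append({"item": item, "prop": prop, "text": []})

    def handle_data(self, data):
        for frame in self._frames:
            if frame["prop"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self._frames:
            return
        frame = self._frames.pop()
        if frame["prop"]:
            self._attach(frame["prop"], "".join(frame["text"]).strip())

# Hypothetical markup, for illustration only.
sample = """<div itemscope itemtype="http://schema.org/Organization">
  <span itemprop="name">Example Corp</span>
  <div itemprop="contactPoint" itemscope itemtype="http://schema.org/ContactPoint">
    <span itemprop="telephone">+1-555-010-0000</span>
    <meta itemprop="contactType" content="customer service">
  </div>
</div>"""

parser = MicrodataToJson()
parser.feed(sample)
print(json.dumps(parser.items, indent=2))
```

Seeing the nesting as JSON makes it much easier to spot a property that has ended up attached to the wrong item, which is the kind of error that is hard to see in the rendered markup.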

Then the developers have to manipulate the page production process to generate the target structure. The whole process is sluggish, indirect, and very hard to iterate on. Embedding JSON-LD directly as islands in the HTML would provide a far superior and direct path.

Of course we may be trading one demon for another, having to then ensure that the generated markup does in fact match the embedded JSON. I’m sure one of the issues that is holding Google up is a concern over structured data spamming! While our intentions would be to map the structured data one to one with the markup, this disconnection does seem to open the door for spamming if Google doesn’t carefully match up the two. And aside from that, it means that we would have to maintain both sets of data in parallel.

Anyway, those objections (which seem solvable) aside, I am excited by the prospect of using JSON-LD as a direct way to communicate structure on web pages. Thanks for following this development and providing some visibility into where things are headed. I’ll be keeping an eye on seoskeptic for more on this topic!