Providing Structured Data

This page shows you how to add the structured data that search operators depend on.

Web pages are often filled with free-form text, which is easy for
humans to read but more difficult for computers to understand. Some
web pages have information with greater structure that is easy for
computers to read, such as a page date embedded in the URL or title of the page,
or machine-readable fields embedded in the HTML code. Google
extracts a variety of structured data from web pages. This
page describes the structured data types Google extracts that are
available for use in
Custom Snippets and
Structured Search.

Overview

When you are reading a webpage that sells a DVD, you can quickly
figure out what the title is, what reviewers thought of the film, and
how they rated it. But a computer cannot do the same things, because
it doesn't understand how the information is structured.

For example, if the page has content about the DVD—along with
recommendations for other items, ads from other stores, and comments from
customers—then the page might have different prices for various
things, not just for the DVD that is being sold. You can easily figure
out the price for the DVD while dismissing the other prices, but the
computer can't. Some sophisticated programs might find the prices in
the webpage, but they cannot determine the rules for finding just the
price of the DVD.

Structured data formats are rules that standardize the structure and
content of the webpage. They are markup that you apply to text snippets so that
computers can process their meaning or semantics. The markup does not change the
formatting of your website; it just makes the metadata and text enclosed within
the XHTML tags more meaningful to computers.

<meta> tags: standard HTML tags, a subset of which
are parsed by Google.

Page Date: features on a page indicating its
date, which Google attempts to parse.

You can use any one of these formats, or a combination of them.
Note that unlike Custom Search, Google Search does not use PageMaps or
<meta> tags when generating rich snippets. Google Search does consider
information such as microformats, microdata, RDFa, and the page date
when it is generating snippets, but it has its own algorithm and policies
for determining what information gets shown to users. So while structured
data you add to your pages can be presented on Custom Search, it might not
be displayed in Google Search results.

The following includes an idealized snippet of plain HTML from a review site:

By incorporating standard structured data formats into your
webpages, you make the data available not only to Custom Search but
also to any service or tool that supports the same standard. Apply
structured data to the most important information in the webpage, so
that it can be presented directly in the results. For example, if you
have a website selling Android devices, include structured data about
ratings, prices, and availability. When your users search
for Android devices, they can see the ratings, prices, and
availability at a glance.

Once computers can understand the
types of data in a webpage, they can also take over
the menial task of finding and combining information from different
webpages. This frees users from tedious tasks, such as sifting
through multiple pages to find the items that they want. Search engines,
such as Custom Search, can process the structured data in your
webpages and display it in useful, more meaningful ways, such as
custom snippets and
structured search.

Providing Data to Custom Search

Google supports several kinds of data that are used primarily by
Custom Search: PageMaps, a subset of <meta> tags, and approximate page dates.

Using PageMaps

PageMaps is a structured data format that provides Google with information about
the data on a page. It enables website creators to embed data and notes in
webpages. Although the structured data is not visible to your users or to
Google Web Search, Custom Search recognizes it when indexing your webpages and
returns it directly in XML results or in JSON format in the
Custom Search element.

You can explicitly add PageMaps to a page, or submit PageMaps using a Sitemap.
Google will also use other information on a page, such as rich snippets markup or
meta tag data, to create a PageMap.

Unlike the other structured data formats described below, PageMaps does
not require you to follow standard properties or terms, or even refer
to an existing vocabulary, schema, or template. You can just create
custom attribute values that make sense for your website. Unlike the structured
data attributes of microformats, microdata and RDFa, which are added around
user-visible content in the body of the HTML, PageMaps metadata is included in
the head section of the HTML page. This method supports arbitrary
data which may be needed by your application but which you might not want to
display to users. (If you don't want PageMap information returned in your XML,
you can keep it private using an AccessKey.)

Once you create a PageMap, you can submit it to Google using any of the
following methods:

PageMap tag definitions

The following table outlines the requirements for adding PageMap data to a
Sitemap.

Tag          Required?   Description
PageMap      Yes         Encloses all PageMap information for the relevant URL.
DataObject   Yes         Encloses all information about a single element (for example, an action).
Attribute    Yes         Each DataObject contains one or more attributes.

Note:
PageMaps are XML blocks and therefore must be formatted correctly;
in particular, the PageMap, DataObject and
Attribute tags in the XML are case sensitive, as are the
type, name, and value attributes.
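
As a rough illustration of the tag structure described above, the following
Python sketch builds a PageMap block with the standard library. The
build_pagemap helper, the "label" attribute name, and its value are hypothetical
choices made for this example; only the PageMap, DataObject, and Attribute tags
and the type, name, and value attributes come from this page.

```python
import xml.etree.ElementTree as ET


def build_pagemap(data_objects):
    """Build a PageMap XML string.

    data_objects maps a DataObject type to a dict of attribute names and
    values. This helper is a hypothetical convenience, not a Google API.
    """
    # Tag and attribute names are case sensitive, so spell them exactly.
    pagemap = ET.Element("PageMap")
    for object_type, attributes in data_objects.items():
        data_object = ET.SubElement(pagemap, "DataObject", {"type": object_type})
        for name, value in attributes.items():
            ET.SubElement(data_object, "Attribute", {"name": name, "value": value})
    return ET.tostring(pagemap, encoding="unicode")


# Example: one DataObject of type "action" with a single made-up attribute.
print(build_pagemap({"action": {"label": "Download"}}))
```

Generating the block with an XML library rather than string concatenation
makes it less likely that a stray character produces malformed XML.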

Submit PageMap data using the Custom Search Control API

To submit PageMap data using the Custom
Search Control API, send an HTTP POST message, using the
text/xml content type, to: http://www.google.com/cse/api/default/index/<CSE_ID>.
Include PageMap data in the message body, like this:

Private PageMaps

In some cases, you may not want custom attributes returned in your search
engine's query results XML, because those are publicly visible by default. In
this case, you can create a private PageMap by adding an AccessKey to
the DataObject you want to protect, and sending the PageMap directly to Google
using the on-demand indexing API. Only web
searches with a matching AccessKey parameter will get that
DataObject in results.

Parsing PageMap data

If you are getting results back via XML, then the custom attributes
are returned in the results within the PageMap tag, as shown
below. You can parse the DataObjects within the PageMap tag and
provide customized presentation of the relevant attributes.
If you are using the Custom Search element, then the custom attributes are
returned in the richSnippet property of each result for use
in data templates, as described at
Rich Snippet result properties.
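
If you consume XML results yourself, a parser along the following lines can
collect the DataObjects into a dictionary. This is a minimal sketch: the
sample XML is an assumed fragment containing only a PageMap tag, the
parse_pagemaps name is hypothetical, and the real result XML wraps the PageMap
in additional elements that are not shown here.

```python
import xml.etree.ElementTree as ET

# Assumed sample fragment; real results contain more surrounding elements.
SAMPLE_XML = """
<PageMap>
  <DataObject type="metatags">
    <Attribute name="pubdate" value="20100101"/>
  </DataObject>
</PageMap>
"""


def parse_pagemaps(xml_text):
    """Return {DataObject type: {attribute name: value}} for all PageMaps."""
    root = ET.fromstring(xml_text)
    result = {}
    # iter() also matches the root element itself when it is a PageMap.
    for pagemap in root.iter("PageMap"):
        for data_object in pagemap.findall("DataObject"):
            attributes = result.setdefault(data_object.get("type"), {})
            for attribute in data_object.findall("Attribute"):
                value = attribute.get("value")
                if value is None:
                    # Fall back to element text if no value attribute is set.
                    value = (attribute.text or "").strip()
                attributes[attribute.get("name")] = value
    return result


print(parse_pagemaps(SAMPLE_XML))
```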

Using <meta> tags

While PageMaps allow you to precisely specify the data you want for
each page, sometimes you have a large amount of content which you do
not want to annotate. Google extracts selected content from
META
tags of the form <meta name="KEY"
content="VALUE">. We do not support variants of the
META tag, such as the use of property instead of name.

While we explicitly exclude common
tags that are usually inserted programmatically by web authoring tools,
such as robots, description, and
keywords, rarer tags specific to your site will be
extracted and put into a special data object
of type metatags, which can be used with all of Custom
Search's structured data features. For example, a <meta> tag of the form:

<meta name="pubdate" content="20100101">

creates a PageMap DataObject which is returned in XML results like this:

The data in this automatically created PageMap can be used anywhere you can
use data from a PageMap explicitly included in your page's content. For
instance, it can be used with structured search operators like
Sort by Attribute:
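
As a sketch of how such a request might be assembled, the following Python
snippet builds a query URL with a sort parameter. It assumes the sort key
follows the TYPE-NAME pattern seen in the &sort=recipe-ratingstars example in
the Rich Snippets section, so the pubdate tag above becomes metatags-pubdate.
The base URL and the sort_by_attribute_url helper are placeholders for this
example, not real endpoints or APIs.

```python
from urllib.parse import urlencode


def sort_by_attribute_url(base_url, query, object_type, attribute):
    """Build a search URL that sorts on one structured data attribute.

    base_url is a placeholder for your own search endpoint. The sort key
    is assumed to have the form TYPE-NAME.
    """
    params = {"q": query, "sort": f"{object_type}-{attribute}"}
    return f"{base_url}?{urlencode(params)}"


url = sort_by_attribute_url(
    "https://www.example.com/search", "oil spill", "metatags", "pubdate"
)
print(url)
# https://www.example.com/search?q=oil+spill&sort=metatags-pubdate
```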

Google attempts to include all other <meta> tags, with the caveat that
punctuation, special characters and embedded spaces in the name
field of <meta> tags may not be parsed correctly. Custom Search
explicitly supports periods and dashes in <meta> tag names.
Custom Search does not explicitly support other special characters
within <meta> tag names, but some special characters
may be accepted correctly if they are
URL encoded.

Limitations

Custom Search will convert up to 50 <meta> tags to PageMaps, as long
as the total text size of all processed properties does not exceed 1MB, with no
individual property exceeding 1024 characters.
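
To check a page against these limits before publishing it, you could use a
small script like the sketch below. It is not Google's extraction logic. It
assumes that "1MB" means 1024 * 1024 bytes of UTF-8 content, that only tags of
the form <meta name="KEY" content="VALUE"> count, and that the robots,
description, and keywords tags are skipped. The MetaTagCollector and
check_meta_limits names are hypothetical.

```python
from html.parser import HTMLParser

EXCLUDED_NAMES = {"robots", "description", "keywords"}
MAX_TAGS = 50
MAX_PROPERTY_CHARS = 1024
MAX_TOTAL_BYTES = 1024 * 1024  # assumed meaning of "1MB"


class MetaTagCollector(HTMLParser):
    """Collect (name, content) pairs from <meta name=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Variants such as <meta property=...> have no name and are skipped.
        if name and content is not None and name.lower() not in EXCLUDED_NAMES:
            self.tags.append((name, content))


def check_meta_limits(html):
    """Return the collected tags and a list of limit violations."""
    collector = MetaTagCollector()
    collector.feed(html)
    problems = []
    if len(collector.tags) > MAX_TAGS:
        problems.append(f"{len(collector.tags)} tags exceed the limit of {MAX_TAGS}")
    for name, content in collector.tags:
        if len(content) > MAX_PROPERTY_CHARS:
            problems.append(f"{name} is longer than {MAX_PROPERTY_CHARS} characters")
    total = sum(len(content.encode("utf-8")) for _, content in collector.tags)
    if total > MAX_TOTAL_BYTES:
        problems.append("total text size exceeds 1MB")
    return collector.tags, problems


tags, problems = check_meta_limits('<meta name="pubdate" content="20100101">')
print(tags, problems)
```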

Using Page Dates

In addition to metadata which you explicitly specify on a page,
Google also estimates a page date based on features of the page such
as dates in the title and URL. Custom Search allows you to use this
date to sort, bias and range restrict results by using a special metadata
key of date. This estimated date can be used in all operators
that use the &sort= URL parameter, including
Sort by Attribute,
Bias by Attribute, and
Restrict to Range.

Note: The page date is not added to the PageMap,
so it is not returned in XML results, cannot be used in the Custom Search
element, and cannot be used with the
Filter by Attribute feature.

The following examples show the use of the page date with these operators:

Google's estimate of the right date for a page is based on features
such as the byline date of news articles or an explicitly specified
date in the title of the document. If a page has poorly specified or
inconsistent dates, Google's estimate of the page date may not make
sense, and your custom search engine may return results ordered in
a way you do not expect.

Formatting Dates

A site may provide date information implicitly, relying on Google's
estimated page date feature to detect dates embedded in the page
URL, title or other features, or explicitly, by supplying a date in
a structured data format. In either case, effective use of dates
requires formatting the dates correctly.

Google will attempt to parse variants of these date formats, such
as MM/DD/YYYY and DD/MM/YYYY. However,
the more ambiguous the date, the less likely that Google will parse
it correctly. For example, the date 06/07/08 is
extremely ambiguous and it is unlikely Google will assign to it
the interpretation you want. For best results, use a complete
ISO 8601
date format with a fully specified year.
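
The ambiguity is easy to demonstrate. The following Python sketch tries a few
common layouts on the same string and lists every calendar date that results.
It does not reproduce Google's date parser; the interpretations helper and the
list of candidate formats are assumptions chosen for illustration.

```python
from datetime import datetime

# Candidate layouts to try; this list is illustrative, not exhaustive.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%y", "%d/%m/%y", "%y/%m/%d")


def interpretations(text, formats=CANDIDATE_FORMATS):
    """Return every distinct ISO 8601 date the text could represent."""
    results = set()
    for fmt in formats:
        try:
            results.add(datetime.strptime(text, fmt).date().isoformat())
        except ValueError:
            # The text does not match this layout; try the next one.
            pass
    return sorted(results)


print(interpretations("06/07/08"))
# ['2006-07-08', '2008-06-07', '2008-07-06']
print(interpretations("2008-06-07"))
# ['2008-06-07']
```

The two-digit form yields three plausible dates, while the ISO 8601 form with
a four-digit year yields exactly one.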

Rich Snippets

Google also extracts a variety of structured data from Microformats, RDFa
and Microdata to be used in
Rich Snippets, extended presentations of standard Google search results.
A subset of this data is available for use in Custom Search's
structured data operators—typically, the same data used in Rich Snippets.
For example, if you have marked up your pages with the Microformat
hrecipe standard, you could sort on the number of rating
stars of the recipe with an operator like
&sort=recipe-ratingstars.
Google is continually extending the data it extracts and how much of this
data is available for use in Custom Search; to see what data we currently
extract, you can use the
Rich Snippets Preview Tool in Webmaster Tools.

Using Microformats

Microformats
is a specification for representing commonly published
items such as reviews, people, products, and businesses. Generally,
microformats consist of <span> and
<div> elements and a class property, along with a
brief and descriptive property name (such as dtreviewed
or rating, which represent the date an item was reviewed
and its rating, respectively).

To see what Google extracts for a page, use the
Rich
Snippets Testing Tool in Google's
Webmaster Tools site. The data Google extracts from pages is
continually being extended, so check back periodically to see if
the data you want has been made available. In the meantime, if you need
custom data that does not correspond to a defined microformat,
you can use PageMaps.

Using Resource Description Framework in Attributes (RDFa)

Resource Description Framework in Attributes (RDFa) is more flexible
than microformats. Microformats specify both a syntax for including
structured data in HTML documents and a set of microformat classes,
each with its own specific vocabulary of allowed attributes. RDFa, on
the other hand, specifies only a syntax and allows you to use existing
vocabularies of attributes or create your own. It even lets you combine
multiple vocabularies freely. If the existing vocabularies do not meet
your needs, you can define your own standards and vocabularies by
creating new fields.

Using Microdata

HTML5, the latest revision of the language web pages are written in,
defines a format called
microdata
that incorporates the ideas of RDFa and Microformats directly into the
HTML standard itself. Microdata uses simple attributes in HTML tags
(often span or div) to assign brief and
descriptive names to items and properties.

Like RDFa and Microformats, Microdata's attributes help you specify that
your content describes information of specific types, like reviews,
people, information or events. For example, a person can have the
properties name, nickname, url, title and affiliation. The following is
an example of a short HTML block showing this basic contact
information for Bob Smith:

<div>
My name is Bob Smith but people call me Smithy. Here is my home page:
<a href="http://www.example.com">www.example.com</a>
I live in Albuquerque, NM and work as an engineer at ACME Corp.
</div>

The following is the same HTML marked up with microdata. Note that in this example
we use a property 'nickname' that is not yet officially part of schema.org. Custom
Search is a good way to explore possible schema.org extensions locally before
proposing them to the wider community.

<div itemscope itemtype="http://schema.org/Person">
My name is <span itemprop="name">Bob Smith</span>
but people call me <span itemprop="nickname">Smithy</span>.
Here is my home page:
<a href="http://www.example.com" itemprop="url">www.example.com</a>
I live in Albuquerque, NM and work as an <span itemprop="title">engineer</span>
at <span itemprop="affiliation">ACME Corp</span>.
</div>

The first line of this example includes an HTML div tag with
an itemscope attribute that indicates that the div
contains a microdata item. The
itemtype="http://schema.org/Person" attribute on
the same tag tells us this is a person. Each property of the person item
is identified with the itemprop attribute; for example,
itemprop="name" on the span tag describes
the person's name. Note that you are not limited to span
and div; the itemprop="url" attribute is attached
to an a (anchor) tag.
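
To see how a program might read this markup, consider the following Python
sketch, which pulls the item type and properties out of the example above
using the standard library's HTML parser. It is a simplified illustration,
not Google's extractor: the MicrodataParser class is hypothetical, it handles
a single flat item, it takes the value of an a tag from its href attribute,
and it assumes the markup contains no void elements such as br or img.

```python
from html.parser import HTMLParser

MARKUP = """
<div itemscope itemtype="http://schema.org/Person">
My name is <span itemprop="name">Bob Smith</span>
but people call me <span itemprop="nickname">Smithy</span>.
Here is my home page:
<a href="http://www.example.com" itemprop="url">www.example.com</a>
I live in Albuquerque, NM and work as an <span itemprop="title">engineer</span>
at <span itemprop="affiliation">ACME Corp</span>.
</div>
"""


class MicrodataParser(HTMLParser):
    """Collect itemprop values from one flat microdata item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._open = []  # itemprop name (or None) for each open element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "itemscope" in attributes:
            self.item_type = attributes.get("itemtype")
        prop = attributes.get("itemprop")
        if prop and tag == "a":
            # For links, the property value is the URL, not the link text.
            self.properties[prop] = attributes.get("href", "")
            prop = None
        elif prop:
            self.properties[prop] = ""
        self._open.append(prop)

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        # Append text to every enclosing element that declares an itemprop.
        for prop in self._open:
            if prop:
                self.properties[prop] += data


parser = MicrodataParser()
parser.feed(MARKUP)
print(parser.item_type)
print(parser.properties)
```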

Viewing Extracted Structured Data

After you have tagged your webpages with structured data, you can use
the Rich
Snippets Testing Tool to view the structured data that can be
extracted from the webpage. The tool provides two views: the first
view shows the structured data that Google Search can extract from the
page, while the second view shows what Custom Search can extract from
the page.

If you haven't tagged any of your webpages but would like to see what
extracted structured data might look like, you can enter the URL of
other websites. Popular sites that have review information or lists of
contacts are more likely to have structured data. If you see result
snippets on Google Search that look similar to Figure 1, you can
conclude that the webpage has structured data.

Figure 1: Result snippet with rating, price range, and review.

Once you have found a page with structured data, you can view that
page's source to see the structured data that
the site has implemented, or view that page in the Rich Snippets
Testing Tool to see what data is extracted for Google Search
rich snippets and Custom Search structured search. For example,
consider the following snippet of HTML with structured data about
a person implemented as microformats:

Custom Search extracts the following subset of that data
for use in structured search:

person (source = MICROFORMAT)
location = Tokyo

Thus, this tool allows you to view not only the Rich Snippets markup
recognized for Google Search, but also the additional customized
markup that we support in Custom Search. You can immediately see how
your web page would be processed during indexing, and what metadata
attributes would be returned in PageMaps in your Custom Search results.
If there are any errors in your markup, you can fix them right away.
Remember, you need to add the &view=cse parameter to the
URL or click the checkbox to review the additional metadata
extracted by Custom Search.