How to Construct an XML Site Map for Better SEO Results

A great way to make your Web site friendlier to search engine spiders is to add an XML site map. Construct your XML site map according to the current Sitemap Protocol format (which is published at www.sitemaps.org). The Sitemap Protocol lets you tell search engines which URLs on your Web site should be crawled.

An XML site map is a document that uses the Sitemap Protocol and contains a list of the URLs for a site. The Protocol was written by the major search engines (Google, Yahoo!, and Live Search) to be highly scalable so that it can accommodate sites of any size. It also enables Webmasters to include additional information about each URL (when it was last updated, how often it changes, and how important it is in relation to other URLs in the site) so that search engines can more intelligently crawl the site. Note that even though its name is similar to the traditional HTML site map, an XML site map is a totally different kind of document, and the two are not interchangeable. You shouldn't rely on an XML site map alone for your site.

XML site maps tell the spider the relative importance and priority of the pages on your site, which helps the search engine index the entire site and quickly re-index any site changes, expansions, or reductions. Additionally, many site-mapping tools can diagnose your XML site map, informing you of duplicate content, broken links, and areas that the spider can't access. Sitemaps.org has a tool that constructs an XML file for you, and is a great place to start.

Google adheres to Sitemap Protocol 0.9 as dictated by Sitemaps.org. Site maps created for Google using Sitemap Protocol 0.9 are therefore compatible with other search engines that adopt the standards of Sitemaps.org.
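
As a rough illustration of what a Sitemap Protocol 0.9 file looks like, the following Python sketch builds a small site map with the standard library. The function name build_sitemap, the example.com URLs, and the date are placeholders chosen for this example, not part of the protocol; only the tag names and the 0.9 namespace come from Sitemaps.org.

```python
import xml.etree.ElementTree as ET

# Namespace that identifies Sitemap Protocol 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return site map XML (as a string) for a list of entry dicts.

    build_sitemap is a hypothetical helper written for this example.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only required child; the other tags are optional.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder URLs and values, for illustration only.
sitemap_xml = build_sitemap([
    {
        "loc": "http://www.example.com/",
        "lastmod": "2024-01-15",
        "changefreq": "weekly",
        "priority": 0.8,
    },
    {"loc": "http://www.example.com/about/"},
])
print(sitemap_xml)
```

Saving the printed text as a file such as sitemap.xml gives you a document that any search engine following the Sitemaps.org standard should be able to read.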

The following table lists the required and optional tags in XML site maps.

Site Map Tags in XML

| Tag | Required or Optional | Explanation |
| --- | --- | --- |
| <urlset> | Required | Encapsulates the file and references the current protocol standard. |
| <url> | Required | Parent tag for each URL entry. The remaining tags are children of this tag. |
| <loc> | Required | URL of the page. This URL must begin with the protocol (such as http) and end with a trailing slash, if your Web server requires it. This value must be less than 2048 characters. |
| <lastmod> | Optional | The date of last modification of the file. This date should be in W3C Datetime format. This format allows you to omit the time portion, if desired, and use the YYYY-MM-DD format. |
| <changefreq> | Optional | How frequently the page is likely to change. This value provides general information to search engines and may not correlate exactly to how often they crawl the page. |
| <priority> | Optional | The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0. This value has no effect on your pages compared to pages on other sites and only lets the search engines know which of your pages you deem most important so that they can order the crawl of your pages in the way you prefer. The default priority of a page is 0.5. You should set your landing pages at a higher priority and non-landing pages at a lower one. |
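
The rules in the table can be turned into a quick sanity check before you publish a site map. The Python sketch below is one possible way to do that, not an official validator: check_entry is a hypothetical helper, the list of changefreq values is taken from the Sitemaps.org protocol, and for simplicity the lastmod check accepts only the date-only YYYY-MM-DD form even though W3C Datetime also permits fuller forms with a time portion.

```python
import re
from urllib.parse import urlparse

# Values the Sitemaps.org protocol allows for <changefreq>.
CHANGEFREQ_VALUES = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}

# Simplification: only the date-only form of W3C Datetime is accepted here.
DATE_ONLY = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_entry(entry):
    """Return a list of problems found in one site map entry (empty if none).

    check_entry is a hypothetical helper written for this example.
    """
    problems = []

    loc = entry.get("loc", "")
    if not loc:
        problems.append("loc is required")
    else:
        if not urlparse(loc).scheme:
            problems.append("loc must begin with a protocol such as http")
        if len(loc) >= 2048:
            problems.append("loc must be less than 2048 characters")

    lastmod = entry.get("lastmod")
    if lastmod is not None and not DATE_ONLY.fullmatch(str(lastmod)):
        problems.append("lastmod is not in YYYY-MM-DD form")

    changefreq = entry.get("changefreq")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append("changefreq is not a recognized value")

    priority = entry.get("priority")
    if priority is not None:
        try:
            in_range = 0.0 <= float(priority) <= 1.0
        except (TypeError, ValueError):
            in_range = False
        if not in_range:
            problems.append("priority must be between 0.0 and 1.0")

    return problems


# Placeholder entries, for illustration only.
good = {"loc": "http://www.example.com/", "lastmod": "2024-01-15",
        "changefreq": "weekly", "priority": 0.8}
bad = {"loc": "www.example.com/", "changefreq": "sometimes", "priority": 1.5}

print(check_entry(good))
print(check_entry(bad))
```

An entry that follows every rule in the table yields an empty list, while the second entry is flagged for its missing protocol, unrecognized changefreq value, and out-of-range priority.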