Tutorial - Creating a Search Engine XML Site Map

Overview

Adding an XML sitemap to your site makes it easier for search engines such as Google to find and index your site's pages. Your friendly SEO consultancy will recommend you have a Google sitemap for 'better SEO'.

Umbraco doesn't include an 'out of the box' XML sitemap generator. This tutorial will show you how to create one, but if you are in a hurry, there are some Umbraco packages that will quickly do the job for you:

The XML sitemap is a guide that helps search engines discover and index your content. It doesn't need to be exhaustive; each page on your site that you wish to feature will be represented by a <url> entry in the list.
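For reference, the sitemaps.org protocol expects a <urlset> root element in the http://www.sitemaps.org/schemas/sitemap/0.9 namespace. It contains one <url> element per page, where <loc> is required and <lastmod>, <changefreq> and <priority> are optional. The short Python sketch below builds that structure with the standard library so you can see the shape of the output we're aiming for. It is not Umbraco code, and build_sitemap and the example URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Build a <urlset> document containing one <url>/<loc> entry per URL.

    Only the required <loc> child is written; <lastmod>, <changefreq> and
    <priority> are optional in the protocol.
    """
    ET.register_namespace("", SITEMAP_NS)  # serialise as xmlns="..." rather than an ns0: prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="unicode")

sitemap_xml = build_sitemap(["http://yoursite.com/", "http://yoursite.com/news/"])
print(sitemap_xml)
```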

Approach

There are many ways of approaching this task, and the best approach depends on the size of your site and your preference for implementing functionality in Umbraco. For simplicity's sake, we're going to write this code directly in a template using Razor and IPublishedContent. You might instead prefer to use route hijacking and write the code in an MVC controller, and XSLT is still a really good fit for this kind of task.

We'll create a new Document Type called 'XmlSiteMap' with a corresponding 'XmlSiteMap' template (visiting this page will trigger the rendering of the XML sitemap).

The XmlSiteMap document type will contain a 'Blacklisted Document Types' property listing the types of content we wish to exclude from the sitemap. Alternatively, you could take a 'whitelist' approach if it is easier to specify the types that should be included rather than those that should be excluded.

We'll create an 'XmlSiteMapSettings' composition containing a consistent set of sitemap-related properties, and we'll add it to all of the different document types on the site.

The 'XmlSiteMapSettings' composition will contain a 'Hide From Xml Sitemap' checkbox, to give editors the ability to hide a certain page from the XML sitemap.

The implementation will start at the homepage of the site and loop through all of its children, iterating in turn through the children of the children and so on. At each level it checks the properties of the page to decide whether to continue further.

1. Create the XmlSiteMap Document Type

Unless you specify otherwise, creating the document type will also create a corresponding template called 'XmlSiteMap'.

Open your homepage document type, choose 'Permissions', and allow the new XmlSiteMap document type to be created underneath the homepage.

Create your XmlSiteMap page in your content tree, and add the XmlSiteMap document type to its 'Blacklist'.

2. Create XmlSiteMapSettings Composition

A sitemap entry lets you state the relative priority of a particular page in terms of its importance within your site, where 1.0 is super important and 0.1 is close to insignificant. You can also state how often the content of a particular page changes (eg weekly, monthly), which helps the search engine know when to return to reindex regularly updated content.

Search Engine Relative Priority - Slider - MinValue: 0.1, MaxValue: 1, Step Increments 0.1, InitialValue 0.5
(Relative priority of this page between 0.1 and 1.0, where 1.0 is the most important page on the site and 0.1 isn't)

Search Engine Change Frequency - Dropdown - always, hourly, daily, weekly, monthly, yearly, never
(How often the content of this page changes, for the Google sitemap; if left blank, the setting for the section will be inherited)

Hide From Xml Sitemap (hideFromXmlSitemap) - checkbox.

At this point your composition should look similar to this:

(Using pink for composition icons makes them easier to spot in the list when you are curating your document types)

Add this composition to all of the document types on your site!

Now editors can set these values for each page of the site. But rather than expect them to set the values on every single page, when a value such as the change frequency is blank we'll use the value from the parent or grandparent nodes, using 'recursion' up the Umbraco content tree. This lets a value be set in one place for a particular section. For example, setting it once on a News section would then apply it to all news articles.
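Here is a minimal Python sketch of 'recursion up the content tree', written outside Umbraco. Page and get_value are hypothetical stand-ins. get_value roughly models what we understand passing true to HasValue/GetPropertyValue to do: it checks the page itself, then each ancestor in turn, until it finds a non-blank value, and otherwise falls back to a default.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Page:
    """Hypothetical stand-in for an Umbraco content node."""
    name: str
    properties: dict = field(default_factory=dict)
    parent: Optional["Page"] = None

def get_value(page, alias, recursive=False, default=None):
    """Return the page's value for `alias`; if it is blank and `recursive`
    is True, try the parent, then the grandparent, and so on to the root."""
    current = page
    while current is not None:
        value = current.properties.get(alias)
        if value not in (None, ""):
            return value
        if not recursive:
            break
        current = current.parent
    return default

home = Page("Home")
news = Page("News", {"searchEngineChangeFrequency": "daily"}, parent=home)
article = Page("Article", parent=news)

print(get_value(article, "searchEngineChangeFrequency", recursive=True, default="monthly"))  # daily
print(get_value(article, "searchEngineChangeFrequency", default="monthly"))  # monthly
```

With recursive=False only the page itself is checked, which matches how the per-page priority setting is treated.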

3. Building the XmlSiteMap.cshtml template

We'll start by writing out the XML schema for the sitemap in the template. Because we don't want our template to inherit any 'master' HTML layout, we'll set Layout to null.

Notice that we're not adding any spaces or carriage returns before the opening <urlset> declaration. It would be easier to read if we did, but we want to avoid making the XML invalid.
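This matters most if the template starts with an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>. The XML specification requires that declaration to be the very first thing in the document, so even a single leading newline makes the output malformed. This Python sketch shows the standard library parser rejecting it. The sample document and is_well_formed are just for illustration.

```python
import xml.etree.ElementTree as ET

DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'
BODY = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>'

def is_well_formed(document):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(DECLARATION + BODY))         # True
print(is_well_formed("\n" + DECLARATION + BODY))  # False: declaration not at the start
```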

Getting a reference to the sitemap starting point

We're going to start at the site homepage, and since our XmlSiteMap page is created underneath this page, we can use the 'Site()' helper to find the starting point for the sitemap as IPublishedContent.

IPublishedContent siteHomePage = Model.Content.Site();
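As far as we know, in Umbraco 7 Site() returns the page's level-1 ancestor (equivalent to AncestorOrSelf(1)), which is why it finds the homepage from anywhere beneath it. The walk itself is simple to model. This Python sketch uses a hypothetical Node class that has only a name and a parent.

```python
from typing import Optional

class Node:
    """Hypothetical stand-in for IPublishedContent: a name and a parent link."""
    def __init__(self, name: str, parent: Optional["Node"] = None):
        self.name = name
        self.parent = parent

def site(node: Node) -> Node:
    """Walk up through the parents until we reach the top (level-1) ancestor."""
    while node.parent is not None:
        node = node.parent
    return node

home = Node("Home")
sitemap_page = Node("XmlSiteMap", parent=home)
print(site(sitemap_page).name)  # Home
```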

Rendering a site map entry

We will retrieve each page in the site as IPublishedContent, and read its SearchEngineChangeFrequency (recursively), SearchEngineRelativePriority, URL, last-modified date and so on.

This is a great candidate for a Razor helper. Helpers are a great way to organise your Razor view implementation and stop yourself repeating code and HTML in multiple places. Here we'll have one place to write out the logic for our <url> entry:

@helper RenderSiteMapUrlEntry(IPublishedContent node)
{
    // passing 'true' as an additional parameter to HasValue and GetPropertyValue means the value will be sought 'recursively' up the content tree until a value is found
    var changeFreq = node.HasValue("searchEngineChangeFrequency", true) ? node.GetPropertyValue<string>("searchEngineChangeFrequency", true) : "monthly";
    // the relative priority is a per-page setting only, so we don't use recursion here (no 'true'), and we default to 0.5 if no value is set
    var priority = node.HasValue("searchEngineRelativePriority") ? node.GetPropertyValue<string>("searchEngineRelativePriority") : "0.5";
    <url>
        <loc>@EnsureUrlStartsWithDomain(node.UrlWithDomain())</loc>
        <lastmod>@(string.Format("{0:s}+00:00", node.UpdateDate))</lastmod>
        <changefreq>@changeFreq</changefreq>
        <priority>@priority</priority>
    </url>
}
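Here is a rough Python equivalent of the helper, which shows the output independently of Razor. render_url_entry and its parameters are hypothetical. It applies the same 'monthly' and '0.5' defaults and mimics the {0:s}+00:00 format string, which produces .NET's sortable yyyy-MM-ddTHH:mm:ss pattern followed by a hard-coded +00:00 offset. That offset is only correct if the date is already in UTC, so it's worth checking what your server stores. The sketch also entity-escapes the URL, since characters such as & must be escaped in a sitemap.

```python
from datetime import datetime
from typing import Optional
from xml.sax.saxutils import escape

def render_url_entry(url: str, updated: datetime,
                     change_freq: Optional[str] = None,
                     priority: Optional[str] = None) -> str:
    """Render one <url> element, falling back to the helper's defaults."""
    change_freq = change_freq or "monthly"
    priority = priority or "0.5"
    # equivalent of .NET's "{0:s}" (yyyy-MM-ddTHH:mm:ss) plus a fixed UTC offset
    lastmod = updated.strftime("%Y-%m-%dT%H:%M:%S") + "+00:00"
    return (
        "<url>"
        f"<loc>{escape(url)}</loc>"
        f"<lastmod>{lastmod}</lastmod>"
        f"<changefreq>{change_freq}</changefreq>"
        f"<priority>{priority}</priority>"
        "</url>"
    )

print(render_url_entry("http://yoursite.com/news/?a=1&b=2", datetime(2019, 3, 1, 9, 30)))
```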

We're using IPublishedContent in this example, but if you prefer ModelsBuilder, you can take advantage of the fact that the XmlSiteMapSettings composition will generate an interface called IXmlSiteMapSettings. You could then change the helper to accept this type, eg RenderSiteMapUrlEntry(IXmlSiteMapSettings node), and read the properties without the GetPropertyValue helper, eg node.SearchEngineRelativePriority. However, you would need to create an extension method on IXmlSiteMapSettings to implement the recursive lookup we use for the SearchEngineChangeFrequency property.

EnsureUrlStartsWithDomain - Razor Function

Razor functions are similar to Razor helpers, but instead of writing out HTML, they can return a value. This again lets you write key functionality in a single place in your views, and you can also share functions across multiple Razor views! We'll create a Razor function to ensure the URLs we display on our sitemap have the correct domain. Relative URLs don't belong in an XML sitemap: the sitemaps.org protocol requires each <loc> value to be a full URL beginning with the protocol (such as http).
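Assume the function's job is simply to leave absolute URLs alone and resolve relative ones against the site's scheme and host. The logic then looks something like this Python sketch. ensure_url_starts_with_domain and site_root are hypothetical names; in Umbraco the root would typically come from the current request.

```python
from urllib.parse import urljoin, urlsplit

def ensure_url_starts_with_domain(url: str, site_root: str) -> str:
    """Return `url` unchanged if it is already absolute (http/https),
    otherwise resolve it against `site_root`."""
    if urlsplit(url).scheme in ("http", "https"):
        return url
    return urljoin(site_root, url)

print(ensure_url_starts_with_domain("/news/", "http://yoursite.com"))                # http://yoursite.com/news/
print(ensure_url_starts_with_domain("https://other.com/a/", "http://yoursite.com"))  # https://other.com/a/
```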

Visit the URL of your sitemap page (http://yoursite.com/sitemap) and this will render a single sitemap entry for the homepage, which, ermmmm, isn't very comprehensive!

Looping through the rest of the site

So now we need to find the pages created beneath the homepage, and see if they should be added to the sitemap, and then in turn look at the pages beneath those etc, until the entire content tree is traversed.

We can use IPublishedContent's .Children method to return all the pages directly beneath a particular page eg:

IEnumerable<IPublishedContent> sitePages = siteHomePage.Children();

So we need to loop through each of these 'child' pages, and write out their sitemap markup using our helper, and then in turn loop through their children (grandchildren?) etc and so on... (great-great-grandchildren...)

So hopefully you can see the problem here: how deep do we go? How do we handle the repetition forever...

... well we can use recursion - we can create a further razor helper that 'calls itself' [insert inception reference here]...

Recursive Helper

Suppose we create a helper, perhaps called 'RenderSiteMapUrlEntriesForChildren', that accepts a 'Parent Page' parameter as its starting point. It can then find the children of this parent page, write out their sitemap entries, and then... call this same helper again... from itself: recursion!
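The same shape in Python is a generator that writes an entry for each child and then calls itself for that child's children. The Node class and function name in this sketch are hypothetical. There's no need to know in advance how deep the tree goes, because the recursion stops naturally at pages with no children.

```python
from typing import Iterator, List, Optional

class Node:
    """Hypothetical stand-in for IPublishedContent: a URL and child pages."""
    def __init__(self, url: str, children: Optional[List["Node"]] = None):
        self.url = url
        self.children = children or []

def render_entries_for_children(parent: Node) -> Iterator[str]:
    """Write an entry for each child, then recurse into that child's children."""
    for child in parent.children:
        yield f"<url><loc>{child.url}</loc></url>"
        yield from render_entries_for_children(child)  # the function calls itself

home = Node("http://yoursite.com/", [
    Node("http://yoursite.com/news/", [Node("http://yoursite.com/news/article-1/")]),
    Node("http://yoursite.com/about/"),
])

# The homepage entry is written once; everything beneath it comes from the recursion.
entries = [f"<url><loc>{home.url}</loc></url>", *render_entries_for_children(home)]
print("\n".join(entries))
```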

Checking if a page should be on the sitemap

This is all very well, but what if some super secret pages shouldn't be on the sitemap? And what about the document type blacklist we mentioned earlier? And what if we only want to go 3 levels deep?

HideFromSiteMap

We added a hideFromXmlSitemap checkbox to all of our document types via our XmlSiteMapSettings composition. Let's update the helper to only return children that don't have the checkbox set, which excludes those pages (and any pages beneath them) from the sitemap.
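The key detail is that the check happens before recursing, so a hidden page's descendants are never visited. Here is a Python sketch with a hypothetical Node class whose hide_from_xml_sitemap attribute stands in for the checkbox.

```python
from typing import Iterator, List, Optional

class Node:
    """Hypothetical content node: a URL, the hide checkbox and child pages."""
    def __init__(self, url: str, hide: bool = False,
                 children: Optional[List["Node"]] = None):
        self.url = url
        self.hide_from_xml_sitemap = hide
        self.children = children or []

def visible_urls(parent: Node) -> Iterator[str]:
    """Yield URLs beneath `parent`, skipping hidden pages and their descendants."""
    for child in parent.children:
        if child.hide_from_xml_sitemap:
            continue  # skip the page and never descend into its children
        yield child.url
        yield from visible_urls(child)

home = Node("/", children=[
    Node("/news/", children=[Node("/news/article-1/")]),
    Node("/secret/", hide=True, children=[Node("/secret/plans/")]),
])
print(list(visible_urls(home)))  # ['/news/', '/news/article-1/']
```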

Set your MaxSiteMap depth to 2 on your XmlSiteMap content item, then save and republish: your sitemap will now only contain entries for the top two levels. Leaving the value blank means no maximum depth restriction is applied.
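One way to model the depth limit in Python is to pass the current level down the recursion and stop before going deeper than the maximum. This sketch treats the homepage as level 1, as Umbraco's Level property does; that is our assumption about how the depth is counted. None plays the role of a blank value, meaning no limit. urls_to_depth and Node are hypothetical.

```python
from typing import Iterator, List, Optional

class Node:
    """Hypothetical content node: a URL and child pages."""
    def __init__(self, url: str, children: Optional[List["Node"]] = None):
        self.url = url
        self.children = children or []

def urls_to_depth(parent: Node, max_depth: Optional[int],
                  level: int = 1) -> Iterator[str]:
    """Yield URLs beneath `parent`, which sits at `level` (homepage = 1).

    max_depth=None means no limit, like leaving the property blank.
    """
    if max_depth is not None and level >= max_depth:
        return  # the children would be deeper than allowed
    for child in parent.children:
        yield child.url
        yield from urls_to_depth(child, max_depth, level + 1)

home = Node("/", [Node("/news/", [Node("/news/article-1/")]), Node("/about/")])
print(["/"] + list(urls_to_depth(home, max_depth=2)))     # ['/', '/news/', '/about/']
print(["/"] + list(urls_to_depth(home, max_depth=None)))  # ['/', '/news/', '/news/article-1/', '/about/']
```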

DocumentType Blacklist

Our XML sitemap includes an entry for itself! I thought we had excluded that document type when we created the documentTypeBlacklist property...
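The fix is to compare each page's document type alias against the blacklist while recursing. This Python sketch assumes the blacklist is available as a list of document type aliases; how it is actually stored depends on the property editor you chose. As a further assumption, it compares aliases case-insensitively. Node and urls_excluding are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional

class Node:
    """Hypothetical content node with a document type alias."""
    def __init__(self, url: str, doc_type: str,
                 children: Optional[List["Node"]] = None):
        self.url = url
        self.doc_type = doc_type
        self.children = children or []

def urls_excluding(parent: Node, blacklist: Iterable[str]) -> Iterator[str]:
    """Yield URLs beneath `parent`, leaving out blacklisted document types."""
    blocked = {alias.lower() for alias in blacklist}  # case-insensitive comparison
    for child in parent.children:
        if child.doc_type.lower() not in blocked:
            yield child.url
        # Design choice: the children of a blacklisted page are still visited.
        yield from urls_excluding(child, blocked)

home = Node("/", "home", [
    Node("/news/", "newsSection", [Node("/news/article-1/", "newsArticle")]),
    Node("/sitemap/", "xmlSiteMap"),
])
print(list(urls_excluding(home, ["xmlSiteMap"])))  # ['/news/', '/news/article-1/']
```

Here a blacklisted page is left out but its children are still visited, which suits 'container' document types. If you'd rather drop the whole branch, as hideFromXmlSitemap does, skip the recursive call for blacklisted pages too.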

Add sitemap to robots.txt

Once you introduce a sitemap for the first time, you might suddenly find your site being crawled by several different search engine bots, which is exactly what you want. However, if your site or hosting is a little creaky, you might want to add a crawl delay to your robots.txt. This asks well-behaved search engine bots to give your site a bit of breathing space between crawl requests:
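Here is a sketch of a resulting robots.txt, checked with Python's urllib.robotparser (RobotFileParser.site_maps() needs Python 3.8 or later). The domain, the 10-second delay and the /sitemap path are example values. Crawl-delay is a non-standard directive that some crawlers, including Googlebot, ignore. The Sitemap line should give the sitemap's absolute URL.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: allow everything, ask polite bots to wait 10 seconds
# between requests, and advertise the sitemap's absolute URL.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow:

Sitemap: http://yoursite.com/sitemap
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("*"))                             # 10
print(parser.site_maps())                                  # ['http://yoursite.com/sitemap']
print(parser.can_fetch("*", "http://yoursite.com/news/"))  # True
```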

Test your Xml SiteMap in a validation tool
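A dedicated sitemap validator is the proper test, but a few basic rules are easy to script while you iterate. This Python sketch parses the output and flags relative <loc> values, <changefreq> values outside the seven the protocol allows, and <priority> values outside 0.0 to 1.0. check_sitemap is our own set of checks, not a full schema validation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
CHANGE_FREQS = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

def check_sitemap(document: str) -> list:
    """Return a list of problems; an empty list means these basic checks passed."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    if root.tag != "{%s}urlset" % NS["sm"]:
        problems.append("root element is not a sitemap <urlset>")
    for i, url in enumerate(root.findall("sm:url", NS), start=1):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        if urlsplit(loc).scheme not in ("http", "https"):
            problems.append(f"url {i}: <loc> is not an absolute URL: {loc!r}")
        freq = url.findtext("sm:changefreq", default=None, namespaces=NS)
        if freq is not None and freq.strip() not in CHANGE_FREQS:
            problems.append(f"url {i}: unknown <changefreq> {freq!r}")
        priority = url.findtext("sm:priority", default=None, namespaces=NS)
        if priority is not None:
            try:
                ok = 0.0 <= float(priority) <= 1.0
            except ValueError:
                ok = False
            if not ok:
                problems.append(f"url {i}: <priority> must be 0.0 to 1.0, got {priority!r}")
    return problems

good = ('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        '<url><loc>http://yoursite.com/</loc><changefreq>monthly</changefreq>'
        '<priority>0.5</priority></url></urlset>')
bad = ('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
       '<url><loc>/news/</loc><changefreq>fortnightly</changefreq>'
       '<priority>1.5</priority></url></urlset>')
print(check_sitemap(good))  # []
for problem in check_sitemap(bad):
    print(problem)
```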

Summary

This is just one way to add an XML sitemap to your site, and depending on your site it might not always be the 'best way'. For example, it can be much faster using XSLT, particularly for large sites! This tutorial aims to serve as an introduction to Razor, helpers, functions, IPublishedContent and working with the Umbraco content tree. It is not trying to establish the one 'best practice' way to build an XML sitemap!
