Generating Sitemaps with Spring Boot & SitemapGen4j

22 May 2016

A sitemap is a useful SEO tool that informs search engines & web crawlers that certain pages exist on your website. The most common form of a sitemap is an XML file hosted on your web server, which may be submitted to search engines through their site administration portals, such as the Google Search Console. A more in-depth explanation of a sitemap can be found here.

Blog entries themselves will typically be stored in either a database or on a file system, with the mechanism for accessing & retrieving these entries being referred to as a repository. Spring provides a @Repository annotation that stereotypes the specified component as being a mechanism for encapsulating storage, retrieval, and search behavior which emulates a collection of objects.

Our repository implementation will therefore be annotated with @Repository and should implement the necessary business logic for querying and storing our blog entries. As the sitemap will contain a list of URLs to our blog entries, the repository must facilitate querying all of the entries and returning them as a list.

For this demonstration we will simply return a pre-populated list of blog entries; however, the findAll() method is where you would typically perform an actual look-up, e.g. a database query or file system traversal.
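A minimal sketch of such a repository is shown here. The BlogEntry type with its single slug field, the names BlogEntryRepository and InMemoryBlogEntryRepository, and the sample slugs are all hypothetical. The Spring annotation appears only as a comment so that the snippet compiles without Spring on the classpath.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical model type: a blog entry identified by its URL slug. */
final class BlogEntry {
    private final String slug;

    BlogEntry(String slug) {
        this.slug = slug;
    }

    String getSlug() {
        return slug;
    }
}

/** The only query the sitemap needs: every entry, returned as a list. */
interface BlogEntryRepository {
    List<BlogEntry> findAll();
}

// In the Spring application this class would carry @Repository
// (org.springframework.stereotype.Repository).
class InMemoryBlogEntryRepository implements BlogEntryRepository {

    // Pre-populated sample data; a real implementation would query a
    // database or walk the file system here instead.
    private final List<BlogEntry> entries = Collections.unmodifiableList(Arrays.asList(
            new BlogEntry("hello-world"),
            new BlogEntry("spring-boot-sitemaps"),
            new BlogEntry("caching-with-spring")));

    @Override
    public List<BlogEntry> findAll() {
        return entries;
    }
}
```

Keeping the query behind an interface means the in-memory list can later be swapped for a database-backed implementation without touching the code that consumes it.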

SitemapGen4j is a Java library for generating XML sitemaps. It provides functionality for populating a sitemap with modification dates, change frequencies, and individual page importance. We will be using it in this demonstration simply to populate a list of links in the sitemap; however, you may extend it further to provide last modified dates for your blog entries, or to rank entries by importance.

The implementation of our sitemap generation fits the @Service component stereotype: it simply passes the data from our repository to SitemapGen4j and therefore has no encapsulated state. As the sitemap generation service requires access to the repository of blog entries, we can utilise Spring’s dependency injection. With the help of JSR-330, we can annotate the service’s constructor with the @Inject annotation and have its dependencies resolved by Spring’s dependency container.
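The constructor-injection wiring can be sketched without either library on the classpath. In the snippet below the names SitemapService, createSitemap, BASE_URL, and the /blog/ path prefix are hypothetical, and BlogEntry and BlogEntryRepository are re-declared so that the snippet compiles on its own. A hand-built string stands in for SitemapGen4j: in the real service the loop would hand each URL to the library's WebSitemapGenerator rather than append XML manually.

```java
import java.util.List;

final class BlogEntry {
    private final String slug;
    BlogEntry(String slug) { this.slug = slug; }
    String getSlug() { return slug; }
}

interface BlogEntryRepository {
    List<BlogEntry> findAll();
}

// In the Spring application: @Service (org.springframework.stereotype.Service).
class SitemapService {

    // Hypothetical site root; replace with your own domain.
    static final String BASE_URL = "https://example.com";

    private final BlogEntryRepository repository;

    // In the Spring application: @Inject (javax.inject.Inject) on this
    // constructor lets the container supply the repository.
    SitemapService(BlogEntryRepository repository) {
        this.repository = repository;
    }

    /** Builds the sitemap from scratch on every call; no mutable state is kept. */
    String createSitemap() {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (BlogEntry entry : repository.findAll()) {
            xml.append("  <url><loc>")
               .append(escape(BASE_URL + "/blog/" + entry.getSlug()))
               .append("</loc></url>\n");
        }
        xml.append("</urlset>\n");
        return xml.toString();
    }

    // The sitemap protocol requires these five characters to be entity-escaped.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }
}
```

Because the repository arrives through the constructor, the service can be exercised in a test with any BlogEntryRepository implementation, including a lambda.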

Now that we have a service that can provide us with an XML sitemap of our application’s model, we must provide a way to render it to those who request it, whether that is a user accessing it via their browser or a headless web crawler accessing the resource directly. In the MVC pattern this component is referred to as the view, and it is represented in the Spring framework by the View interface.

Simply put, our view implementation needs to invoke the service to generate a sitemap and write the result (the XML document) to the HTTP response. As the view depends on the generation service, we can utilise Spring’s dependency injection again in a similar fashion.
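The core of that view can be sketched in plain Java under a few assumptions. Here a java.io.Writer stands in for the HTTP response's writer and a Supplier<String> stands in for the sitemap service, so the snippet compiles without Spring or the Servlet API. In the real application the class would implement Spring's View interface, whose render method receives the model, the request, and the response.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.function.Supplier;

// In the Spring application this class would implement
// org.springframework.web.servlet.View and be registered as a component.
class SitemapView {

    // Stand-in for the injected sitemap service: anything that yields the XML.
    private final Supplier<String> sitemapSource;

    // In the Spring application: @Inject (javax.inject.Inject) on this constructor.
    SitemapView(Supplier<String> sitemapSource) {
        this.sitemapSource = sitemapSource;
    }

    /** Mirrors View.getContentType(): tells the client to expect XML. */
    String getContentType() {
        return "application/xml";
    }

    /**
     * Writes the generated sitemap to the given writer. With Spring, this
     * logic would live in View.render(model, request, response) and the
     * writer would come from the HTTP response.
     */
    void render(Writer responseWriter) throws IOException {
        responseWriter.write(sitemapSource.get());
        responseWriter.flush();
    }
}
```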

With the components to generate and render our repository of blog entries in place, we must now allow the world to access the resource that we have produced. We can provide a mapping to the /sitemap.xml endpoint within a @Controller that is composed solely of our SitemapView.

As the number of pages on your site grows, it may be a good idea to start caching this result, because we currently construct the sitemap every time it is requested and that operation may become more resource intensive. A guide to caching with Spring can be found here.
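To illustrate the idea only, here is a hand-rolled, time-based cache in plain Java. It is a swapped-in technique rather than Spring's own caching support. The CachedSitemap name, the time-to-live parameter, and the injectable clock are all assumptions made for this sketch.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical wrapper that regenerates the sitemap at most once per TTL. */
class CachedSitemap {

    private final Supplier<String> generator; // e.g. the sitemap service
    private final long ttlNanos;
    private final LongSupplier clock;         // nanosecond clock, replaceable in tests

    private String cached;
    private long cachedAt;

    CachedSitemap(Supplier<String> generator, Duration ttl) {
        this(generator, ttl, System::nanoTime);
    }

    CachedSitemap(Supplier<String> generator, Duration ttl, LongSupplier clock) {
        this.generator = generator;
        this.ttlNanos = ttl.toNanos();
        this.clock = clock;
    }

    /** Returns the cached XML, regenerating it first if it is missing or stale. */
    synchronized String get() {
        long now = clock.getAsLong();
        if (cached == null || now - cachedAt >= ttlNanos) {
            cached = generator.get();
            cachedAt = now;
        }
        return cached;
    }
}
```

The trade-off is staleness: with a time-to-live of one hour, a newly published entry may take up to an hour to appear in the sitemap.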

You may now inform search engines of the existence of your sitemap to assist them in indexing your website. A simple way to do this without submitting your site via control panels is to add a Sitemap directive pointing at the sitemap’s full URL to your robots.txt.
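For example, assuming the site is served from the hypothetical domain example.com, the robots.txt entry would look like this:

```text
Sitemap: https://example.com/sitemap.xml
```

The directive takes an absolute URL, so include the scheme and host rather than just the /sitemap.xml path.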