What is the best tool to create a Google Sitemap for my Web site?

That depends. Before you look for a tool, work out a sitemap strategy suitable for your Web site. Choosing a tool sets the procedure to create and maintain the sitemaps in stone, so make a good decision in the first place.

Google Sitemaps is a URL submission tool, currently feeding only one search engine. Despite the label, at first sight it has little to do with a classic site map. Google's sitemap protocol lacks attributes like title, abstract, topic/category, or parent node (the hierarchy level and position of a node in the tree structure), which are necessary to create a site map that helps human users navigate a site. A flat Google Sitemap file becomes a structured site map in Google's search index, which is built and maintained by regular crawling of the Web, and completed by sitemap-based crawls as an additional method of inclusion and renewal.

A well thought out sitemap strategy takes care of navigation enhancements, and of other methods and targets of URL submissions too. From the data gathered, reviewed, and completed to create a Google Sitemap, one can easily produce other useful URL collections, e.g. a Yahoo URL submission list, a hierarchical HTML site map, an RSS site feed, and so on. Unfortunately, most Google Sitemap related tools don't come with clever add-ons reusing the data. Another important criterion is the desired (automated) handling of page updates. Most Google Sitemaps related tools aren't suitable for Web sites providing their visitors with frequent updates.

In the following I've listed a few types of Web sites along with appropriate procedures to make use of Google Sitemaps. Please note that most of the tools linked below haven't been evaluated or even tested; they are just examples of a particular approach. "Static" refers to the method of page creation; it means "stored on the Web server, not dynamically created by a script".

Regardless of the method used to create a Google Sitemap, it must be double-checked before submission. If for some reason duplicated pages, unprotected printer-friendly versions, or even questionable stuff like outdated link swap pages find their way into the sitemap, the Google Sitemaps submission will function as an index wiper and tank the site in Google's search results. Also, after the submission you should monitor the downloads and --at least for the very first submission-- the crawler problem reports.
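Such a pre-submission check is easy to automate. Here's a minimal sketch that flags duplicate <loc> entries and a few URL patterns which often indicate duplicate content --the pattern list is my assumption, not an authoritative rule set; extend it to match your site:

```python
# Sketch: sanity-check a sitemap file before submission.
# Flags duplicate <loc> entries plus URL patterns that often indicate
# duplicate content (printer-friendly pages, session IDs).
# The SUSPECT patterns are examples, not a complete list.
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
SUSPECT = [re.compile(p, re.I) for p in (r"print", r"sessionid=", r"sid=")]

def check_sitemap(path):
    tree = ET.parse(path)
    seen, problems = set(), []
    for loc in tree.getroot().iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        if url in seen:
            problems.append(("duplicate", url))
        seen.add(url)
        for pat in SUSPECT:
            if pat.search(url):
                problems.append(("suspect", url))
    return problems
```

Anything this returns deserves a manual look before you hit the submit button.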

One pager and tiny, static Web sites

If such a site needs a Google Sitemap at all, the tool of your choice is a text editor. Just grab an XML syntax example, edit the URLs in <loc>, delete all other optional tags within <url> except <lastmod>, save as UTF-8 text, upload and submit the file, done. However, ensure all pages are linked from the root index page.
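For reference, a minimal hand-edited sitemap looks like this --the URLs and dates are placeholders, and the namespace is the current sitemaps.org one (older examples may use Google's 0.84 schema instead):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2006-05-01</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/about.html</loc>
    <lastmod>2006-04-15</lastmod>
  </url>
</urlset>
```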

Small static Web sites, never updated

If the only page ever updated is the guestbook, you need just a one-time sitemap generation, that is, an online tool like SitemapsPal. Enter your root URL in the form, disable all other sitemap attributes, and press submit. Copy the results and paste them into your favorite text editor, save the file as UTF-8 text, remove the usual duplicate entry http://www.yourdomain.com/index.html but leave http://www.yourdomain.com/ intact, then upload and submit the file, done. On the Google Sitemap Tools Page you'll find a style sheet to display the XML sitemap in a Web browser, and a Google Sitemap editor you can use for minor updates.

Small static Web sites, frequently updated

You can use any tool to edit or regenerate the XML sitemap on content changes, but there is a smarter way to do it. Download Simple Sitemaps from this site if you're able to do minor edits in well documented scripts. Simple Sitemaps uses a plain text file containing a list of URLs to create XML, HTML, TXT, and RSS sitemaps. The tool itself is free; for a low fee you can purchase the initial URL list spidered from your site. On page updates, just change the date of last modification, or add a new line for a new page, upload the text file, and all four types of sitemaps get updated for Googlebot.
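The core of that approach --turning a plain URL list into sitemap XML-- fits in a few lines. This sketch assumes an input format of one "URL" or "URL YYYY-MM-DD" per line; Simple Sitemaps' actual file layout may differ:

```python
# Sketch: convert a plain text URL list into XML sitemap markup.
# Input format assumed here: one "URL" or "URL YYYY-MM-DD" per line;
# blank lines and lines starting with '#' are skipped.
from xml.sax.saxutils import escape

def text_to_sitemap(lines):
    out = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        out.append("  <url>")
        out.append("    <loc>%s</loc>" % escape(parts[0]))
        if len(parts) > 1:
            out.append("    <lastmod>%s</lastmod>" % parts[1])
        out.append("  </url>")
    out.append("</urlset>")
    return "\n".join(out)
```

Keeping the URL list as the single source of truth means the HTML, TXT, and RSS flavors can be generated from the same data in one pass.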

Medium sized and large static Web sites

For this type of Web site there are basically three different approaches:

1. You can add a few lines of code to every page and use an aggregator like AutoSiteMap's proxy script to maintain the Google Sitemap dynamically. The principle is simple: when a user visits a page on your site, its crawling priority gets updated, or, if the page is new, it's added to the sitemap. You rely on a third-party service, so if the foreign server is down, incredibly slow, or unreachable, your sitemap is broken and page loads become slower. If you have multiple URLs pointing to the same page, e.g. affiliate links, you're toast.

2. Desktop tools like GSiteCrawler crawl your site, parse your server log files, and even grab search results on the fly to compile a list of URLs. These tools come with filters to suppress session IDs and other ugly query string elements, URLs excluded by robots.txt, particular pages or directories, and so on, so there is at least a minimum of control over a sitemap's content. Usually the sitemap generation process is pretty much automated, and there are options to recreate sitemaps by project, so desktop tools are a good choice for Webmasters maintaining multiple static sites.

3. Server-side sitemap generators like Google's Python script or phpSitemapNG extend the capabilities of desktop tools by scanning the Web server's file system for URLs. Especially with large sites these tools are preferable, because they don't consume as much bandwidth. Like desktop tools, server-side solutions provide filters, crawling, stored page attributes, and even fully automated recurring sitemap generation and (re)submission via cron job.
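To illustrate the file system approach, here's a stripped-down sketch of such a scan with filters --the directory names, extensions, and exclusion patterns are my assumptions, not defaults of any particular tool:

```python
# Sketch of a server-side generator: walk the document root, map file
# paths to URLs, and filter out material that shouldn't be submitted.
# EXCLUDE_DIRS / EXCLUDE_PATTERNS / INCLUDE_EXT are example values;
# adapt them to the actual site.
import fnmatch
import os

EXCLUDE_DIRS = {"cgi-bin", "_old", "backup"}
EXCLUDE_PATTERNS = ("*.bak", "*~", "test-*")
INCLUDE_EXT = (".html", ".htm", ".php")

def collect_urls(docroot, base_url):
    urls = []
    for dirpath, dirnames, filenames in os.walk(docroot):
        # Prune excluded directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDE_DIRS]
        for name in filenames:
            if not name.endswith(INCLUDE_EXT):
                continue
            if any(fnmatch.fnmatch(name, p) for p in EXCLUDE_PATTERNS):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), docroot)
            urls.append(base_url.rstrip("/") + "/" + rel.replace(os.sep, "/"))
    return sorted(urls)
```

The filter lists are exactly where the caution below applies: everything not explicitly excluded ends up in the sitemap.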

Caution: URL lists harvested from the file system may contain stuff you do not want to submit to a search engine, for example scripts producing garbage without input parameters, forgotten backups of old versions or experimental stuff, and sneaky SEO spam like doorway pages from the last century.

Blogs

Web log software and similar content management systems like Drupal usually come with a built-in sitemap generator. If not, there are good plug-ins available for WordPress, Movable Type, and others. Some of them didn't make it onto the sitemap related link lists, so simply search Google for [google sitemap generator "your blog software or CMS here"]. A blog's XML sitemap should be updated in the background, for example when the blog software pings Ping-O-Matic, Weblogs.com, etc., announcing new posts to the blogosphere. Even with trusted software it's a good idea to check the XML sitemap for duplicates, that is, different URLs pointing to the same post or archive index page. URLs with IDs in the query string are suspect; usually they have an equivalent address (the permanent link). If there is no filter to suppress useless URLs, dump the software.

Forums

Some forum software vendors offer built-in functionality to create Google Sitemaps; for others like vBulletin you need plug-ins. The crux with forums is that they make use of multiple URLs pointing to the same or similar content. The XML sitemap should contain only URLs to sub-forums and threads. Perhaps even pages per thread, but here it starts to become tricky, because the number of displayed posts per page is --usually-- user dependent, and search engine crawlers like Googlebot don't behave like real users: they don't accept cookies, they don't log in, and they start a new session per 'page view'.

In general, keep any URL with a post ID out of the XML sitemap, and ensure those pages carry a dynamically inserted robots META tag with a NOINDEX value. To avoid disadvantages caused by search engines filtering out duplicated content, treat every post as a text snippet, and make absolutely sure that there is no more than ONE indexable URL pointing to a page containing each snippet. To enhance the crawlability of your forum, and to ensure search engines cannot index duplicated content, make creative use of search engine friendly cloaking.
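The dynamically inserted robots META tag boils down to a per-request decision like the following sketch --the query parameter names are assumptions; map them to your forum software's actual URL scheme:

```python
# Sketch: choose the robots META tag per request, based on whether the
# URL carries a post-ID-style query parameter. POST_ID_PARAMS is an
# example set; vBulletin and friends use their own parameter names.
from urllib.parse import parse_qs, urlparse

POST_ID_PARAMS = {"p", "postid", "highlight"}

def robots_meta(url):
    query = parse_qs(urlparse(url).query)
    if POST_ID_PARAMS & set(query):
        # Post-level duplicate of a thread page: keep it out of the index
        # but let the crawler follow its links.
        return '<meta name="robots" content="noindex,follow">'
    return '<meta name="robots" content="index,follow">'
```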

Dynamic sites of any size

With a dynamic Web site, do not use a third-party tool to create and update your Google Sitemap. Provide a dynamic, database-driven XML sitemap instead. If you don't use a content management system with built-in creation of XML sitemap files, write the script yourself, or hire a programmer (see dynamic Google Sitemaps design patterns). It's worth the effort, because with third-party sitemap tools you put your search engine placements at risk, and most third-party tools handle dynamic content very inefficiently. There are countless good reasons not to use external sitemap generators --not even Google's own script!-- with dynamic sites. I'd have to write a book to list them all, so just trust me and do not use external tools to create a Google Sitemap for your dynamic Web site!
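A database-driven sitemap script is less work than it sounds. This sketch uses SQLite for the sake of a self-contained example --the table and column names (pages, url, last_modified, indexable) are assumptions; in practice the handler behind the sitemap URL would query the CMS database directly:

```python
# Sketch: generate sitemap XML straight from the content database.
# The schema (pages: url, last_modified, indexable) is hypothetical;
# adapt the query to the CMS at hand. An 'indexable' flag keeps
# duplicate and noindex-worthy URLs out of the sitemap by design.
import sqlite3
from xml.sax.saxutils import escape

def dynamic_sitemap(conn):
    rows = conn.execute(
        "SELECT url, last_modified FROM pages "
        "WHERE indexable = 1 ORDER BY url")
    parts = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url, lastmod in rows:
        parts.append("  <url><loc>%s</loc><lastmod>%s</lastmod></url>"
                     % (escape(url), lastmod))
    parts.append("</urlset>")
    return "\n".join(parts)
```

Because the sitemap is rendered from the same tables that render the pages, it can never drift out of sync with the site's actual content.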

Free hosted stuff

Well, my usual advice is to go get a domain and a host, but for mom-and-pop sites and hobby bloggers reputable free hosting makes sense as well, and their content often deserves nice search engine placements. Unfortunately, in some situations free hosting and Google Sitemaps are not compatible, that is, you'll have to live without a Google Sitemap.

Free hosted content management systems like Blogger don't allow the upload of XML sitemaps, but they may create a feed. Google's sitemap program accepts most feeds, so just submit your RSS or Atom feed. This will not cover updates of older posts, but if the main page is popular and provides links to the archives, Googlebot will crawl all posts frequently.

If the free host adds nasty ads to every HTML page, try to create a plain text file with a list of your URLs, one URL per line. Google accepts every file extension for text sitemaps, so try different extensions until you find one your free host doesn't touch. I haven't tested whether Google accepts the misuse of common extensions or not, but perhaps using .gif or .jpeg will do the trick.