Last year I wrote a blog titled Using Classic ASP and URL Rewrite for Dynamic SEO Functionality, in which I described how you could combine Classic ASP and the URL Rewrite module for IIS to dynamically create Robots.txt and Sitemap.xml files for your website, thereby helping with your Search Engine Optimization (SEO) results. A few weeks ago I received a follow-up question that I thought was worth answering in a blog post.

Overview

Here is the question that I was asked:

"What if I don't want to include all dynamic pages in sitemap.xml but only a select few or some in certain directories because I don't want bots to crawl all of them. What can I do?"

That's a great question, and it wasn't tremendously difficult to update my original code samples to address this request. First of all, the majority of the code from my last blog remains unchanged; here's the file-by-file breakdown of the changes that need to be made:

Filename        Changes
--------------  ---------------------------------
Robots.asp      None
Sitemap.asp     See the sample later in this blog
Web.config      None

So if you are already using the files from my original blog, no changes need to be made to your Robots.asp file or the URL Rewrite rules in your Web.config file, because the question only concerns the files that are returned in the output for Sitemap.xml.

Updating the Necessary Files

The good news is that I wrote most of the heavy-duty code in my last blog; only a few changes needed to be made in order to accommodate the requested functionality. The main difference is that the original Sitemap.asp file had a section that recursively parsed the entire website and listed all of its files, whereas this new version moves that section of code into a separate function to which you pass the name of each folder that you want parsed recursively. This allows you to specify only those folders within your website that you want included in the resultant sitemap output.
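
Here is a condensed sketch of what the updated Sitemap.asp might look like. It is not the complete file from my original blog; to keep the example short it only emits the XML format, it inlines the folder-walking logic inside a single AddFolderToSitemap() routine, and the "/articles" and "/downloads" folder names passed to it are placeholders that you would replace with your own paths.

<%
    Option Explicit

    ' Condensed sketch - emits only the XML sitemap format.
    Response.Clear
    Response.ContentType = "text/xml"
    Response.Charset = "utf-8"
    Response.Write "<?xml version=""1.0"" encoding=""UTF-8""?>" & vbCrLf
    Response.Write "<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">" & vbCrLf

    ' List only the folders that you want included in the sitemap;
    ' these two paths are placeholders for your own folders.
    Call AddFolderToSitemap("/articles")
    Call AddFolderToSitemap("/downloads")

    Response.Write "</urlset>"
    Response.End

    ' Recursively adds every *.html file under the specified virtual folder.
    Sub AddFolderToSitemap(strFolder)
        Dim objFSO, objFolder, objFile, objSubFolder
        Set objFSO = Server.CreateObject("Scripting.FileSystemObject")
        Set objFolder = objFSO.GetFolder(Server.MapPath(strFolder))
        ' Write a sitemap entry for each static HTML file in this folder.
        For Each objFile In objFolder.Files
            If LCase(objFSO.GetExtensionName(objFile.Name)) = "html" Then
                Call WriteUrl(strFolder & "/" & objFile.Name, objFile.DateLastModified, "weekly")
            End If
        Next
        ' Recurse into each subfolder of the folder that was passed in.
        For Each objSubFolder In objFolder.SubFolders
            Call AddFolderToSitemap(strFolder & "/" & objSubFolder.Name)
        Next
    End Sub

    ' Outputs a single <url> element for the sitemap.
    Sub WriteUrl(strRelativeUrl, dtmLastModified, strFrequency)
        Dim strLoc, strLastMod
        strLoc = "http://" & Request.ServerVariables("HTTP_HOST") & Replace(strRelativeUrl, "//", "/")
        strLastMod = Year(dtmLastModified) & "-" & Right("0" & Month(dtmLastModified), 2) & "-" & Right("0" & Day(dtmLastModified), 2)
        Response.Write "  <url>" & vbCrLf
        Response.Write "    <loc>" & Server.HTMLEncode(strLoc) & "</loc>" & vbCrLf
        Response.Write "    <lastmod>" & strLastMod & "</lastmod>" & vbCrLf
        Response.Write "    <changefreq>" & strFrequency & "</changefreq>" & vbCrLf
        Response.Write "  </url>" & vbCrLf
    End Sub
%>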

As you can see, the code is largely unchanged from my previous blog.

In Closing...

One last thing to consider: I didn't make any changes to the Robots.asp file in this blog. That being said, when you do not want specific paths crawled, you should add rules to your Robots.txt file to disallow those paths. For example, here is a simple Robots.txt file which allows crawling of your entire website:

# Robots.txt
# For more information on this file see:
# http://www.robotstxt.org/

# Define the sitemap path
Sitemap: http://localhost:53644/sitemap.xml

# Make changes for all web spiders
User-agent: *
Allow: /
Disallow:

If you were going to deny crawling on certain paths, you would need to add the specific paths that you do not want crawled to your Robots.txt file like the following example:

# Robots.txt
# For more information on this file see:
# http://www.robotstxt.org/

# Define the sitemap path
Sitemap: http://localhost:53644/sitemap.xml

# Make changes for all web spiders
User-agent: *
Disallow: /foo
Disallow: /bar

That being said, if you are using the Robots.asp file from my last blog, you would need to update the section of code that defines the paths, as in the following example:
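
The sketch below shows how that section of Robots.asp might look; only the lines that write the Allow/Disallow rules change, and the "/foo" and "/bar" paths are placeholders for whatever paths you do not want crawled. The rest of the file (the content type and the dynamically-built Sitemap URL) stays the same as in my last blog.

<%
    ' Only the section that writes the crawling rules changes; the "/foo"
    ' and "/bar" paths below are placeholders for your own excluded paths.
    Response.Write "# Make changes for all web spiders" & vbCrLf
    Response.Write "User-agent: *" & vbCrLf
    Response.Write "Disallow: /foo" & vbCrLf
    Response.Write "Disallow: /bar" & vbCrLf
%>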

I had another interesting situation present itself recently that I thought would make a good blog: how to use Classic ASP with the IIS URL Rewrite module to dynamically generate Robots.txt and Sitemap.xml files.

Overview

Here's the situation: I host a website for one of my family members, and like everyone else on the Internet, he wanted some better SEO rankings. We discussed a few things that he could do to improve his visibility with search engines, and one of the suggestions that I gave him was to keep his Robots.txt and Sitemap.xml files up-to-date. But there was an additional caveat - he uses two separate DNS names for the same website, and that presents a problem for absolute URLs in either of those files. Before anyone points out that it's usually not a good idea to host multiple DNS names on the same content, there are times when this is acceptable; for example, if you are trying to decide which of several DNS names is the best to use, you might want to bind each name to the same IP address and parse your logs to find out which name is getting the most traffic.

In any event, the syntax for both Robots.txt and Sitemap.xml files is pretty easy, so I wrote a couple of simple Classic ASP Robots.asp and Sitemap.asp pages that output the correct syntax and DNS-specific URLs for each domain name, and I wrote some simple URL Rewrite rules that rewrite inbound requests for Robots.txt and Sitemap.xml files to the ASP pages, while blocking direct access to the Classic ASP pages themselves.

All of that being said, there are a couple of quick things that I would like to mention before I get to the code:

First of all, I chose Classic ASP for the files because it allows the code to run without having to load any additional framework; I could have used ASP.NET or PHP just as easily, but either of those would require additional overhead that isn't really required.

Second, the specific website for which I wrote these specific examples consists of all static content that is updated a few times a month, so I wrote the example to parse the physical directory structure for the website's URLs and specified a weekly interval for search engines to revisit the website. All of these options can easily be changed; for example, I reused this code a little while later for a website where all of the content was created dynamically from a database, and I updated the code in the Sitemap.asp file to create the URLs from the dynamically-generated content. (That's really easy to do, but outside the scope of this blog.)

That being said, let's move on to the actual code.

Creating the Required Files

There are three files that you will need to create for this example:

A Robots.asp file to which URL Rewrite will send requests for Robots.txt

A Sitemap.asp file to which URL Rewrite will send requests for Sitemap.xml

A Web.config file that contains the URL Rewrite rules

Step 1 - Creating the Robots.asp File

You need to save the following code sample as Robots.asp in the root of your website; this page will be executed whenever someone requests the Robots.txt file for your website. This example is very simple: it checks for the requested hostname and uses that to dynamically create the absolute URL for the website's Sitemap.xml file.
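
A minimal sketch along those lines is shown here; it may not match the original sample exactly, but it implements the behavior described above, returning the Robots.txt content with a Sitemap URL built from whichever hostname was requested.

<%
    Option Explicit
    On Error Resume Next

    Dim strUrlRoot

    ' Force the correct MIME type so the response is treated as a plain-text Robots.txt file.
    Response.Clear
    Response.ContentType = "text/plain"
    Response.Charset = "utf-8"

    ' Build the absolute sitemap URL from whichever hostname was requested.
    strUrlRoot = "http://" & Request.ServerVariables("HTTP_HOST")

    Response.Write "# Robots.txt" & vbCrLf
    Response.Write "# For more information on this file see:" & vbCrLf
    Response.Write "# http://www.robotstxt.org/" & vbCrLf & vbCrLf
    Response.Write "# Define the sitemap path" & vbCrLf
    Response.Write "Sitemap: " & strUrlRoot & "/sitemap.xml" & vbCrLf & vbCrLf
    Response.Write "# Make changes for all web spiders" & vbCrLf
    Response.Write "User-agent: *" & vbCrLf
    Response.Write "Allow: /" & vbCrLf
    Response.Write "Disallow: " & vbCrLf

    Response.End
%>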

Step 2 - Creating the Sitemap.asp File

The following example file is also pretty simple, and you would save this code as Sitemap.asp in the root of your website. There is a section in the code where it loops through the file system looking for files with the *.html file extension and only creates URLs for those files. If you want other files included in your results, or you want to change the code from static to dynamic content, this is where you would need to update the file accordingly.
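
Below is a condensed sketch of such a Sitemap.asp file. It follows the behavior described above - walking the physical folder tree, including only *.html files, and honoring a "format" query string value of XML or TXT that the rewrite rules in Step 3 supply - but it is an approximation rather than the exact code from the original post.

<%
    Option Explicit
    On Error Resume Next

    Dim strFormat, strUrlRoot, strPhysicalRoot, strUrlRelative
    Dim objFSO, objFolder, objFile
    Dim arrFolders, lngIndex

    ' Determine whether to return XML or plain text, based on the query
    ' string that the URL Rewrite rules in Step 3 pass to this page.
    strFormat = UCase(Request.QueryString("format"))
    If strFormat <> "XML" Then strFormat = "TXT"

    Response.Clear
    If strFormat = "XML" Then
        Response.ContentType = "text/xml"
        Response.Write "<?xml version=""1.0"" encoding=""UTF-8""?>" & vbCrLf
        Response.Write "<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">" & vbCrLf
    Else
        Response.ContentType = "text/plain"
    End If

    strUrlRoot = "http://" & Request.ServerVariables("HTTP_HOST")
    strPhysicalRoot = Server.MapPath("/")
    Set objFSO = Server.CreateObject("Scripting.FileSystemObject")

    ' Walk every folder in the website and add each *.html file to the sitemap.
    arrFolders = GetFolderTree(strPhysicalRoot)
    For lngIndex = 0 To UBound(arrFolders)
        Set objFolder = objFSO.GetFolder(arrFolders(lngIndex))
        For Each objFile In objFolder.Files
            If LCase(objFSO.GetExtensionName(objFile.Name)) = "html" Then
                strUrlRelative = Replace(Mid(objFile.Path, Len(strPhysicalRoot) + 1), "\", "/")
                Call WriteUrl(strUrlRoot & strUrlRelative, objFile.DateLastModified, "weekly", strFormat)
            End If
        Next
    Next

    If strFormat = "XML" Then Response.Write "</urlset>"
    Response.End

    ' Returns an array containing the root folder plus every folder underneath it.
    Function GetFolderTree(strRootFolder)
        Dim objTreeFSO, objTreeFolder, objSubFolder
        Dim arrItems(), lngCount, lngPointer
        Set objTreeFSO = Server.CreateObject("Scripting.FileSystemObject")
        ReDim arrItems(0)
        arrItems(0) = strRootFolder
        lngCount = 0
        lngPointer = 0
        ' Breadth-first walk: append each subfolder to the array as it is found.
        Do While lngPointer <= lngCount
            Set objTreeFolder = objTreeFSO.GetFolder(arrItems(lngPointer))
            For Each objSubFolder In objTreeFolder.SubFolders
                lngCount = lngCount + 1
                ReDim Preserve arrItems(lngCount)
                arrItems(lngCount) = objSubFolder.Path
            Next
            lngPointer = lngPointer + 1
        Loop
        GetFolderTree = arrItems
    End Function

    ' Writes a single sitemap entry in either XML or plain-text format.
    Sub WriteUrl(strUrl, dtmDate, strFrequency, strOutputFormat)
        Dim strLastModified
        strLastModified = Year(dtmDate) & "-" & Right("0" & Month(dtmDate), 2) & "-" & Right("0" & Day(dtmDate), 2)
        If strOutputFormat = "XML" Then
            Response.Write "  <url>" & vbCrLf
            Response.Write "    <loc>" & Server.HTMLEncode(strUrl) & "</loc>" & vbCrLf
            Response.Write "    <lastmod>" & strLastModified & "</lastmod>" & vbCrLf
            Response.Write "    <changefreq>" & strFrequency & "</changefreq>" & vbCrLf
            Response.Write "  </url>" & vbCrLf
        Else
            Response.Write strUrl & vbCrLf
        End If
    End Sub
%>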

Note: There are two helper methods in the preceding example that I should call out:

The GetFolderTree() function returns a string array of all the folders that are located under a root folder; you could remove that function if you were generating all of your URLs dynamically.

The WriteUrl() function outputs an entry for the sitemap file in either XML or TXT format, depending on the file type that is in use. It also allows you to specify how frequently the specific URL should be indexed (always, hourly, daily, weekly, monthly, yearly, or never).

Step 3 - Creating the Web.config File

The last step is to add the URL Rewrite rules to the Web.config file in the root of your website. The following example is a complete Web.config file, but you could merge the rules into your existing Web.config file if you have already created one for your website. These rules are pretty simple: they rewrite all inbound requests for Robots.txt to Robots.asp, they rewrite all requests for Sitemap.xml to Sitemap.asp?format=XML, and they rewrite all requests for Sitemap.txt to Sitemap.asp?format=TXT; this allows requests for both the XML-based and text-based sitemaps to work, even though the Robots.txt file contains the path to the XML file. The last part of the URL Rewrite syntax returns HTTP 404 errors if anyone tries to send direct requests for either the Robots.asp or Sitemap.asp files; this isn't absolutely necessary, but I like to mask what I'm doing from prying eyes. (I'm kind of geeky that way.)
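
Here is a sketch of a Web.config file that implements the rules described above; the rule names are arbitrary, and you would adjust the patterns if your ASP files live somewhere other than the website root.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Rewrite the friendly URLs to the Classic ASP pages. -->
        <rule name="Robots" stopProcessing="true">
          <match url="^robots\.txt$" ignoreCase="true" />
          <action type="Rewrite" url="robots.asp" />
        </rule>
        <rule name="SitemapXML" stopProcessing="true">
          <match url="^sitemap\.xml$" ignoreCase="true" />
          <action type="Rewrite" url="sitemap.asp?format=XML" />
        </rule>
        <rule name="SitemapTXT" stopProcessing="true">
          <match url="^sitemap\.txt$" ignoreCase="true" />
          <action type="Rewrite" url="sitemap.asp?format=TXT" />
        </rule>
        <!-- Return HTTP 404 for direct requests to the ASP pages themselves. -->
        <rule name="BlockAspPages" stopProcessing="true">
          <match url="^(robots|sitemap)\.asp$" ignoreCase="true" />
          <action type="CustomResponse" statusCode="404" subStatusCode="0"
                  statusReason="Not Found" statusDescription="File not found" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>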

In the early days of the Internet, some computers had video capabilities that were limited to 256-color palettes. Since HTML's 24-bit RGB palette supports 16,777,216 colors, someone very smart figured out an algorithm that reduced the full 24-bit color palette into a much smaller 216-color palette that computers with limited color support could utilize.

Today most operating systems don't have a problem with full 24-bit or 32-bit color palettes, but I tend to stick to the 216-color palette in most circumstances just because it's pretty easy to do the math in my head. When you think about the hexadecimal 00-33-66-99-CC-FF progression, it's pretty easy to figure out which colors you need. That said, every once in a while I need to see the subtle differences between colors that are close to each other. With that in mind, it's pretty handy to keep a color palette around, and the following table lists the original 216-color web-safe palette that all browsers should support.

FFFFFF  FFFFCC  FFFF99  FFFF66  FFFF33  FFFF00
FFCCFF  FFCCCC  FFCC99  FFCC66  FFCC33  FFCC00
FF99FF  FF99CC  FF9999  FF9966  FF9933  FF9900
FF66FF  FF66CC  FF6699  FF6666  FF6633  FF6600
FF33FF  FF33CC  FF3399  FF3366  FF3333  FF3300
FF00FF  FF00CC  FF0099  FF0066  FF0033  FF0000
CCFFFF  CCFFCC  CCFF99  CCFF66  CCFF33  CCFF00
CCCCFF  CCCCCC  CCCC99  CCCC66  CCCC33  CCCC00
CC99FF  CC99CC  CC9999  CC9966  CC9933  CC9900
CC66FF  CC66CC  CC6699  CC6666  CC6633  CC6600
CC33FF  CC33CC  CC3399  CC3366  CC3333  CC3300
CC00FF  CC00CC  CC0099  CC0066  CC0033  CC0000
99FFFF  99FFCC  99FF99  99FF66  99FF33  99FF00
99CCFF  99CCCC  99CC99  99CC66  99CC33  99CC00
9999FF  9999CC  999999  999966  999933  999900
9966FF  9966CC  996699  996666  996633  996600
9933FF  9933CC  993399  993366  993333  993300
9900FF  9900CC  990099  990066  990033  990000
66FFFF  66FFCC  66FF99  66FF66  66FF33  66FF00
66CCFF  66CCCC  66CC99  66CC66  66CC33  66CC00
6699FF  6699CC  669999  669966  669933  669900
6666FF  6666CC  666699  666666  666633  666600
6633FF  6633CC  663399  663366  663333  663300
6600FF  6600CC  660099  660066  660033  660000
33FFFF  33FFCC  33FF99  33FF66  33FF33  33FF00
33CCFF  33CCCC  33CC99  33CC66  33CC33  33CC00
3399FF  3399CC  339999  339966  339933  339900
3366FF  3366CC  336699  336666  336633  336600
3333FF  3333CC  333399  333366  333333  333300
3300FF  3300CC  330099  330066  330033  330000
00FFFF  00FFCC  00FF99  00FF66  00FF33  00FF00
00CCFF  00CCCC  00CC99  00CC66  00CC33  00CC00
0099FF  0099CC  009999  009966  009933  009900
0066FF  0066CC  006699  006666  006633  006600
0033FF  0033CC  003399  003366  003333  003300
0000FF  0000CC  000099  000066  000033  000000

Update #1

I should also mention that sometime back in 1998 I wrote a classic ASP page that automatically generates the HTML for the table that I listed, and here's the code for that:
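
The original sample isn't reproduced here, but a small Classic ASP page along the following lines would generate an equivalent table; it loops through the six web-safe values for the red, green, and blue channels and emits 36 rows of 6 cells, with the blue channel varying across each row to match the palette shown above.

<html>
<head><title>Web-Safe Color Palette</title></head>
<body>
<table border="1" cellpadding="2" cellspacing="0">
<%
    ' Loop through the six web-safe values (FF, CC, 99, 66, 33, 00) for each
    ' of the red, green, and blue channels, and emit one table cell per color.
    Dim arrValues, intRed, intGreen, intBlue, strColor
    arrValues = Array("FF", "CC", "99", "66", "33", "00")
    For intRed = 0 To 5
        For intGreen = 0 To 5
            Response.Write "<tr>" & vbCrLf
            For intBlue = 0 To 5
                strColor = arrValues(intRed) & arrValues(intGreen) & arrValues(intBlue)
                Response.Write "<td bgcolor=""#" & strColor & """>" & strColor & "</td>" & vbCrLf
            Next
            Response.Write "</tr>" & vbCrLf
        Next
    Next
%>
</table>
</body>
</html>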

Update #2

A lot of web programmers started out with classic ASP like I did, but like most of those programmers I eventually moved on to ASP.NET. And with that in mind, here's the C# code to create the table that I listed:
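
Again, the original sample isn't reproduced here, but the following C# console application is one way to generate the same HTML table; the palette.html output file name is just an illustration.

using System;
using System.IO;
using System.Text;

class WebSafePalette
{
    static void Main()
    {
        // The six web-safe values for each of the red, green, and blue channels.
        string[] values = { "FF", "CC", "99", "66", "33", "00" };
        StringBuilder html = new StringBuilder();

        html.AppendLine("<table border=\"1\" cellpadding=\"2\" cellspacing=\"0\">");
        foreach (string red in values)
        {
            foreach (string green in values)
            {
                html.AppendLine("<tr>");
                foreach (string blue in values)
                {
                    // One cell per color, with the hex value as both the background and the label.
                    string color = red + green + blue;
                    html.AppendLine("<td bgcolor=\"#" + color + "\">" + color + "</td>");
                }
                html.AppendLine("</tr>");
            }
        }
        html.AppendLine("</table>");

        // Write the generated markup to a file and echo it to the console.
        File.WriteAllText("palette.html", html.ToString());
        Console.Write(html.ToString());
    }
}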