Sitemap Generator and Webmaster Crawl Filters

The website crawler in A1 Sitemap Generator has many tools and options to ensure it can scan complex websites.
These include complete support for the robots.txt file, noindex and nofollow in meta tags, and nofollow in link tags.

Tip: Downloading robots.txt will often make web servers and analytics software identify you as a website crawler robot.

Note: You can also check the detected "state" flags that are related to webmaster filters.
Just select the desired URL in Analyze website and view all information in Page data.

HTML Code for Canonical, NoIndex and NoFollow

Canonical:
<link rel="canonical" href="http://www.example.com/list.php?sort=az" />
Useful in cases where two different URLs return the same content.
Consider reading about duplicate URLs, as there may be better solutions than canonical instructions, e.g. redirects.

NoFollow:

<a href="http://www.example.com/" rel="nofollow">bad link</a>

<meta name="robots" content="nofollow" />

NoIndex:
<meta name="robots" content="noindex" />
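
The tags above can be detected with a few lines of code. The following is a minimal sketch in Python using only the standard library; it is not how A1 Sitemap Generator itself is implemented, and the names RobotsTagParser and scan_html are hypothetical, chosen for illustration only.

```python
from html.parser import HTMLParser


class RobotsTagParser(HTMLParser):
    """Collects canonical, noindex and nofollow signals from one HTML page."""

    def __init__(self):
        super().__init__()
        self.canonical = None       # href of <link rel="canonical">, if present
        self.noindex = False        # True if a robots meta tag contains "noindex"
        self.nofollow_page = False  # True if a robots meta tag contains "nofollow"
        self.nofollow_links = []    # hrefs of <a> tags carrying rel="nofollow"

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.canonical = attr.get("href")
        elif tag == "a" and "nofollow" in rel:
            self.nofollow_links.append(attr.get("href"))
        elif tag == "meta" and attr.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            content = attr.get("content", "")
            directives = {d.strip().lower() for d in content.split(",")}
            self.noindex = self.noindex or "noindex" in directives
            self.nofollow_page = self.nofollow_page or "nofollow" in directives


def scan_html(html):
    """Hypothetical helper: parse one page and return the filled-in parser."""
    parser = RobotsTagParser()
    parser.feed(html)
    parser.close()
    return parser
```

Calling scan_html on a page that contains the canonical and NoIndex examples above would set canonical to the example URL and noindex to True, while nofollow_page stays False unless a robots meta tag with nofollow is present.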

Include and Exclude List and Analysis Filters

To learn about analysis and output filters, see the A1 Sitemap Generator online help system.

Match Behavior and Wildcards Support in Robots.txt

The match behavior in the website crawler used by A1 Sitemap Generator is similar to that of most search engines.

Support for wildcard symbols in the robots.txt file:

Standard: Match from beginning to length of filter.
gre will match: greyfox, greenfox and green/fox.

Wildcard *: Match any character until another match becomes possible.
gr*fox will match: greyfox, grayfox, growl-fox and green/fox.

Tip: Wildcard filters in robots.txt are often incorrectly configured and a source of crawling problems.
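
The two match rules above can be sketched in Python by translating a filter into a regular expression. This is an illustration under the assumption that the rules behave like a prefix match with * standing for any run of characters; it is not the crawler's actual code, and the function names filter_to_regex and filter_matches are hypothetical.

```python
import re


def filter_to_regex(path_filter):
    """Translate a robots.txt path filter into a compiled regular expression."""
    # Escape each literal piece, then join the pieces with ".*" so that
    # every "*" in the filter matches any run of characters.
    pieces = [re.escape(piece) for piece in path_filter.split("*")]
    return re.compile(".*".join(pieces))


def filter_matches(path_filter, path):
    """Return True if path matches the filter from its first character."""
    # re.match anchors only at the start of the string, which gives the
    # "match from beginning to length of filter" behavior.
    return filter_to_regex(path_filter).match(path) is not None


if __name__ == "__main__":
    for candidate in ["greyfox", "grayfox", "growl-fox", "green/fox"]:
        print(candidate, filter_matches("gre", candidate),
              filter_matches("gr*fox", candidate))
```

With this sketch, the filter gre matches greyfox and green/fox but not grayfox, while gr*fox matches all four names listed above.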

The crawler in our sitemap generator tool will obey the following user agent IDs in the robots.txt file:
