My question: three crawl counters are displayed, a success counter, a failure counter and a warning counter. Can any of these counters include duplicate URLs? For example, suppose the web data source www.mysite.com is reported as 1000 crawled successfully, 10 failed, and no warnings. Does that mean there are 1000 distinct web pages stored in Search Center, or could the 1000 counted pages contain duplicated URLs?

BTW: I am confused because I have set up a daily incremental crawl. For example, if http://www.mysite.com/1.html is crawled both yesterday and today (successfully in both cases), will it be counted twice? I would appreciate it if anyone could point me to documentation on what the counters mean.

1 Answer

If you crawl a regular website, the crawler follows each of the links. It shouldn't duplicate pages, but it will encounter references to the same page (the home page, for example) many times. Ultimately, to determine the number of pages or items, look at the Items in Index count rather than the number of items crawled.
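To illustrate the difference between "items crawled" and distinct items, here is a minimal sketch in Python. It is not a SharePoint API: it assumes you have somehow exported crawl log entries as (url, status) pairs, and the names `normalize` and `count_distinct` and the status strings are hypothetical. It only shows how repeated crawls of the same URL collapse into one distinct entry.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Reduce trivially different spellings of a URL to one form (assumed rules)."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"  # treat "http://host" and "http://host/" as the same
    # Lowercase scheme and host, drop the fragment, keep path and query as-is.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def count_distinct(crawl_records):
    """Return (number of distinct URLs, count per status).

    crawl_records is an iterable of (url, status) pairs in crawl order
    (a hypothetical export format). The latest status seen for a URL wins.
    """
    latest = {}
    for url, status in crawl_records:
        latest[normalize(url)] = status

    by_status = {}
    for status in latest.values():
        by_status[status] = by_status.get(status, 0) + 1
    return len(latest), by_status


# Made-up sample: 1.html is crawled successfully on two consecutive days.
records = [
    ("http://www.mysite.com/1.html", "success"),  # yesterday
    ("http://www.mysite.com/1.html", "success"),  # today
    ("http://www.mysite.com/2.html", "error"),
    ("HTTP://WWW.MYSITE.COM/", "success"),
    ("http://www.mysite.com", "success"),
]

total, by_status = count_distinct(records)
print(total, by_status)
```

In this made-up sample there are five crawl records but only three distinct URLs, which is why a count of distinct items is the safer number to put in a report.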

Thanks Mike! Do you have any recommendations for documents that prove there is no duplication?
– George2, Jul 11 '10 at 7:28

I need to generate a report on how many URLs have already been crawled. If there is a document that proves there is no duplication, it would make me more confident. Thank you!
– George2, Jul 11 '10 at 7:30