crawlerconsistency.exe reference

FAST Search Server 2010 for SharePoint

Applies to: FAST Search Server 2010 for SharePoint

Topic Last Modified: 2011-01-14

Use the crawlerconsistency tool to verify and repair the consistency of the crawler's item and metadata structures on disk. You can also use the tool for routine verification and maintenance of the internal crawler store, or to recover a damaged crawler store.

By default, the tool detects and attempts to repair the following inconsistencies:

Items referenced in metadatabases, but not found in the item store.

Invalid items in the item store.

Unreferenced items in the item store (requires the docrebuild mode).

Checksums in the duplicate database that are not found in the metadatabases.

Multiple checksums assigned to the same URI in the duplicate database.

These inconsistencies are automatically corrected by running the doccheck or docrebuild mode, followed by the metacheck mode. Any inconsistent URIs are logged, and a delete operation is issued to the indexer to keep the index synchronized (you can disable these deletes).
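For example, a full check-and-repair pass might run the tool first in doccheck (or docrebuild) mode and then in metacheck mode. The -m option used here to select the mode is an assumption; verify the exact syntax in the tool's help output on your installation.

crawlerconsistency -m doccheck
crawlerconsistency -m metacheck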

In a multi-node crawler environment, you can also use the tool to rebuild a duplicate server from the contents of the per-node scheduler post-process checksum databases by using the ppduprebuild mode. Because this mode builds the duplicate server from scratch, you can also use it to change the number of duplicate servers in use: first change the configuration, and then rebuild.
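A minimal sketch of such a rebuild, again assuming the mode is selected with the -m option:

crawlerconsistency -m ppduprebuild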

Note

To use a command-line tool, verify that you meet the following minimum requirements: You are a member of the FASTSearchAdministrators local group on the computer where FAST Search Server 2010 for SharePoint is installed.

The tool generates the log files described below. A log file is created only when the first URI is written to it.

<mode>_ok.txt

Lists the URIs that were found and not removed as inconsistencies.

The output from the metacheck mode lists every URI with a unique checksum, which makes it useful for comparing against the index.

Note

Items may have been dropped by the pipeline, leaving URIs in this file that are not in the index. You can safely remove URIs in the index that are not in this file.
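As a sketch, assuming you have exported the URIs currently in the index to a file (index_uris.txt is a hypothetical name), the following Windows command prints the indexed URIs that do not appear in the metacheck output:

findstr /V /L /G:metacheck_ok.txt index_uris.txt

Because findstr matches substrings, treat the output as a starting point for review rather than a definitive delete list.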

<mode>_deleted.txt

Lists URIs deleted by the tool. Unless you disabled indexer deletes with the -n option, these URIs were also removed from the index. Because they were deleted as crawler inconsistencies, they may still exist on the Web servers and should be indexed again. Recrawl this list of URIs with the crawleradmin tool by using the --addurifile option (also use the --force option to speed up the recrawl).
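For example, to recrawl the URIs deleted during a doccheck run (the option names are taken from the description above; confirm them in the crawleradmin help output):

crawleradmin --addurifile doccheck_deleted.txt --force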

<mode>_deleted_reasons.txt

This log file is identical to the <mode>_deleted.txt file, but each URI also includes an error code that identifies the reason for the delete (see the filtering example after this list). The error codes are defined as follows:

101 - Item not found in item store

102 - Item found, but unreadable, in item store

103 - Item found, but length does not match meta information

201 - Metadata for item not found

202 - Metadata found, but unreadable

203 - Metadata found, but does not match checksum in duplicate database

204 - Metadata found, but has no checksum

206 - URI's host name not found in routing database
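As a sketch, assuming the error code appears as a separate field on each line (the exact file format may vary), you can filter the log for a single delete reason with the Windows findstr command:

findstr /C:" 101" doccheck_deleted_reasons.txt

Because this is a plain substring match, a URI that happens to contain the same digits could also match; review the output accordingly.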

<mode>_wrongnode.txt

Used for multi-node crawls, this file lists all URIs that were removed from a node because of incorrect routing. These URIs should be crawled by a different master node. The URIs are logged, but not deleted from the index.

<mode>_refeed.txt

Lists URIs whose URI equivalence class was updated by running the tool. To bring the index into sync, use postprocess with the -i option to refeed the contents of this file, or perform a full refeed.
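A minimal sketch of the targeted refeed, assuming the -i option accepts this log file directly (confirm against the postprocess documentation for your installation):

postprocess -i metacheck_refeed.txt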