Troubleshooting the SSP Search Indexer for MOSS 2007

It's time to set aside my SharePoint 2013 farm and step into the way-back machine to troubleshoot the Indexer for a MOSS 2007 Shared Services Provider (SSP)...

(As I mentioned in a previous post on Search Terminology, I try to avoid the term "Indexer" because it has a muddied connotation in SharePoint 2010 & 2013. However, given this post relates to SP2007, consider it one of those exceptions.)

Isolate failures that occur during a crawl (e.g. problems with the content being crawled)

First, there are several generic scenarios whose potential root causes vary greatly, but a few basic strategies are worth noting:

If items do not appear in search results, double check the Crawl Log to verify that the item(s) actually got crawled

If not, address these failures individually

If so, often this relates to an issue with contextual scopes such as This Site and This List

For an easy way to verify, try searching from an Enterprise Search Center site without any scopes. If the item is found here, then the problem relates to contextual scopes

Explaining the cause of this particular issue, and the ways to overcome it, is big enough to warrant its own post, but the short answer is: ensure the [Search] Content Source specifies the URL for the Web Application's Default zone

If you encounter errors crawling individual items (e.g. authentication failures, timeouts, and so on), chances are high that the problem relates to the source being crawled - not search per se

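To see exactly what the crawler requests, run Fiddler on the Indexer server, then in the Search Administration page set the Proxy to 127.0.0.1 with port 8888 (Fiddler's default listening port)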

Start the crawl and watch the traffic from the crawler run through Fiddler
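Before starting the crawl, it helps to confirm that something is actually listening on the proxy address you just configured; otherwise the crawler's requests simply fail and the crawl log fills with errors that have nothing to do with the content. The Python sketch below assumes it runs on the Indexer server itself (where 127.0.0.1 points) and only checks that a TCP connection to 127.0.0.1:8888 succeeds; it does not verify that the listener is Fiddler.

```python
import socket


def proxy_is_listening(host: str = "127.0.0.1", port: int = 8888,
                       timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # Open a connection and close it immediately; we only care that it connects.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable, etc.
        return False


if __name__ == "__main__":
    if proxy_is_listening():
        print("Something is listening on 127.0.0.1:8888; start the crawl and watch Fiddler.")
    else:
        print("Nothing is listening on 127.0.0.1:8888; start Fiddler before crawling.")
```

Remember to clear the proxy setting again once you are done; otherwise the next crawl will fail as soon as Fiddler is closed.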

For cases when the Indexer is thoroughly misbehaving (e.g. hung crawls, service instances not provisioning properly, being unable to load the Search Settings and/or Content Sources, etc.), the following troubleshooting steps may help...

Check the Windows SharePoint Services Administration service

On the Indexer server, make sure the Windows SharePoint Services Administration service is running under an account that has local Admin permissions (by default, this service runs as the Local System account). The SharePoint Timer service invokes this service when provisioning services, sites, etc. Thus, if the Administration service isn’t running, the search service instance (i.e. "the bits") and/or the search-related web services may not get properly provisioned. Once the service is running, manually run any pending administrative tasks using: stsadm -o execadmsvcjobs
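The check above can be scripted as a small sketch: query the service state with `sc query` and, only if it reports RUNNING, call `stsadm -o execadmsvcjobs`. Two assumptions to verify on your own server: that "SPAdmin" is the short service name of Windows SharePoint Services Administration, and that stsadm.exe lives in the default SharePoint 2007 "12" BIN folder.

```python
import ntpath
import os
import subprocess
import sys
from typing import List, Optional

# Assumption: short service name of Windows SharePoint Services Administration
# on a WSS 3.0 / MOSS 2007 server; confirm it in services.msc.
ADMIN_SERVICE = "SPAdmin"

# Assumption: default stsadm.exe location for SharePoint 2007 (the "12 hive").
STSADM = ntpath.join(
    os.environ.get("CommonProgramFiles", r"C:\Program Files\Common Files"),
    "Microsoft Shared", "web server extensions", "12", "BIN", "stsadm.exe",
)


def parse_service_state(sc_output: str) -> Optional[str]:
    """Extract the state name (e.g. RUNNING, STOPPED) from `sc query` output."""
    for line in sc_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("STATE"):
            # The line looks like: "STATE              : 4  RUNNING"
            fields = stripped.split(":", 1)[1].split()
            if len(fields) >= 2:
                return fields[1]
    return None


def execadmsvcjobs_command() -> List[str]:
    """Command line that runs pending administrative timer jobs immediately."""
    return [STSADM, "-o", "execadmsvcjobs"]


def main() -> None:
    if sys.platform != "win32":
        print("Run this on the Indexer server (Windows only).")
        return
    result = subprocess.run(["sc", "query", ADMIN_SERVICE],
                            capture_output=True, text=True)
    state = parse_service_state(result.stdout)
    print(f"{ADMIN_SERVICE} state: {state}")
    if state == "RUNNING":
        # Execute any pending administrative jobs right away.
        subprocess.run(execadmsvcjobs_command(), check=True)
    else:
        print(f"Start the service first, e.g. net start {ADMIN_SERVICE}")


if __name__ == "__main__":
    main()
```

This sketch does not check which account the service runs as; confirm that separately in services.msc.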

Verify connectivity to the Search Admin Web Services from the SSP

Various errors will occur if the SSP cannot access all of the Search Admin web services. You may also see errors with queries if the Web Front Ends cannot access these web services. To determine if there is a problem accessing the Search Admin web service (running on the Indexer server) from the server hosting the SSP Admin site, try the following:

Commonly, the SSP site runs on the Central Admin server, but to verify its location, hover over the link for the SSP in Central Admin

From IIS on the Indexer, go to Sites -> Office Server Web Services

This site will have a SearchAdmin.asmx web service

The SSP will also have a site here (the site is named the same as the SSP), which has a ‘Search’ folder containing another SearchAdmin.asmx web service

This site has an HTTP binding on port 56737 and another for HTTPS on 56738

Therefore, there are a total of four applicable links (i.e. two web services with two bindings each), such as the following:

From the server hosting the SSP Admin site (again, typically the Central Admin server), attempt to browse to all four of these links, using the Indexer server name in the URL, to check connectivity between the servers. In other words:

Connection failures here commonly occur because of proxy, firewall, or other network-related issues; for these, Fiddler is once again your friend
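Browsing to four URLs by hand gets tedious when you repeat the test after each network change, so here is a hedged Python sketch that builds the four links and reports whether each one answers. It assumes the site-level SearchAdmin.asmx sits at the root of the Office Server Web Services site and the SSP's copy sits under /<SSP name>/Search/, as described above; the server and SSP names you pass in are your own. Any HTTP response, including 401 Unauthorized, is treated as proof that the network path works, since these services normally require Windows authentication.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import List

HTTP_PORT = 56737   # HTTP binding of the Office Server Web Services site
HTTPS_PORT = 56738  # HTTPS binding of the same site


def search_admin_urls(indexer: str, ssp_name: str) -> List[str]:
    """The four SearchAdmin.asmx URLs: two web services x two bindings.

    Assumption: the site-level service is at the site root and the SSP's
    copy is under /<SSP name>/Search/.
    """
    paths = ["/SearchAdmin.asmx",
             f"/{urllib.parse.quote(ssp_name)}/Search/SearchAdmin.asmx"]
    urls = []
    for path in paths:
        urls.append(f"http://{indexer}:{HTTP_PORT}{path}")
        urls.append(f"https://{indexer}:{HTTPS_PORT}{path}")
    return urls


def probe(url: str, timeout: float = 10.0) -> str:
    """Describe whether the URL answered; any HTTP status (even 401) proves connectivity."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"OK (HTTP {resp.status})"
    except urllib.error.HTTPError as err:
        # e.g. 401 Unauthorized: the server answered, so the network path works.
        return f"REACHABLE (HTTP {err.code})"
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, refused connection, timeout, TLS/certificate problem, ...
        return f"FAILED ({getattr(err, 'reason', err)})"


def report(indexer: str, ssp_name: str) -> None:
    """Print one line per URL with its probe result."""
    for url in search_admin_urls(indexer, ssp_name):
        print(f"{url} -> {probe(url)}")
```

Run it from the SSP Admin server, e.g. `report("INDEXER01", "SharedServices1")` with your own (here hypothetical) server and SSP names. A FAILED result on only the HTTPS links often points to a certificate issue rather than a firewall.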

Recycle Services

The crawl portion of search is a state machine, and its components sometimes get out of sync. Because the state of the crawl is stored in the Search databases, the search processes can be safely recycled; when restarted, they simply regain their state from the database. Also, as noted above, there are cases where provisioning components may get hung up, so recycling the Timer and Administration services could also help:

From services.msc on each of the Query servers, recycle the following services:

Office SharePoint Server Search

Windows SharePoint Services Timer

From services.msc on the Indexer, then recycle:

Office SharePoint Server Search

Windows SharePoint Services Administration

Windows SharePoint Services Timer
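If you recycle these services often, a short script keeps the order consistent. This Python sketch only builds and runs `net stop` / `net start` pairs; the short service names (OSearch, SPAdmin, SPTimerV3) are assumptions about a typical MOSS 2007 install, so confirm them in services.msc before relying on it. Run it with `query` on each Query server first, then with `indexer` on the Indexer.

```python
import subprocess
import sys
from typing import Dict, List

# Assumed short service names on a MOSS 2007 server (confirm in services.msc):
#   OSearch   = Office SharePoint Server Search
#   SPAdmin   = Windows SharePoint Services Administration
#   SPTimerV3 = Windows SharePoint Services Timer
SERVICES_BY_ROLE: Dict[str, List[str]] = {
    "query": ["OSearch", "SPTimerV3"],
    "indexer": ["OSearch", "SPAdmin", "SPTimerV3"],
}


def recycle_commands(role: str) -> List[List[str]]:
    """Return the net stop / net start pairs, in order, for one server role."""
    commands = []
    for service in SERVICES_BY_ROLE[role]:
        commands.append(["net", "stop", service])
        commands.append(["net", "start", service])
    return commands


def recycle(role: str) -> None:
    for cmd in recycle_commands(role):
        print(" ".join(cmd))
        # net stop fails if the service is already stopped; keep going either way.
        subprocess.run(cmd, check=False)


if __name__ == "__main__" and sys.platform == "win32" and len(sys.argv) == 2:
    recycle(sys.argv[1])  # "query" or "indexer"
```

If a service has dependent services, `net stop` may prompt for confirmation, so run the script from an interactive console rather than a scheduled task.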

Check for Hung Crawls and Propagation Issues

If a crawl appears hung, check the CrawlID, Status, and SubStatus of the crawl in the MSSCrawlHistory table of the Search database, using a query such as the following (replace the date/time with a time before the problematic crawl started). This query effectively returns all crawls that started after this time and have not completed successfully:

From this, use the CrawlID to check the item counts with queries such as the following two (replace the example CrawlID 12345 below). Run them several times, a few minutes apart, and compare the results:

SELECT COUNT(*) FROM MSSCrawlQueue WITH (nolock) WHERE CrawlId = 12345

SELECT COUNT(*) FROM MSSCrawlUrl WITH (nolock) WHERE CrawlId = 12345

If neither count changes over that period, the crawl is likely hung; before going further, verify that the problem does not relate to propagation.
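The "run it a few times and compare" check can be sketched in Python as follows. `fetch` is a hypothetical callable you would implement with your own SQL client to run the two SELECT COUNT(*) queries above for the CrawlId and return both counts; the sampling interval and number of samples are arbitrary defaults, not recommended values.

```python
import time
from typing import Callable, List, Tuple

Counts = Tuple[int, int]  # (rows in MSSCrawlQueue, rows in MSSCrawlUrl) for one CrawlId


def sample_counts(fetch: Callable[[], Counts], samples: int = 4,
                  interval_s: float = 180.0,
                  sleep: Callable[[float], None] = time.sleep) -> List[Counts]:
    """Call fetch() `samples` times, sleeping `interval_s` seconds between calls."""
    results: List[Counts] = []
    for i in range(samples):
        if i:
            sleep(interval_s)
        results.append(fetch())
    return results


def looks_hung(history: List[Counts]) -> bool:
    """True when there are at least two samples and neither count ever changed."""
    return len(history) >= 2 and all(c == history[0] for c in history[1:])
```

The `sleep` parameter exists only so the sampling loop can be exercised without actually waiting; in normal use leave it at `time.sleep`.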

As the Indexer gathers and processes content into the full-text index, the index changes get propagated from the Indexer to every server with a Query role (unless the Indexer server has both the Indexing and the Query roles). If anything interferes with this propagation to one or more Query servers, the crawl can appear hung; in reality, the Indexer is simply waiting for the Query server to respond with "yeah, I got that last bit you sent me".

Check the Search Admin page for any sort of propagation error

Try removing the Query role from the server reporting the propagation error

The Query role can be reapplied later

Another troubleshooting tactic is to give the Indexer server both the Indexing and Query roles, which results in "Propagation not required". In this scenario (even if other servers also hold the Query role), the index does not need to propagate to any server and remains entirely on the Indexer, and queries are still returned to end users
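The propagation rule described above can be expressed as a tiny model, which may help when reasoning about which servers the Indexer is waiting on. This Python sketch encodes only what the text states (propagate to every Query server unless the Indexer itself holds the Query role); the `Server` type and the server names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Server:
    name: str
    indexer: bool = False
    query: bool = False


def propagation_targets(indexer: Server, farm: List[Server]) -> List[str]:
    """Names of the servers the Indexer must propagate index changes to."""
    if indexer.query:
        # Indexer also holds the Query role: "Propagation not required".
        return []
    return [s.name for s in farm if s.query and s.name != indexer.name]
```

Any server returned here is a candidate for the propagation error check on the Search Admin page.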

If this branch is present, verify you also see the following sub-branches:

.\CatalogNames

.\Gather

.\Gatherer Matrix Plugin

.\Gathering Manager

.\ResourceManager

If anything seems off, try recycling the services (as above) and then review the Index/registry paths again to verify whether they are present.
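Checking the sub-branches by eye in regedit is easy to get wrong, so here is a hedged Python sketch. It assumes the branch lives under HKEY_LOCAL_MACHINE and takes its path as a command-line argument, since the script makes no assumption about where that branch is; the comparison itself is case-insensitive, like the registry. `winreg.OpenKey` raises FileNotFoundError if the branch itself is missing.

```python
import sys
from typing import Iterable, List

# The sub-branches listed above.
EXPECTED_SUBKEYS = [
    "CatalogNames", "Gather", "Gatherer Matrix Plugin",
    "Gathering Manager", "ResourceManager",
]


def missing_subkeys(found: Iterable[str],
                    expected: Iterable[str] = EXPECTED_SUBKEYS) -> List[str]:
    """Expected sub-branch names absent from `found` (compared case-insensitively)."""
    present = {name.lower() for name in found}
    return [name for name in expected if name.lower() not in present]


def list_subkeys(branch_path: str) -> List[str]:
    """Enumerate the immediate subkeys of HKEY_LOCAL_MACHINE\\<branch_path> (Windows only)."""
    import winreg  # only available on Windows

    names = []
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, branch_path) as key:
        index = 0
        while True:
            try:
                names.append(winreg.EnumKey(key, index))
            except OSError:
                break  # no more subkeys
            index += 1
    return names


if __name__ == "__main__" and sys.platform == "win32" and len(sys.argv) == 2:
    # argv[1]: path of the branch, relative to HKEY_LOCAL_MACHINE
    missing = missing_subkeys(list_subkeys(sys.argv[1]))
    print("Missing: " + ", ".join(missing) if missing else "All expected sub-branches are present")
```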

Invasive Option 1: Build/migrate to a new SSP

At this point, it may make sense to create a new SSP and begin crawling content while continuing to work on the problematic SSP. That way, you will be that much further ahead if troubleshooting yields no result and you end up needing a new SSP anyway.

While the new SSP crawls, the broken SSP may still be used to service queries (albeit with stale results)

This, of course, assumes that the query component(s) are still responding to queries from the WFE(s); your mileage may vary

Once the crawl completes, you can then move all the Web Applications to this new SSP

Invasive Option 2: Reset the Index

Invasive Option 3: Re-provision Search and/or migrate the Indexer role to another server

If creating a new SSP is not a good option (e.g. you have too many managed properties, scopes, etc. to recreate), you can attempt to re-provision the search services with the following steps:

From Central Admin -> Operations -> Services on Server:

For the servers with only a Query role (e.g. WFE1 and WFE2):

Select the WFE1 server

Stop and then start the Office SharePoint Server Search service

Repeat the steps on server WFE2

For the Indexer server, you can attempt the same (i.e. stop and then start the Search service), but SharePoint may not allow it, given that this server is the Indexer for the SSP

Note: Stopping the Search Service on the Indexer will remove the index

If the search service cannot be stopped on the Indexer, another tactic is to [temporarily] start the Indexer role on another server and then point the SSP to this new Indexer server

Once the SSP is using the new Indexer server, attempt a more thorough re-provisioning of search on the original Indexer server as follows:

Remote into the original Indexer server

Stop the Search Service using the command: stsadm -o osearch -action stop

Make sure the Office SharePoint Server Search service is stopped both in the Services console (services.msc) on the Indexer and in the Central Admin GUI

If the Search service can then be successfully started, you can update the SSP to use the original Indexer server once again

Check the RegistryBlob for Corruption

If the problem persists or things seem incompletely provisioned, the SSP itself may be corrupted. On several occasions, I found the cause to be a corrupted RegistryBlob stored in the SSP database. The RegistryBlob is propagated locally to the Indexer: look for a file named RegistryBlob.reg in the Index path (check the SSP configuration settings to confirm this path for the Indexer). If this file doesn't exist or contains corrupted characters, chances are good that the value in the database is also corrupted. If you find a corrupted RegistryBlob, it's worth opening a support case, but given the nature of corruption, there is no guarantee that it can be resolved.
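A first pass over RegistryBlob.reg can be automated with a heuristic like the Python sketch below. It flags a missing or empty file, a file that cannot be decoded, and control or replacement characters. The encoding handling is an assumption: it decodes as UTF-16 when a byte-order mark is present and as UTF-8 otherwise, so an ANSI-encoded file containing non-ASCII characters could be flagged falsely. An empty result only means nothing obvious was found, not that the blob is healthy.

```python
from pathlib import Path
from typing import List


def registry_blob_issues(path: Path) -> List[str]:
    """Heuristic checks for a missing or garbled RegistryBlob.reg file."""
    if not path.is_file():
        return [f"{path} does not exist"]
    raw = path.read_bytes()
    if not raw:
        return [f"{path} is empty"]
    # Assumption: UTF-16 when a byte-order mark is present, UTF-8 otherwise.
    encoding = "utf-16" if raw[:2] in (b"\xff\xfe", b"\xfe\xff") else "utf-8-sig"
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError as err:
        return [f"cannot decode as {encoding}: {err}"]
    # Control characters (other than line breaks and tabs) or U+FFFD suggest corruption.
    bad = [ch for ch in text
           if (ord(ch) < 32 and ch not in "\r\n\t") or ch == "\ufffd"]
    if bad:
        return [f"{len(bad)} control/replacement characters found"]
    return []
```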

If the problem still persists after all of this, unfortunately, creating a new SSP is then going to be the only real option…