Update: For those interested in watching a presentation of this content, you can download (right-click and select “Save target as...”) and watch this video here (200+ MB), which was recorded during a webcast on 2011-07-27. My presentation starts at 6min20sec.

What I will try to do in this post is convert most of that content into additional diagrams that should help you “see” how these fault-tolerance and/or performance changes affect your search architecture.

SharePoint Search – Query Component (Fault Tolerance)

In this diagram you see what your architecture would look like after you add a new mirror Query Component for an existing Index Partition, which you do in order to provide fault tolerance for the lookup of matched items when full-text search queries run against your index. The reasons for doing that are pretty simple (and detailed in here): if one server goes down, the other can keep serving queries, and unless you configure the mirror server as “failover only” it will also share the load of incoming queries.
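To make the “failover only” distinction concrete, here is a minimal Python sketch (plain Python, not SharePoint’s actual API; all names are invented) of how query routing across a mirrored pair behaves:

```python
import random

def pick_query_component(components, failover_only=False):
    """Choose which Query Component of a mirrored pair serves a query.

    components: list of dicts like {"name": "Query1", "healthy": True},
    with the primary listed first. With failover_only=True the mirror
    serves queries only when the primary is down; otherwise the load is
    spread across all healthy components.
    """
    healthy = [c for c in components if c["healthy"]]
    if not healthy:
        raise RuntimeError("no Query Component available to serve queries")
    if failover_only:
        return healthy[0]          # primary first; mirror only on failover
    return random.choice(healthy)  # distribute incoming query load
```

Either way, losing one server still leaves the other able to answer queries; the flag only controls whether the mirror also carries load during normal operation.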

SharePoint Search – Query Component (Performance)

In this diagram there is just a very subtle change from the previous one (marked in red), but it makes a big difference in your architecture: the additional Query Component is associated with a different Index Partition. What this means is that your content is now divided between the two Index Partitions, so if, for example, you have a total of 6 million indexed items, each Index Partition holds roughly 3 million of them. It also means that your Query Processor will send requests in parallel to both Query Components and, since each of them has to search only half of the index (3 million items instead of 6 million), they will be able to do it faster.
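Conceptually, partitioning looks like this hypothetical Python sketch (invented names, not the real index format): each item lands in exactly one partition, and a query fans out to all partitions in parallel before the hits are merged:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def partition_for(item_id, num_partitions):
    # Each indexed item lives in exactly one Index Partition.
    # crc32 is used here only as a stable, deterministic hash.
    return zlib.crc32(item_id.encode()) % num_partitions

def build_partitions(item_ids, num_partitions=2):
    partitions = [[] for _ in range(num_partitions)]
    for item_id in item_ids:
        partitions[partition_for(item_id, num_partitions)].append(item_id)
    return partitions

def search(partitions, predicate):
    # The Query Processor fans the query out to every partition in
    # parallel; each one scans only its own, smaller share of the index.
    with ThreadPoolExecutor() as pool:
        hits_per_partition = pool.map(
            lambda partition: [i for i in partition if predicate(i)],
            partitions)
    return [hit for hits in hits_per_partition for hit in hits]
```

Because the partitions are disjoint, doubling the partition count roughly halves the amount of index each component scans per query.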

SharePoint Search – Property db (Performance)

Here things start to get interesting, with not only a new Query Component/Index Partition but also a new Property db (added items marked in red). If you read this post (mentioned a dozen times by now) you understand that in order to provide search results, the Query Processor needs to perform a lookup not only in the Index Partition but also in the Property db, in order to retrieve the metadata associated with the results found. When your indexed content grows, for example to 20M items that you then split across 2 Index Partitions to improve index lookup time, your Property db may become the bottleneck. A way to minimize the impact of the growing number of indexed items is to add a new Property db and assign a new Query Component/Index Partition to it. This way, each Index Partition/Property db combination has to store and handle search requests for only half of the total number of indexed items.
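The two-step lookup described above can be sketched in Python (a conceptual illustration with invented names, not the real query pipeline): each Index Partition is paired with a Property db that stores metadata for only that partition’s items:

```python
def execute_query(index_partitions, property_dbs, db_for_partition, query):
    """Two-step lookup: match items in each Index Partition, then fetch
    their metadata from the Property db paired with that partition.

    index_partitions: list of {doc_id: text} dicts (one per partition).
    property_dbs:     list of {doc_id: metadata} dicts.
    db_for_partition: maps partition number -> Property db number.
    """
    results = []
    for p, partition in enumerate(index_partitions):
        matches = [doc_id for doc_id, text in partition.items() if query in text]
        prop_db = property_dbs[db_for_partition[p]]  # partition's own Property db
        results.extend({"id": doc_id, **prop_db[doc_id]} for doc_id in matches)
    return results
```

The point of the pairing is that each Property db only ever answers metadata lookups for its own partition’s share of the items, so neither database has to hold or serve the full corpus.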

It is also important to notice that all search-related databases (Property db, Search Admin db and Crawl db) can be configured for fault tolerance through the use of database mirroring.

SharePoint Search – Query Processor (Fault Tolerance and Performance)

Even after you have scaled out your Query Components, Index Partitions, and Property dbs, another component that may require your attention is the Query Processor. This is the component that does the hard work of accessing the Query Component (to find items that match the query), the Property db (to get the metadata associated with those items) and the Search Admin db (to get security descriptors in order to apply security trimming to the results). By adding a new Query Processor (marked in red and described in here), you divide the load of this task across multiple servers, increasing your query performance and providing fault tolerance (if one goes down, the other can still handle queries).
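Those three lookups can be sketched like this (hypothetical Python with invented names; real security descriptors are Windows ACLs, not simple sets):

```python
def process_query(user, query, query_components, property_db, security_db):
    """Sketch of the Query Processor's job: (1) ask the Query Components
    for matching items, (2) fetch metadata from the Property db,
    (3) security-trim results using the stored descriptors."""
    # Step 1: each Query Component returns the doc ids matching the query.
    matched = [doc_id for qc in query_components for doc_id in qc(query)]
    results = []
    for doc_id in matched:
        # Step 3: drop items the user is not allowed to see.
        if user in security_db.get(doc_id, set()):
            # Step 2: attach the stored metadata to the surviving hits.
            results.append({"id": doc_id, **property_db[doc_id]})
    return results
```

All of this per-query work runs on the Query Processor, which is why adding a second one spreads both the CPU load and the failure risk.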

SharePoint Search – Crawl Component (Fault Tolerance and Performance)

Now let’s take a look at the other side of search: Crawling/Processing/Indexing. Notice the new Crawl Component added in the diagram above. What does this mean? It means that both Crawl Components will split the load of crawling the defined content sources, and both will keep pulling from and updating the crawl queue stored in the Crawl db. For example, if a full crawl with one Crawl Component and one Crawl db was taking 4 days, by adding another Crawl Component (and assuming you have sufficient CPU/Memory/IO/bandwidth/etc. resources) the same full crawl should be reduced to around 2 days. With two Crawl Components working from the same Crawl db, you also get fault tolerance in case one of them goes down.
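The shared-queue behavior can be illustrated with a small Python sketch (invented names; the real Crawl db is a SQL database, not an in-memory queue): both Crawl Components drain one common queue, so the work splits between them and either one can finish the crawl alone if the other dies:

```python
import queue
import threading

def run_crawl(urls, num_crawl_components=2):
    """Multiple Crawl Components pulling work items from one shared
    crawl queue (standing in for the Crawl db), splitting the load."""
    crawl_queue = queue.Queue()
    for url in urls:
        crawl_queue.put(url)
    crawled = {i: [] for i in range(num_crawl_components)}

    def crawl_component(worker_id):
        while True:
            try:
                url = crawl_queue.get_nowait()
            except queue.Empty:
                return                      # queue drained: crawl finished
            crawled[worker_id].append(url)  # "fetch" and record the item

    threads = [threading.Thread(target=crawl_component, args=(i,))
               for i in range(num_crawl_components)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return crawled
```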

SharePoint Search – Crawl Component and Crawl db (Performance)

What happens when you start to add many Crawl Components to the same Crawl db? Well, the db can easily become your bottleneck. One way to keep scaling out and increasing your crawling performance is through the use of an additional set of Crawl Component/Crawl db, as shown in the diagram above. In this way, distinct content sources (web applications, web sites, file shares, etc.) will be split among these two Crawl dbs, and their respective Crawl Components will have to handle (crawl/process/index) only part of the content, making it easier to deal with.
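The idea of splitting content across Crawl db sets can be sketched like so (a conceptual Python illustration, not SharePoint’s actual distribution scheme; `manual_mapping` stands in for the manually defined host rules mentioned below):

```python
import zlib

def crawl_db_for_host(host, num_crawl_dbs, manual_mapping=None):
    """Pick which Crawl db owns a given host: honour an explicit
    host-distribution rule first, otherwise spread hosts evenly
    (crc32 is used only as a stable, deterministic hash)."""
    if manual_mapping and host in manual_mapping:
        return manual_mapping[host]
    return zlib.crc32(host.encode()) % num_crawl_dbs
```

Each Crawl db then drives only its own Crawl Component(s), so each pair crawls, processes and indexes just its slice of the content sources.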

There are a lot of things that go into this, from how content to be crawled is split among multiple Crawl dbs to how you can manually define this mapping yourself (if you want to). All of this and more are detailed in this post here.

Now that we are moving to content processing, you may be asking “what about the crawling part of FAST Search?”. Well, the good news is that if you are using the FAST Content SSA to crawl your content, your crawling architecture looks pretty much like what we just saw for SharePoint Search above. The main difference is that the FAST Content SSA is tasked only with crawling, since processing and indexing are done in the FAST Search farm. Speaking of content processing, the first component that can be scaled out is the Content Distributor (shown above in red). Scaling it out gives you only fault tolerance, since the FAST Content SSA connects and sends batches to a single Content Distributor at a time, switching to the other one only if it fails to submit batches to the “primary” Content Distributor (you must also make sure the FAST Content SSA is configured with both Content Distributors listed).
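The “one Content Distributor at a time, switch only on failure” behavior can be sketched in Python (invented names, conceptual only):

```python
def submit_batch(batch, content_distributors, send):
    """Send a batch to one Content Distributor at a time, moving to
    the next configured one only when submission fails.

    content_distributors: ordered list, "primary" first.
    send: callable (distributor, batch) -> result, raising
          ConnectionError on failure.
    """
    last_error = None
    for distributor in content_distributors:
        try:
            return send(distributor, batch)
        except ConnectionError as err:
            last_error = err  # this one is unreachable: try the next
    raise RuntimeError("all Content Distributors unreachable") from last_error
```

Note the contrast with the query side below: this is pure failover, with no load balancing between the two Content Distributors.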

In regards to Document Processors, you will definitely have more than one (you get 4 of them by default in a simple installation), which gives you both fault tolerance (in case one of them goes down) and performance (since they work in parallel). Also, if the “primary” Content Distributor goes down, the Document Processors are smart enough to switch to the other available Content Distributor.

Indexer and Search (Fault Tolerance)

Above is the diagram of a somewhat common deployment of FAST Search for SharePoint, where you have two servers and each one is configured with a combination of Indexer and Search in a way that one server is the primary Indexer and backup Search, and the other server is backup Indexer and primary Search. In this way, with just your two servers you are providing fault tolerance for both Indexer and Search.

Query Processing (Fault Tolerance)

In the diagram above, a Query Processing server (with the QRServer, QRProxy and FSA Worker components) was added to the FAST Search farm and also properly configured in the FAST Query SSA by listing both servers in its setup. With this configuration, queries are sent to both servers in a round-robin fashion, and if one of the servers fails the FAST Query SSA will keep sending queries only to the surviving server.
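The round-robin-with-failover behavior can be sketched like this (hypothetical Python, invented names):

```python
import itertools

class QueryRouter:
    """Round-robin queries across the configured Query Processing
    servers, skipping any server currently marked as failed."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.failed = set()
        self._cycle = itertools.cycle(self.servers)

    def next_server(self):
        # Walk the ring at most once; a failed server is skipped, so
        # with one server down all queries flow to the other one.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server not in self.failed:
                return server
        raise RuntimeError("no Query Processing server available")
```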

Conclusion

There is a lot you can configure in both SharePoint Search and FAST Search for SharePoint to increase performance and/or provide fault tolerance for components of your search farm. The important thing is to understand what options are available for each platform and keep them in mind when you first design your search architecture as well as after your search project is in production, in case you need to scale out your deployment.

I haven’t explored much in regards to fault tolerance for the SP DBs, since they support both SQL Server mirroring and clustering. I would expect the FAST Admin DB to follow the same rule. That would be my guess, as I haven’t tested it myself.

Hi Leo!!! Configured my first Deployment.xml file with a three server FAST farm. Your various examples here make a lot of sense, but could you provide sample deployment.xml files? It is confusing with primary/secondary rows/columns. Thanks!!!

It’s been a while since I last checked this, but if I recall correctly, the search attribute should be set to “true”. Then queries will be automatically balanced between the two search servers. I don’t believe you can have a “passive” search node that only starts to get queries when the other one goes down. At least that’s what I remember.