This article provides recommendations for sizing and performance optimization of the managed metadata service in Microsoft SharePoint Server 2010, together with best practices on how to configure the service and structure the service application databases for maximum performance.

The information in this article can help you understand the tested performance and capacity limits of the managed metadata service. Use this information to determine whether your planned deployment falls within acceptable performance and capacity limits.

The tests were first run against the baseline dataset, which simulates a typical customer dataset. Then, a single variable was changed and the same tests were run again to determine the effect that changing that variable had on performance. In most cases, variables were tested independently. However, in some cases, certain important variables were tested in combination.

The specific capacity and performance test results presented in this article might differ from the test results in real-world environments, and are intended to provide a starting point for the design of an appropriately scaled environment. After you have completed your initial system design, test the configuration to determine whether the system will support how you have configured the managed metadata service in your environment.

These tests were performed on the baseline term store first by using a term label length of 5 characters, and again by using a term label length of 250 characters. In this test, write operations represent a much greater percentage of the total than the mix of read and write operations that we used for most other tests.

Test                                            Percentage of mix
----------------------------------------------  -----------------
Create a term                                   5%
Get suggestions                                 70%
Get matches                                     10%
Get child terms in a term set by using paging   5%
Validate a term                                 10%

The requests-per-second (RPS) results for different term label lengths are shown in the following graph. This data suggests that term label length has an insignificant effect on average RPS for both loads.

CPU and memory usage are shown in the following graphs.

As shown by the results, the effect of term label length on CPU and memory usage for the Web server and application server is insignificant. However, the load on the database server increases as the term label length increases.

These tests were performed on the baseline term store, and then the term store was scaled up to 1 million terms by increasing the number of managed terms and keywords proportionally.

When the keywords term set was removed from the term store for testing, performance did not differ significantly among term stores that contained 100,000 terms, 500,000 terms, and 1 million terms, as shown in the following two graphs.

When the system is under the specified test load, the time that is required to create a keyword increases significantly as the number of keywords increases from 16,000 to 800,000. This trend can be seen in the next graph.

The number of terms in a term store does not significantly affect system performance when very few users create keywords or when the number of keywords is small.

The keywords term set is stored in a flat list, unlike other term sets that have a more complex structure. The larger the flat list grows, the longer it takes to check whether there is already a keyword that has the same name. Therefore, it takes longer to create a keyword in a large keywords term set.

The term store administrator should limit the size of the keywords term set to prevent latency when users create keyword terms. One approach is to frequently move keywords into a regular term set, which can improve performance and contribute to better organization of term data.

Any term set that contains more than 150,000 terms in a flat list is subject to latency and performance issues. One alternative is to use a managed term set, which typically organizes terms in a hierarchical structure. For more information about term sets, see Managed metadata overview (SharePoint Server 2010).

As the total number of terms in the term store approaches 500,000, users might experience various exceptions when they attempt to access the term store. By checking the related Unified Logging Service (ULS) log, the farm administrator can find the exception and determine whether it applies to the client or the server.

When TimeoutException errors occur, you can modify the time-out value in the client.config file or in the web.config file for the managed metadata service. The client.config file can be found in the %PROGRAMFILES%\Microsoft Office Servers\14.0\WebClients\Metadata folder. The web.config file can be found in the %PROGRAMFILES%\Microsoft Office Servers\14.0\WebServices\Metadata folder. There are four time-out values:

receiveTimeout: Specifies the interval of time provided for a receive operation to complete.

sendTimeout: Specifies the interval of time provided for a send operation to complete.

openTimeout: Specifies the interval of time provided for an open operation to complete.

closeTimeout: Specifies the interval of time provided for a close operation to complete.

These time-out values are defined in the customBinding section. You can increase the time-out value based on the specific operation that is timing out. For example, if the time-out occurs when messages are received, you only need to increase the value of receiveTimeout.

Note

Separate time-out values are defined for the HTTP and HTTPS bindings. Modify the time-out values for the binding that your deployment uses.

For more information about time-out values, see <customBinding> (http://go.microsoft.com/fwlink/p/?LinkId=214213).
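For illustration, a customBinding entry with increased time-out values might look like the following sketch. The binding name shown here is a placeholder; edit the binding that already exists in your client.config or web.config rather than adding a new one.

```xml
<bindings>
  <customBinding>
    <!-- Placeholder name: use the binding name already defined in your
         client.config or web.config for the managed metadata service. -->
    <binding name="MetadataWebServiceHttpBinding"
             closeTimeout="00:01:00"
             openTimeout="00:01:00"
             receiveTimeout="00:10:00"
             sendTimeout="00:06:00">
      <!-- The existing child elements of the binding (encoding,
           security, transport) remain unchanged. -->
    </binding>
  </customBinding>
</bindings>
```

Time-out values use the hours:minutes:seconds format; increase only the attribute that corresponds to the operation that is timing out.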

When ThreadAbortException errors occur, you can increase the execution time-out value in the web.config file for the specific Web application. The web.config file is located in the %inetpub%\wwwroot\wss\VirtualDirectories\<Application Port Number> folder. For example, if the request is for TaxonomyInternalService on a Web application, first identify the web.config file for the Web application, and then add the following code into the configuration node.
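An element like the following can be added inside the configuration node. This is a sketch: the path casing and the time-out value (in seconds) are examples to verify and adjust for your environment.

```xml
<location path="_vti_bin/TaxonomyInternalService.json">
  <system.web>
    <!-- executionTimeout is specified in seconds; 3600 = 1 hour. -->
    <httpRuntime executionTimeout="3600" />
  </system.web>
</location>
```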

This test was performed on a baseline term store that had 100,000 terms. During the test, the number of labels was incremented for each term, as shown in the following graph.

The average RPS decreases only slightly as the number of labels increases. CPU and memory usage on the Web server, application server, and database server increase only slightly, as shown in the following graphs.

This section reviews the test results for three different characteristics of the term store data: term label length, the number of terms per term store, and the number of term labels per term store. Trends revealed by this test include the following:

Increasing the term label length to 250 does not have a significant effect on term store performance.

Increasing the average number of labels per term to four does not have a significant effect on term store performance.

Increasing the number of terms to 1 million does not have a significant effect on term store performance.

When the term store contains more than 150,000 terms in a term set that uses a flat list, it can take a long time to add new terms to the term store.

These tests were performed by using the baseline read/write operation test mix with the "Create taxonomy item" test as the item that varied. The following table shows the specific operations that were used in the baseline test mix and their associated percentages.

Test                                  Percentage of load
------------------------------------  ------------------
Get suggestions                       73%
Create taxonomy item                  0%
Get matches                           11%
Get paged child terms in a term set   5%
Validate a term                       11%

For each successive test, the number of terms created was increased. The following table shows, for each of the three tests, the average number of terms created per minute and the resulting average RPS.

Average terms created/minute   Average RPS
----------------------------   -----------
0                              182
8.4                            157
20                             139

As shown in the following graph, RPS decreases as the average number of terms created per minute increases.

It is expected that term store performance will decrease as the percentage of write operations increases, because write operations hold more exclusive locks on the data, which delays the execution of read operations. Based on the test data, RPS does not decrease significantly until the average number of terms created reaches 20 per minute. However, an average term creation rate of 20 per minute is fairly high and does not ordinarily occur, especially in a mature term set. Making a term set read-only can improve performance by eliminating write operations.

The term store cache exists on all Web servers in a farm. It can contain term set groups, term sets, and terms. These tests were performed to show how the memory footprint of the cache object changes as the number of terms increases. There are other factors that affect the cache size — for example, term descriptions, the number of labels, and custom properties. To simplify the test, every term in the baseline term store has no description or custom properties, and has only one label with 250 characters.

The following graph shows how the memory footprint changes as the number of terms in the cache increases.

Memory usage on the Web server increases linearly as the number of the terms in the cache increases. This makes it possible to estimate the cache size if the number of terms is known. Based on the test data, memory usage should not be a performance issue for most systems.

This test shows the difference in performance between one and two managed metadata service applications that have their databases hosted on the same database server.

As shown in the following graph, under the same load, RPS decreases when an additional service application is added, and it is expected to decrease further as more service applications are added.

Latency for most operations is not significantly affected when additional service applications are added. However, unlike other operations, the "get suggestions" operation interacts with all available service applications. Therefore, latency for this operation increases as the number of service applications increases, as shown in the following graph. It is expected that this trend will continue as the number of service applications increases.

As shown in the following graphs, database server CPU usage increases significantly when there are two service applications that have databases residing on the same server, but memory usage is not significantly increased.

If you must maintain more than one managed metadata service application, make sure that latency for keyword suggestion operations is at an acceptable level. Note that network latency also contributes to total effective latency. We recommend that managed metadata service applications be consolidated as much as possible.

If a single SQL Server computer is used to host all service applications, the server must have enough CPU and memory resources to support acceptable performance targets.

This section shows the performance characteristics of two timer jobs in the managed metadata service: the Content Type Subscriber timer job and the Taxonomy Update Scheduler timer job. Both timer jobs enumerate the site collections in a given Web application, and can potentially run for a long time and consume significant system resources in a large farm.

The Content Type Subscriber timer job distributes the published content types to all appropriate site collections of a Web application. The overall time that this timer job takes to run depends on many factors, such as the number of content types that need to be distributed, the number and type of fields in the content type, and the number of site collections. This test shows how the following scaling factors affect the overall time to distribute a content type:

The number of site collections in a Web application

The number of content types

The first test was done by publishing 10 content types and distributing them to varying numbers of site collections. As shown in the following graph, the relationship between the time to distribute content types and the number of site collections is almost linear.

In this test, one content type was published to 1,000 site collections, and then ten content types were published to 1,000 site collections. The distribution time for ten content types is approximately 10 times the distribution time for one content type, again showing an almost linear increase.

Test results show that the average time for a single content type to be distributed to a single site collection is almost a constant. Therefore, it is safe to run this timer job on a large collection of site collections. You can use the average distribution time to estimate how long the timer job will take to execute, given the number of site collections and the number of content types to distribute. If those numbers are extremely large, you might find it takes hours or even days to run the timer job. Nevertheless, you can pause and resume this timer job, and content type publishing is not a frequent activity.
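The estimate described above can be sketched as a simple calculation. The per-pair distribution time is an assumed input measured in your own environment, not a number from this article.

```python
def estimate_distribution_time(site_collections: int,
                               content_types: int,
                               seconds_per_pair: float) -> float:
    """Estimate the total Content Type Subscriber timer job run time.

    Distribution time was observed to be roughly linear in both the
    number of site collections and the number of content types, so the
    total is approximately (site collections x content types x average
    time to distribute one content type to one site collection).
    """
    return site_collections * content_types * seconds_per_pair

# Example: 1,000 site collections, 10 content types, and an assumed
# average of 0.5 seconds per content type per site collection.
total_seconds = estimate_distribution_time(1000, 10, 0.5)
print(total_seconds / 3600)  # result in hours
```

Because the timer job can be paused and resumed, a long estimated run time is a scheduling consideration rather than a blocker.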

Note that the time that is required to execute this timer job can increase significantly if a content type pushdown occurs during the timer job, especially if many lists are involved. For more information about content type pushdown, see the Managed Metadata Connections section in the Managed metadata service application overview (SharePoint Server 2010).

Tip

When you try to publish a very large content type, you might see the following error: "WebException: The request was aborted."
The cause is that the size of the content type exceeds the 4 MB default maximum HTTP request size for the service application. To prevent this error, you can increase the maxRequestLength value in the web.config file for the service application.
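A sketch of the kind of web.config change described above, assuming the standard ASP.NET httpRuntime element; the value shown is an example.

```xml
<system.web>
  <!-- maxRequestLength is specified in kilobytes; the ASP.NET default
       is 4096 KB (4 MB). 20480 = 20 MB. -->
  <httpRuntime maxRequestLength="20480" />
</system.web>
```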

The Taxonomy Update Scheduler timer job keeps the hidden taxonomy list on every site collection of a Web application in sync with the term store. The overall time that this timer job takes to run depends on the number of items that need to be updated and the number of site collections that contain updated items. This test shows how the size of the hidden list and the number of site collections in the Web application affect the average time to update a single item in a site collection.

The following graph shows the relationship between the number of site collections and the average time to update one term in one site collection.

As shown in the following graph, the average time to update one term in one site collection increases slightly as the size of the hidden list increases.

An increase in the number of site collections does not have a significant effect on the average time to update a term in a site collection. Therefore, it is safe to run this timer job on a Web application that has a large number of site collections. You can estimate the overall execution time of the timer job by multiplying the average time to update a term in a site collection by the number of site collections and the average number of updated terms in each site collection. You can also pause and resume this timer job.

The size of the taxonomy hidden list increases over time as more and more terms are used by the site collection. The timer job might take longer to execute as the hidden list grows in size.