
The most common use case scenarios are around displaying and modifying SharePoint lists - unfortunately, that is also where we see many performance issues because the SharePoint Object Model is not always used in ways conducive to optimal performance.

Use Case 1: How many items are stored in a SharePoint list?

There are multiple ways to answer this question. One example that I’ve seen many times is the following:
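A minimal sketch of that pattern (reconstructed from the description that follows - the code counts items through the Items collection):

```csharp
// Counts the list items by accessing the Count property of the
// Items collection - this materializes ALL items from the content database.
int noOfItems = SPContext.Current.List.Items.Count;
```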

This code gives us the number of items in our list, but to do so it has to retrieve ALL items of the list from the content database. The following screenshot shows what is executed within the Object Model when accessing the Count property as in the code above:

For small lists this might not be a problem as the query is rather fast. But it will become a problem when lists grow over time or when the custom code has never been tested with live data.

For this scenario Microsoft offers a different property on the SPList object itself called ItemCount. The correct code for this would be:

int noOfItems = SPContext.Current.List.ItemCount;

In this case, SharePoint only needs to query a single record from the Lists table in the content database. The number of items in the list is stored there redundantly in order to get this information without the need to query the entire AllUserData table where all SharePoint list items are stored.

Use Case 2: Displaying items in a list using SPList

There are multiple ways to iterate through the items of a SharePoint list by using the SharePoint Object Model. One approach - which I’ve seen in a live SharePoint Application - may work fine on the developer’s machine or on very tiny lists. But it is going to ruin your performance once executed on a list that exceeds a couple of hundred items. Let’s start by looking at a code snippet that can be used in a WebPart to access the first 100 items from the SharePoint list of the current context:

SPList activeList = SPContext.Current.List;
for (int i = 0; i < 100 && i < activeList.Items.Count; i++) {
    SPListItem listItem = activeList.Items[i];
    htmlWriter.Write(listItem["Title"]);
}

Assuming that there are at least 100 items in the list - how many roundtrips to the database is this code going to make in order to retrieve the Title values of the first 100 SharePoint list items? You might be surprised: it's a total of 200 database calls, as you can see from the database view when analyzing the transaction executing the above code:

The reason is that in every loop iteration we request a new SPListItemCollection object when accessing the Items property. The Items property is not cached and therefore requests all items from the database over and over again. Here is how the first loop iterations look:

The CORRECT way to do it

The correct way is of course to store the return value of the Items property in an SPListItemCollection variable. With this the database is only queried once, and we then iterate over the result set that is stored within the collection object. Here is the changed sample code:

SPListItemCollection items = SPContext.Current.List.Items;
for (int i = 0; i < 100 && i < items.Count; i++) {
    SPListItem listItem = items[i];
    htmlWriter.Write(listItem["Title"]);
}

You can also use a foreach loop, which compiles into similar code leveraging the IEnumerable interface of the Items collection. Here is how the new loop is executed internally:

Use Case 3: Use SPQuery and SPView to only request data that you really need

One of the main performance problems that we can witness in any type of application that deals with data from a database is that too much data is requested. Requesting more information than the current use case actually needs results in additional:

query overhead in the database to gather the requested information

communication overhead between the database and the application

memory overhead on both the database and the application

Looking back at the previous two use cases you can see that the executed SQL statement always selected ALL items from the requested SharePoint list. You can tell that by looking at the SELECT clause, which says: SELECT TOP 2147483648 …

Limiting the number of returned rows

If you only want a limited result set when accessing items in a SharePoint list, you can make use of the SPQuery.RowLimit property.

Here is an example:

SPQuery query = new SPQuery();
query.RowLimit = 100;
SPListItemCollection items = SPContext.Current.List.GetItems(query);
for (int itemIx = 0; itemIx < items.Count; itemIx++) {
    SPListItem listItem = items[itemIx];
}

Using the SPQuery object with SPList.GetItems will result in the following SELECT clause:

In the previous example we already limited the number of items that we want to retrieve. However, we still request ALL columns that are defined in the SharePoint list. This might be fine if we really need all columns to display to the end user, or if we need all of them to perform some calculations. In most cases, however, we only need a few.

Limiting the retrieved columns

There are two ways of limiting the columns to retrieve from the database:

Use a SharePoint View: SPView

Use the SPQuery.ViewFields property

The sample code from above can therefore be changed in the following two ways:
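Sketched here under the assumption of a view named "MyView" and the Title/Editor fields (view name and field names are illustrative), the two variants look roughly like this:

```csharp
// Variant 1: let an existing SharePoint view (SPView) define the columns.
// "MyView" is an illustrative view name.
SPView view = SPContext.Current.List.Views["MyView"];
SPListItemCollection viewItems = SPContext.Current.List.GetItems(view);

// Variant 2: reference only the needed fields via SPQuery.ViewFields.
SPQuery query = new SPQuery();
query.RowLimit = 100;
query.ViewFields = "<FieldRef Name='Title' /><FieldRef Name='Editor' />";
SPListItemCollection queryItems = SPContext.Current.List.GetItems(query);
```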

In both scenarios the SELECT clause will only contain those fields that are defined in the SharePoint view, or those referenced in the ViewFields property, respectively. The following illustration shows the difference in the SELECT clause:

Use Case 4: Paging through SharePoint list items with SPQuery

SharePoint lists can contain thousands of items. We have all heard about the 2000-item limit that should not be exceeded in order to achieve good list performance. There are definitely performance impacts when exceeding this limit, and there are ways to overcome it by using indexed columns and views.

Besides these considerations it's also important to be smart when accessing data in the list. As already explained in the previous use cases, only accessing the data that you need can take a lot of pressure off the SharePoint content database. The SharePoint Object Model also provides further features to enhance access to list items.

Paging data is a technique that we all know well from rich client applications or from web applications using, for example, data grids. Paging allows easy navigation for the end user and - if implemented correctly - reduces the load on the underlying database.

The SPQuery object provides the property ListItemCollectionPosition that allows you to specify the start position of your query page. RowLimit allows you to specify how many items to retrieve per page. Let’s have a look at some sample code:

SPQuery query = new SPQuery();
query.RowLimit = 10; // that is our page size

do
{
    SPListItemCollection items = SPContext.Current.List.GetItems(query);

    // do something with the page result

    // set the position cursor for the next iteration
    query.ListItemCollectionPosition = items.ListItemCollectionPosition;
} while (query.ListItemCollectionPosition != null);

SPList.GetItems takes the query and returns only 10 items each time GetItems is called. The SPListItemCollection offers the property ListItemCollectionPosition, which acts like a cursor on the SharePoint list. This property can be used in any further page iteration to define the starting point of the next page. The following illustration shows the database activities:

A closer look at one of the SQL Statements shows us that the combination of the SELECT TOP and WHERE clause is used to retrieve the items of a certain page:

Use Case 5: Updating a large number of SharePoint list items

The previous use cases focused on read access to items stored in SharePoint lists. Now it's time to discuss how best to update or add items. As the SharePoint Object Model provides a wide variety of interfaces, we can again choose between multiple approaches.

The first obvious way of adding or updating items in a SharePoint list is SPListItem.Update. You can obtain the list item by either querying an existing item or by adding a new one via SPListItemCollection.Add.

Let’s have a look at the following sample:

SPListItemCollection items = SPContext.Current.List.Items;
for (int itemIx = 0; itemIx < 100; itemIx++) {
    SPListItem newItem = items.Add();
    // fill all the individual fields with values
    newItem.Update();
}

Analyzing this code shows us that EVERY call to the Update method actually calls the internal method SPListItem.AddOrUpdateItem which in fact calls a stored procedure to perform the task:

We can see that adding 100 items to my list took a total of 4.5 seconds.

Using batch updates instead of individual updates

If you have to update a larger number of items it’s highly recommended to not use the Update method on every item. Instead - use the batch update function ProcessBatchData provided by SPWeb.
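A minimal sketch of such a batch insert, assuming the current list and its Title field (the Method/SetVar XML format is the one documented for SPWeb.ProcessBatchData; IDs and values are illustrative):

```csharp
// Build one batch command instead of calling Update() once per item.
StringBuilder batch = new StringBuilder();
batch.Append("<?xml version=\"1.0\" encoding=\"UTF-8\"?><Batch OnError=\"Continue\">");
string listId = SPContext.Current.List.ID.ToString();
for (int i = 0; i < 100; i++)
{
    batch.Append("<Method ID=\"" + i + "\">");
    batch.Append("<SetList>" + listId + "</SetList>");
    batch.Append("<SetVar Name=\"Cmd\">Save</SetVar>");
    batch.Append("<SetVar Name=\"ID\">New</SetVar>");
    batch.Append("<SetVar Name=\"urn:schemas-microsoft-com:office:office#Title\">Item " + i + "</SetVar>");
    batch.Append("</Method>");
}
batch.Append("</Batch>");
// One call into SharePoint instead of 100 individual updates.
string result = SPContext.Current.Web.ProcessBatchData(batch.ToString());
```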

Adding the same 100 items via ProcessBatchData and analyzing the internals shows us how much time is spent in the update:

Comparing both update approaches shows that we have a significant performance improvement with the batch update:

Caution

Batch updates are highly recommended when doing larger updates. But please consider the overhead of creating the batch update XML:

Make sure you use a StringBuilder and not individual string objects that get concatenated.

Split up batch update calls to keep the generated XML small enough to not run into out-of-memory exceptions. I ran into an OOM exception when executing the above example with 50,000 batch updates.

As an alternative you can also use the UpdateListItems method of the Lists Web Service.

Use Case 6: Which are my slowest lists, how are they used and why are they slow?

We all know that SharePoint list performance can degrade as more items are stored in a list, depending on how the lists are filtered when viewed. You will find many articles and blog entries talking about the 2000-items-per-list limit. The 2000 items, however, are not the real problem - you can in fact have many more items in the list. It all depends on how those lists are viewed. The question therefore is: how can we identify which lists have degraded in performance, and how are they commonly used?

In order to take the correct corrective actions we first need to understand the current usage scenarios and analyze the resulting performance problems.

The easiest way to monitor list performance is by analyzing the HTTP response times of the respective SharePoint list and SharePoint view URLs. A SharePoint URL has the following format: http://servername/site/{LISTNAME}/{VIEWNAME}.aspx.
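Grouping by these two URL parts can be sketched with a plain regular expression (the pattern and sample URL below are illustrative assumptions, not the exact dynaTrace configuration):

```csharp
using System.Text.RegularExpressions;

// Extract list and view name from a SharePoint list URL for grouping.
string url = "http://servername/site/Lists/Tasks/AllItems.aspx";
Match m = Regex.Match(url, @"/([^/]+)/([^/]+)\.aspx$");
if (m.Success)
{
    string listName = m.Groups[1].Value; // e.g. "Tasks"
    string viewName = m.Groups[2].Value; // e.g. "AllItems"
}
```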

In order to analyze this we can group the requests based on those two indicators. I used dynaTrace's Business Transaction feature to group the captured PurePaths according to a regular expression. The following images show the usage scenario of my lists and views:

These results give us a good indication about which lists and views are used more frequently and how well they perform.

In addition to analyzing the HTTP requests - which only provide accurate data for pages that display a specific list or view - we can analyze the list usage of custom web parts or custom pages that access more than just a single list or view, or that access a list in a special filtered way.

We can do that by analyzing the interaction with the SharePoint Object Model, e.g. SPRequest.RenderViewAsHtml - which is used to render lists and views - or accesses to SPList and SPView. The following illustration shows usage and performance metrics based on SPRequest method invocations:

The illustration above shows us the internal GUIDs of the lists. Each list and view is uniquely identified by a GUID. There are different ways to find out the actual list name: you can paste the GUID into the URL to edit the settings of the list or view. Here is an example:

http://servername/_layouts/listedit.aspx?List={GUID} (GUID must be URL-Encoded).

The other option would be to open your content database and query the table AllLists. This table contains both the GUID and the List name.

Why are certain lists slower?

Now that we know which lists and views are accessed frequently, we can focus on those that show signs of performance degradation. In order to improve the end-user experience we should focus on lists that are accessed most frequently rather than lists that are accessed only occasionally.

There can be multiple reasons why lists perform slowly:

Too many items are displayed in the list view

Too many items in the list without using filters and indexed columns

Inefficient data access by custom web parts

Conclusion

The SharePoint object model provides an easy and flexible way to extend SharePoint Applications. The framework offers different mechanisms to access and modify the data stored in SharePoint lists. Not every possible approach, however, is good for every use case scenario. Knowing the internals of the SharePoint Object Model allows creating SharePoint Applications that perform well and scale.

About the Author

Andreas Grabner works as a Technology Strategist for dynaTrace Software. As a member of the R&D department, he influences the dynaTrace product strategy and works closely with key customers in implementing performance management solutions for the entire application lifecycle. Andreas Grabner has 10 years of experience as an architect and developer in the Java and .NET space.

About the Author’s employer: dynaTrace software

dynaTrace is the leader in continuous application performance management across the lifecycle for business critical Java and .NET applications. dynaTrace is the only solution that provides a common, integrated performance management platform for all key stakeholders – development, test and production. Industry leaders such as UBS, LinkedIn, EnerNOC, Fidelity, and Thomson Reuters use dynaTrace’s patent pending technology to gain full visibility into application performance, identify problems sooner and dramatically reduce the mean time to repair by as much as 90%. These and other leading companies rely on dynaTrace to proactively prevent performance problems from happening and quickly resolve those that do occur – saving time, money and resources.

Screenshots


Nice article. Honestly though, most of these things would be obvious if the SharePoint object model followed the .NET Framework Design Guidelines... the fact that accessing a public property causes so many side effects and is so performance intensive (like the .Items property) is a pitfall that is warned about over and over. People WILL use public properties in loops, and during a code review it isn't obvious.

If that was not available, and only GetItems was, it would be obvious to most seasoned developers that performance impacts may be involved and to not use it repeatedly.

Anyway...the performance screenshots displayed in the first part of the article... are they from dynaTrace too, or another tool?

Re: Screenshots


If done right, SP would be built on top of an O/R mapper. It could then just cache those lists internally and return them the next time you'd access the property. But things being what they are, you're right, it should definitely be GetItems(), or rather LoadItems() to be really explicit about what happens.

Btw, I guess you can tell it's dynaTrace just from the fact that it lets you drill down into method parameters. This really gives you answers where other tools just provide hints for further research.

Re: Screenshots


[blockquote]If done right, SP would be built on top of an O/R mapper[/blockquote]

OR/M or not, they could still be cached internally, that's not an issue. It could still be done, but it would have to be optional... some lists can be far too large (I've seen sites with thousands of lists of hundreds of thousands of items...), so it couldn't be an "always" thing, it would have to be optional, and then things would get complicated.

Just following the .NET framework design guidelines, so that things in SP behave like they do in everything else would be more than sufficient, honestly. Not just for performance, but for development efficiency, and even for SP's developers themselves, it would be a decent bit easier.

Oh well! Until then, there's posts like these to enlighten us small folks :)

Re: Screenshots


"they could still be cached internally"

yes, but since SharePoint has no concept of transactional behavior, that would mean that the cache might not be up to date with the information in the data store. One way to solve this is to leave all caching to the application developer, like they do now - only then it would have to be named GetItems() like you already said. This is pretty hard to get right on an app by app basis, but at least then it's not SP's fault ;-)

Another way would be to have a sensible concept of caching and cache invalidation, which is something that O/R mappers usually have. (Of course, these concepts could be implemented in SP without an O/R mapper, it just makes little sense to do that.)