So I am building a web part that enumerates all document libraries in the current web, grabs all checked-out files, and lists them in a table, allowing a user to apply some required metadata and check many in at once (50).

This web part must operate on lists over the list view threshold (LVT, currently set at 15,000 in production), and also on sites that are VERY large (50,000 - 100,000+ documents).

Initially I used an SPQuery with no CAML defined at all, retrieving pages of 2,000 items and parsing them that way. The problem with this is that the web part actually causes a timeout on those very large (50k+) sites. So I'm trying to be a bit smarter with my CAML, pulling in only items that are checked out:
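The exact CAML isn't reproduced here, but a query that pulls only checked-out items would presumably filter on the CheckoutUser field, roughly like this (a sketch, not necessarily the author's exact query):

```xml
<Where>
  <!-- CheckoutUser is set only while a file is checked out -->
  <IsNotNull>
    <FieldRef Name="CheckoutUser" />
  </IsNotNull>
</Where>
```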

I'm using a CrossListQueryInfo query to run this same CAML against the whole web, and it works wonderfully when no list is over the LVT. If one is, I catch that exception and retry with the 'slower' SPQuery against each individual library.

From everything I am reading, as long as my CAML returns fewer items than the LVT, it should work. But using the CAML above causes the error "The attempted operation is prohibited because it exceeds the list view threshold enforced by the administrator" to be thrown when SPList.GetItems(spQuery) is called. Since I'm setting a RowLimit of 2,000, shouldn't that never happen? Microsoft suggests executing an SPQuery with no CAML defined at all, basically grabbing all items from the library in pages of 2,000. So I can't make any sense of why my CAML is failing only on lists over the view threshold.
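For reference, that "no CAML" paged retrieval is normally driven with SPQuery.ListItemCollectionPosition. A minimal sketch against the server object model (the `list` variable and the filtering step are illustrative):

```csharp
// Page through every item in one library, 2,000 at a time, with no <Where>
// clause, so the list view threshold is never tripped.
SPQuery query = new SPQuery();
query.RowLimit = 2000;
do
{
    SPListItemCollection items = list.GetItems(query);
    foreach (SPListItem item in items)
    {
        // Filter in memory instead of in CAML.
        if (item.File != null &&
            item.File.CheckOutType != SPFile.SPCheckOutType.None)
        {
            // collect the checked-out file for the web part's table
        }
    }
    // Carry the paging cursor forward; it becomes null after the last page.
    query.ListItemCollectionPosition = items.ListItemCollectionPosition;
} while (query.ListItemCollectionPosition != null);
```

This is exactly why it times out on 50k+ item sites: every item is materialized and inspected, even though only a handful are checked out.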

Does the 'CheckoutUser' field need to be indexed on every list we want to execute this query against?

Update 2: This comes down to the 'CheckoutUser' field not being indexed while being queried against. Unfortunately, it is not an option for us to go out and force this index on every library in the farm. I believe my only option at this point is to implement some sort of paging scheme on very large sites to process items in batches.

Final Update: As a solution, I've decided to enforce indexing the 'CheckoutUser' column for libraries. This should vastly improve the web part's overall performance and allow it to support very large sites. There will be a bit of a headache immediately following deployment, as we will need to manually set the column index on lists already over the list view threshold, but in the long run this will be best.
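Programmatically, flipping the index on is a one-property change per library. A minimal sketch, assuming the server object model and an `SPList list` in scope (note that lists already over the LVT may reject the change, which is why the manual step above is needed):

```csharp
// Mark CheckoutUser as indexed on a single library (sketch).
SPField field = list.Fields.GetFieldByInternalName("CheckoutUser");
if (!field.Indexed)
{
    field.Indexed = true;
    field.Update();
}
```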

2 Answers

My understanding of how the List View Threshold works is limited; however, I suspect your CAML query is failing because it filters on a non-indexed field, and the RowLimit is only applied after the filter has been evaluated, so the query still has to scan more rows than the threshold allows.

I think that indexing the CheckoutUser field would solve your problem; however, it sounds like you have a lot of sites.

Allow me to suggest an alternate approach: paging. Since the ID column on every list is indexed, implement a CAML query that filters FIRST (very important) on the ID column being less than, say, 2000, and second on the CheckoutUser field. If that query does not return the desired number of results, increase 2000 to 4000 and repeat.
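In CAML, that ordering means putting the indexed ID comparison as the first operand of the `<And>`, so the engine narrows the scan with the index before touching the non-indexed field. A sketch, using the example bound of 2000 from above:

```xml
<Where>
  <And>
    <!-- indexed ID filter first: keeps the scanned row count under the LVT -->
    <Leq>
      <FieldRef Name="ID" />
      <Value Type="Counter">2000</Value>
    </Leq>
    <!-- non-indexed filter second, applied only within that ID range -->
    <IsNotNull>
      <FieldRef Name="CheckoutUser" />
    </IsNotNull>
  </And>
</Where>
```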

Yes, indeed it is the non-indexed field causing the issue. After indexing the 'CheckoutUser' field, the ContentIterator succeeds. You have a good idea. My last resort was going to be removing the SPQuery altogether and putting paging right into the web part, but I think we can do it more elegantly by executing multiple queries against the ID field: basically, keep executing, incrementing the ID range by 2,000, until a query returns 0 results.
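The windowed loop described above could be sketched like this (assumptions: an `SPList list` in scope, and a `maxId` obtained separately, e.g. from a 1-row query ordered by ID descending; stopping as soon as a window returns 0 results, as suggested, risks stopping early if a 2,000-ID window simply contains no checked-out files, so bounding by the highest ID is safer):

```csharp
// Walk one library in ID windows of 2,000, filtering each window on
// CheckoutUser. The indexed ID range comes first in the <And>.
const int window = 2000;
for (int lower = 0; lower < maxId; lower += window)
{
    SPQuery query = new SPQuery();
    query.Query =
        "<Where><And>" +
        "<And>" +
        "<Gt><FieldRef Name='ID'/><Value Type='Counter'>" +
        lower + "</Value></Gt>" +
        "<Leq><FieldRef Name='ID'/><Value Type='Counter'>" +
        (lower + window) + "</Value></Leq>" +
        "</And>" +
        "<IsNotNull><FieldRef Name='CheckoutUser'/></IsNotNull>" +
        "</And></Where>";

    foreach (SPListItem item in list.GetItems(query))
    {
        // collect checked-out item for the batch check-in UI
    }
}
```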
– Sean, Mar 20 '13 at 18:34

Would you be able to briefly explain how to implement a CAML query in such a way that ID is filtered first and CheckoutUser second?
– Sean, Mar 21 '13 at 13:08

Just make the ID filter first in the CAML.
– Chloraphil, Mar 21 '13 at 13:20

I agree that indexed columns do seem to be the issue here; the problem is that this web part will be used across an enterprise environment of 70,000+ users and 15,000+ site collections, so we can't really go around automatically indexing the CheckoutUser column on user sites. Thank you for the link to that blog post. I am still fairly new when it comes to working with large lists.
– Sean, Mar 20 '13 at 19:23