Category: Catalog

This is something you don’t do daily, but you will probably need it one day, so it might come in handy.

Recently we got a question on how to update the code of all entries in the catalog. This is interesting, because even though you don’t update the codes that often (if at all, as the code is the identifier used to match entries with external systems, such as ERPs or PIMs), it raises the question of how to do a mass update on catalog entries.

One option is to update the code directly via a database query. It is supposedly the fastest way to do such a thing. If you have been following my posts closely, you must be familiar with my note that Episerver does not disclose the database schema. I list it here because it is an option, but not a good one. It easily goes wrong (and causes catastrophes): you have to deal with versions and caches, and those can be hairy to get right. Direct data manipulation should only be used as a last resort, when no other option is available.
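The safer route is to go through the content API, which takes care of versions and cache invalidation for you. Below is a minimal sketch of that approach: it walks every entry under the catalog root and saves a new code. `GetNewCode` is a hypothetical helper standing in for whatever mapping you have from old codes to new ones, and you should batch/test this carefully on a large catalog.

```csharp
using EPiServer;
using EPiServer.Commerce.Catalog.ContentTypes;
using EPiServer.Core;
using EPiServer.DataAccess;
using EPiServer.Security;
using EPiServer.ServiceLocation;
using Mediachase.Commerce.Catalog;

// Sketch only: mass-update entry codes through the content API instead of SQL.
var contentRepository = ServiceLocator.Current.GetInstance<IContentRepository>();
var referenceConverter = ServiceLocator.Current.GetInstance<ReferenceConverter>();

var root = referenceConverter.GetRootLink();
foreach (var descendant in contentRepository.GetDescendents(root))
{
    if (contentRepository.TryGet<EntryContentBase>(descendant, out var entry))
    {
        // Content loaded from the repository is read-only; clone before editing.
        var writable = entry.CreateWritableClone<EntryContentBase>();
        writable.Code = GetNewCode(writable.Code); // hypothetical old-code -> new-code mapping
        contentRepository.Save(writable, SaveAction.Publish, AccessLevel.NoAccess);
    }
}
```

This is slower than raw SQL, but each save goes through the same pipeline the editors use, so versioning, events, and caches stay consistent.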

If you have been working with SQL Server, you likely know the importance of indexes. Without proper indexes, your databases will not be able to utilize their true potential, and your sites/applications will almost certainly succumb when the data size is big enough and/or the load is high enough.

Of course, that is something you would not want to happen.

But adding a proper index is not an easy task. Finding a good index is already difficult; making that index work in 99.99% of cases is a real challenge.

It is even more so for a framework. We can’t really control what data is put in the table. We can enforce some rules, yes, but those are just not enough, and we certainly do not want to limit the extensibility and flexibility of the system.

A good index has (or needs to have) several characteristics, two of which are selectivity and distribution. The first one is quite obvious: adding an index to a “Status” column that has only 5 or 6 distinct values across hundreds of thousands of rows is almost always a bad idea. That index does not really help queries on that column, while it introduces more overhead on inserts, updates, and deletes. But even if a column has very good selectivity, things can still get tricky when it comes to distribution.
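You can get a rough feel for both characteristics with a couple of ad-hoc queries. A sketch, using the `ecfVersion.Status` column mentioned later in this post as the example:

```sql
-- Selectivity: how many distinct values exist versus the total row count.
-- A tiny DistinctValues relative to TotalRows means poor selectivity.
SELECT COUNT(DISTINCT Status) AS DistinctValues,
       COUNT(*)               AS TotalRows
FROM dbo.ecfVersion;

-- Distribution: how evenly those values are spread across the rows.
-- One value dominating the table is exactly the skew that fools the optimizer.
SELECT Status, COUNT(*) AS RowsPerValue
FROM dbo.ecfVersion
GROUP BY Status
ORDER BY RowsPerValue DESC;
```

SQL Server keeps its own version of the second query in the column statistics histogram, which is what the optimizer actually consults when estimating row counts.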

Recently I was asked to help a customer with a performance problem (as usual). They saw very high CPU usage on the SQL Server instance, and ecfVersion_ListMatchingSegments was to blame.

Well, it’s indeed fairly long, but you just need to focus on this statement:


v.SeoUriSegment=@SeoUriSegment

It looks innocent enough, right? Just a filter statement on an indexed column. What can be wrong about it?

Nothing is wrong with the statement itself; something is wrong with the index, or, more precisely, with the distribution of the data in the column the index covers. As a matter of fact, SeoUriSegment allows null and empty values.

A mistake. And rather a big one.

The database I looked into had around 180k rows where SeoUriSegment was empty. That is a disaster waiting to happen. In a bad case, SQL Server will cache a plan that uses an index scan on SeoUriSegment for every call to ecfVersion_ListMatchingSegments, which is much less effective than an index seek. In the worst case, it will use an index seek on every call where SeoUriSegment is empty; seeking through 180k matching rows per query would bring your SQL Server to its knees and effectively kill your website performance.

If you recall, I talked about a more or less similar problem before, here and here. You might be surprised: that one was about the wrong join order, while this one is about the wrong kind of execution plan. But both have the same root cause: SQL Server tries to be smart and helpful by optimizing the query the way it sees best, only to be fooled by the data (or the distribution of it) and choose a poorly optimized execution plan.

It is smart, just not in every case.

Now a big question still remains: how can the problem be fixed?

Well, we should always fix a problem at its root. In this case the fault is in the data, so we should fix it by adding real values to the empty/null rows. The best value for SeoUriSegment is the normalized name, so a product named “Crochet Playsuit” should have a SeoUriSegment of “crochet-playsuit”, while “Příliš žluťoučký kůň úpěl ďábelské ódy” should become “prilis-zlutoucky-kun-upel-dabelske-ody”.
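That normalization can be sketched in a few lines of C#. This is one possible implementation, not the one Episerver ships: it decomposes accented characters, drops the combining marks, lowercases, and joins words with hyphens.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

static string ToUriSegment(string name)
{
    // Decompose characters so "ž" becomes "z" + a combining caron (NFD form)...
    var decomposed = name.Normalize(NormalizationForm.FormD);

    // ...then drop the combining marks, leaving only the base ASCII letters.
    var withoutMarks = new string(decomposed
        .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
        .ToArray());

    // Lowercase and replace runs of whitespace with single hyphens.
    var words = withoutMarks.ToLowerInvariant()
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    return string.Join("-", words);
}

// ToUriSegment("Crochet Playsuit") => "crochet-playsuit"
```

A production version would also need to strip punctuation and guarantee uniqueness across siblings, but the diacritics handling above is the part that turns “ůěď” into “ued”-style ASCII.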

But fixing the data can take time, and if your site is affected by the problem, it is probably better to apply an immediate mitigation by turning on the RECOMPILE option for ecfVersion_ListMatchingSegments.
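There are two levels of that mitigation, sketched below. Be aware that both touch a vendor-owned stored procedure, so an upgrade may overwrite your change; the procedure body is elided here on purpose.

```sql
-- Option 1 (one-time): flag the procedure so its currently cached plan is
-- discarded and a fresh one is compiled on the next call. Useful when a
-- bad plan has been sniffed and cached.
EXEC sp_recompile N'dbo.ecfVersion_ListMatchingSegments';

-- Option 2 (permanent, a sketch only): recreate the procedure WITH RECOMPILE
-- so no plan is ever cached and every call is optimized for its own parameters.
-- ALTER PROCEDURE dbo.ecfVersion_ListMatchingSegments
--     ...original parameters...
-- WITH RECOMPILE
-- AS
--     ...original body...
```

The trade-off is compilation cost on every execution, which is usually far cheaper than repeatedly running a catastrophically wrong plan.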

The bug is fixed to ensure no empty value will be added to SeoUriSegment when copying an unpublished content version. But Episerver is in a difficult position now. We can’t just go ahead and add a non-null/non-empty constraint on that column, because that can (and will) break existing implementations. We could forcefully update the columns with null/empty values in the upgrade script before enabling such a constraint, but that would be a bold move.

Morals of the story:

An indexed column should not allow null or empty values. Yes, there is no guarantee that values won’t be duplicated many times anyway, but in that case it’s easier to shift the blame to whoever inserted the data ;).
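When you cannot forbid null/empty values on an existing column, a filtered index is one possible mitigation: the empty rows are simply never added to the index. A sketch (the index name is made up; check your actual schema before changing vendor tables):

```sql
-- Sketch: keep null/empty SeoUriSegment rows out of the index entirely,
-- so a skewed "empty" bucket can no longer distort seeks and scans.
CREATE NONCLUSTERED INDEX IX_ecfVersion_SeoUriSegment_NotEmpty
ON dbo.ecfVersion (SeoUriSegment)
WHERE SeoUriSegment IS NOT NULL AND SeoUriSegment <> '';
```

Filtered indexes come with their own caveats (the optimizer may not match them against parameterized queries without a recompile), so measure before and after.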

This stored procedure was previously the subject of several blog posts on SQL Server performance optimizations. Just when I thought it was perfect (in terms of performance), I learned something more.

Recently we received a performance report from a customer about an issue after upgrading from Commerce 10.4.2 to Commerce 10.8 (the last version before Commerce 11). The job “Publish Delayed Content Versions” started throwing timeout exceptions.

This scheduled job calls ecfVersion_ListFiltered to load the content versions which are in the DelayedPublish status. It looks like this when it reaches SQL Server:

This query is known to be slow. The reason is quite obvious: Status contains only 5 or 6 distinct values, so it is not indexed. SQL Server has to do a Clustered Index Scan, and if ecfVersion is big enough, that is inevitably slow.

A while back, we had this question on World. It’s not uncommon to update catalog data from an external system, mostly a PIM (product information management system). In such cases, it might not make sense to enable editing in the Catalog UI. You might need the new UI for other parts, such as the Marketing UI, but you wouldn’t want editors to accidentally update the product information – those changes would be lost anyway.

If you are using IContentLoader.GetChildren<T>(ContentReference), one important thing to remember is that it uses the current preferred language. Normally, when you get the children of a catalog or a node, that is not a problem, because a catalog entity – node or entry – is available in every language supported by the catalog. So if you just want the children references, the language is not important. (Note that if you only need the children references, IRelationRepository should be a faster, more lightweight way to go, but that’s another story.) If you want to get the children in a specific language – which is the most common case – you can use the other overload, GetChildren<T>(ContentReference, ILanguageSelector), where you specify the language you want to load.
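The two overloads side by side, as a sketch; `nodeLink` is a hypothetical ContentReference pointing at a catalog node, and "sv" stands in for whatever language branch you need:

```csharp
using EPiServer;
using EPiServer.Commerce.Catalog.ContentTypes;
using EPiServer.Core;
using EPiServer.ServiceLocation;

var contentLoader = ServiceLocator.Current.GetInstance<IContentLoader>();

// Uses the current preferred language (whatever the request context says):
var children = contentLoader.GetChildren<EntryContentBase>(nodeLink);

// Explicitly loads the Swedish versions instead:
var swedishChildren = contentLoader.GetChildren<EntryContentBase>(
    nodeLink, new LanguageSelector("sv"));
```

The second form removes the dependency on ambient request state, which also makes it the safer choice in scheduled jobs, where there may be no meaningful “preferred language” at all.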

I have had some more time to revisit the query now, and I realized I made a “small” mistake: the “optimized” query is using a Clustered Index Scan.

So it is not as fast as it should be, and it performs quite poorly in a no-cache scenario (when the buffer pool is empty, for example) – it takes about 40s to complete. Yes, it is still better than the original one, in both the non-cached and cached cases, but it is not good enough. An index scan, even a cached one, is not only slower but also more prone to deadlocks. It is also worse in the best-case scenario, when the original query can use a proper index seek.

I have said this before, and I will repeat it here: the SQL Server optimizer is smart, and I can even say that, generally, it is smarter than you and me (I have no doubt that you are smart, even very, very smart 🙂 ). So in most cases, you should leave it to do whatever it thinks is best. But there are cases where the optimizer is fooled – it gets confused and chooses a sub-optimal plan because it was given wrong or outdated information. That’s when you need to step in.

(As a side note, I don’t answer direct questions, nor do I provide any personal support service (I would charge plenty for that 😉 ). I would appreciate it if you went through the World forums or contacted Episerver Developer Support. There are several reasons for that, including knowledge sharing and work-item tracking. I can make exceptions when I know a problem is highly urgent and hurting your business, by jumping into it sooner than I’m expected to/before it’s escalated through several levels of support. But all in all, it should be registered with Developer Support. We on the development team are supposed to be the final line of support, not the front line.)

Recently I worked on a support case where a customer reported deadlocks and timeout exceptions on queries to a specific table – NodeEntryRelation. Yes, it was mentioned in this post. However, there is more to it.

Keeping the indexes healthy definitely helps to improve performance and avoid deadlocks and timeout exceptions. However, it only works up to a limit, because even if the indexes are in their perfect state (a fragmentation level of 0%), the query will still take time.

Let’s look at the stored procedure we talked about – ecf_Catalog_GetChildrenEntries – and what it does: it lists the entries which are direct children of a catalog. Normally, entries belong to categories (nodes), but it is possible (although not recommended) to have entries that belong directly to a catalog.

This is very interesting to me, and I can see it is a real scenario – even quite a common one. When a price becomes obsolete, you want your content to be reindexed so that the next time you query, the search results are returned correctly. But how?