Recently, I was invited to a customer who had started to use Inventory Closing functionality about a month ago. They were using DAX2009 with recent rollup 7. Every time when they were running inventory closing, the system locked all (or at least – most of the) inventTrans records, safely preventing users from any regular work with logistics module. Before I came to the customer I already heard rumors that new functionality of non-financial transfer closing can cause long duration locks to inventory data. Now I had a chance to witness it with my own eyes. The sql statement which was a reason for show-stopper problem in IC was very easy to find. (I knew the place in the code before I went to a customer’s site). It was the following query from InventCostNonFinancialTransferHandler.updateInventTrans() method:

Looks impressive, hugh ? If you explode nested views, you find out that this query is joining 10 or 11 tables. (I lost count to be honest). After I started to look onto the query, I understood that from application logic point of view, the query is Okey. It must update all non-financial transfers for a given Dimension Group in one update. It should not lock users from current operations (e.g – reserving stock for sales orders or posting picking list), because it should update only financial transactions from period to be closed. I made an assumption, that maybe this query is just causing lock escalation from record level to page level and then to table level, thus causing total lockup. I tried to disable lock escalation for inventTrans with alter table set(lock_escalation=disable) statement. (Just for a test; I think it would be a bad idea in general). It did not help. Then I tried to analyze lock information by selecting data from sys.dm_trans_locks table. I found out that this statement placed U-lock over seemingly every page of the inventTrans (Or, at least, all pages with given dataareaid).

I never had a lot of experience with U-locks before. I saw them in a database from time to time, but I red somewhere in SQL BOL that it is a kind of short duration locks issued by update statement to maintain consistency and prevent deadlocks. When I tried to find more information on update locks in internet, I ran into perfect article Read Committed and Updates by Craig Freedman. It turned out that update statement in SQL (and update_recordset statement in Axapta respectively) place U-lock ON EVERY row it read. What happens next ? If update statement has simple logic (like update table set nonindex_field=constant where another_field=constant), the system either update a row (if it fits a where condition) and replace U lock with X, either it immediately release U lock (if row does not fit to the condition). So for simple cases, U-lock is placed on a row for really short duration – from reading of a page into memory until end of the page’s in-memory scan. But in complex queries, especially in queries with self-join, U-lock is placed in the very beginning of SQL Statement execution (when the page is being read for the first time) and kept until query reached a home-run stage, when all records to be updated are defined and the query started update phase itself. Since our query in question has a join of 11 tables, even in the very best case scenario the system was taking a couple of minutes to evaluate all necessary info and proceed to update phase. In the worst case (and this worst case happens too often), the system was choosing incorrect execution plan. In this case, U-Locks persisted for 30-40 minutes. It was locking all the pages in the table, because the system was simply reading all pages with given dataAreaId (because inventTrans do not have more specific index suitable for query conditions) thus leading to a total lock-up for 30-40 minutes.

BTW, there is no much point in locking here, because these updates being called only from starting part of Inventory Closing, which is executed in single user mode. Chances that these inventTrans records would be updated but some other user activity (not IC) are pretty thin: There is no much you can do with already financially updated inventory transfer transactions. The only thing which is coming to my mind is an idea that some crazy user will start to change InventDimId of inventory transactions via Item Details->Functions->Edit dimensions function. But I think we can ignore this possibility 😉 I can understand SQL Server concerns about the data to be modified by different process, but from application logic point of view it is highly improbable.

When I identified the issue, I started to try different ways to fix it. First thing I tried to replace an update_recordset with equivalent set of statements w/o update_recordset: while select table {select forupdate table2 where table2.recId==table.recid; table2.update()} . Since outermost select does not use ‘pessimisticklock’ clause (and InventTrans has OCC enabled), it does not places any locks on table it read. Inner select use ‘forupdate’ clause, but because of OCC it is just a syntactic sugar, which does not generate any additional clauses or hints for the select statement sent to SQL Serever. This approach brought some progress: Although the query (now select, not update) was still taking 30-40 minutes to execute, it did not place these nasty U-locks over all records; IC was still slow, but it stopped to lock users at least. As a next stage, I tried to split one select onto two nested selects. For everyone who is into SQL Statement tuning it sounds like very counter-natural, but since we are unable to directly influence the query plan in X++ with hints, we must somehow ensure SQL Server selecting more or less decent execution plan. I tried the following approach: On outer level I placed a query which was simply producing a list of itemIds, which have non-closed transfers (any transfers, maybe even financial ones), in nested while select I used the copy of original select, only with adding ‘itemId==itemIdFromOuterSelect’ condition to every occurrence of InventTrans and inventSettlement in the query. This approach brings good results: Processing of non-financial transfers in my case took only 1-1.5 minutes. Maybe in your case, if you have many items with transfers, it will take about 30-40 minutes for execution of the whole logical block, but it yet much better than 30-40 minutes for one item dimension group. Well, maybe this approach is not the most perfect one. Maybe I should try to combine outermost select (to select item list) and inner update_recordset. Maybe this U-Locks won’t be so nasty after all. If the statement takes only 5-10 seconds to execute, then maybe benefits of fast execution will outweigh potential lock conflicts. But for my case, it was acceptable solution and the customer kept to use it.

As far as I understand, the whole feature was first designed for DAX2012 (where related statement is always executed per-item) and then backported to DAX2009 without too much testing. Also, to my knowledge currently MS is working on a patch for this feature in DAX2009.

Story #2: Wrong query execution plan can ruin the helper-based algorithm in Axapta

On another project, the customer often had performance issues with inventory closing cancellation.

In the beginning, it was a mere issue of total execution time. When they were running IC cancellation in interactive mode, it took around 3-4-5 hours to complete. I pointed them that to the fact, that if you execute IC cancellation in batch mode, it runs many times faster than in interactive. The trick is simple: When IC cancellation is started, it checks the mode of execution. If it is running in a batch mode, it checks which items has inventSettlement records in inventory closing being canceled, then it spawn a batch task of class InventCostClosingCancel_workInvent for every such an item. Next it iterates over all indirect cost journals created by inventory closing being canceled, and for every cost journal, it spawns a batch job of class inventCostClosingCancel_WorkProd. Then all these jobs start to execute (up to a batch server maximum number of thread at a time), thus brings significant (like 5-10 times) increase in performance. (It is worth to mention that all batch jobs are spawned with empty batch group, so you have to configure at least one of your batch servers to serve batch jobs from this batch group).

So, after they started to run Inventory Closing cancellation in a batch mode, time of the operation dropped to 45-60 minutes. (Progress percent in batch screen to advance by 1-2 percent every minute.) But from time to time, a batch mode cancellation went exceptionally slow (like 0.1-0.15% in a minute). I decided that SQL server generates silly execution plans sometimes and prescribed to run update statistics on inventTrans/inventSettlement/prodTableJour before running IC cancelation. For a while this well known home remedy against spontaneous slowness of queries helped to resolve the issue. But eventually, since 8-10 months after GoLive, even after statistics update, inventory closing cancellation become very slow, so I had to investigate the issue a bit more.

When I checked the list of connection, I found out that there is a lot of blocked processes from batch server, so it was a locking issue, not the issue of wrong execution plan. (Or at least so I decided after first look). Next I ran SQL Profiler, I also found out that the system creates several deadlock in a second. Finally I started to check the list of running queries and resources they are waiting for. I found out, that all processes was waiting on newly inserted key into inventSettlement.DateVoucherIdx or prodTableJour.dateVoucherIdx. I drew a conclusion that they are waiting on new key, because when I tried to select data from the tables using %%lockres%% pseudo-column, I was unable to find record being locked). I made assumption that a transaction insert record, then everyone wait on it and then transaction is getting rolled-back because of deadlock detection. (This was the only reasonable explanation why all connections are waiting on in-existent record.) The waiting statements were one of the following update_recordset statements from inventCostClosingCancel_workInvent.duplicateSettlements() and inventCostProdCancel.prodTableJour() respectively:

After I found and identified the places in the X++ code, which was causing deadlocks, I was finally able to understand the issue. Meaning of IC cancellation is quite simple: It simply insert exact copy of closing records, but with reversed sign. Since insert_recordset statement in X++ does not support expressions in select list, the author of the code found the simple pattern to overcome this: First the system inserts copy of old data (inventSettlement/prodTableJour) ‘as is’ (only voucher id and date are changed), next, special update_recordset statement (which supports some basic expressions in setting clause), is updating newly inserted records, inverting a value of the fields. Since this insert and update are made in 1 transaction, no other user can read our half-cooked records with un-inverted sign.

From application logic point of view, since, every helper thread is processing its own itemId or JournalId, no lock conflicts or deadlock can occur. But since an update statement in SQL (and update_recordset statement in X++) has a habit of locking EVERY record it read, locking logic suddenly becomes in-determenistic.If SQL Server use execution plan with index scan by ItemId/JournalId indexm several threads avoids locking conflicts, since every of them update its own personal set of keys. But from time, SQL Server having a bad day and it is selecting voucher+date index for both queries. In this case,1st helper thread inserts records for item A and locks the related key entries in Voucher+Date index; Then the 1st thread start to iterate over WHOLE index entries for given voucher and date (which are common for all helpers, since all inventSettlement records of the cancellation use the same voucher and date). Sooner or later, this scan was trying to read uncommitted inserts of the second helper thread, which caused 1st thread to be locked until 2nd thread commit transaction. Second thread, in turn, sooner or later starts to iterate all the index entries for the same voucher and date. Sooner or later it was hitting the locks on records inserted by 1st thread. SQL Server, after finding classical deadlock situation (2 processes are waiting on each other), terminates one of the two (or several) deadlocking connections causing all the threads by one to redo the same transaction.

In both cases, I fixed the issue by building a very specific index to prevent any potential conflicting index reads in this pieces of code. In the first case (for inventSettlement) it was an index on TransDate+Voucher+ItemId. In the second case (for prodTableJour) it was an index on TransDate+Voucher+ProdId. Since then, SQL Server always use these indexes for related update_recordset statements, thus eliminating this silly deadlock issue. (It is silly, because it was not caused by some real life conflict for the same resources. It was just a result of mis-optimization of a query in SQL Server).

If you do not believe me or just want to reproduce the 2nd issue on your own test installation, try to disable itemDateIdx index on inventSettlement and NumDateIdx index on prodTableJour table. Then try to run inventory closing cancellation in batch mode with SQL profiler running deadlock trace. Moreover, when I was preparing this article, I managed to run into the same issue on standard CONTOSO database on standard international version of DAX2009ru7. It seems that the issue can be quite frequent and even average installation can benefit from building these two additional indexes, I mentioned on the previous paragraph.

Lessons learned:

Locking in SQL Server can be in-deterministic. The set of records locked can be defined not by your application logic, but by some random fluctuation of SQL Server optimizer

Be careful with update_recordset statement.It can place locks on records, which you never expects. Think about building appropriate indexes or rewriting the query if you are updating a table with highly competitive access scenarios.

Sometimes a bad query plan can bring not only performance issues, but also concurrency issues. Do not forget to do all usual hoodoo to keep SQL optimizer happy: Update statistics regularly, rebuild index regularly and so on. If it does not help – try to rewrite a query…

Today, I will share some experience related to setting up and fine tuning MRP for best performance.

The most significant MRP-related improvement of DAX2009 over older versions is an ability to execute MRP in multi-threaded mode, when planning process is spread over several independent processes executed in parallel. As usual, multi-threaded execution works only when MRP is executed in batch mode, since helper threads are spawned as additional tasks inside the same batch job. This is the current way to support multi-threaded execution in Axapta.

Basic setup

The number of helper threads is specified on the second tab (Scheduling Helpers) of MRP start menu parameters. What number of helpers should be used ? To my experience, MRP scales very well. On my current project, we use 15 helpers (and 1 main thread). We tried to increase number of helpers to 31, but have not found any significant performance difference. Generally, it depends on configuration of your batch server(s) and your database server.When you are trying to increase the number of helper threads, you should check resource utilization for both batch server and database server. If one of these servers becomes saturated after yet another increase in helper threads number, then it is time to stop or, maybe, time to upgrade your hardware. I also want to mention, that I have positive experience of allocating MRP onto several batch servers. I would say, that my personal recommendation is as the following: First check every batch server, which will participate in MRP for the number of CPU cores it have. Then configure every batch server to run 2xNumber_Of_Cores batch threads. (In Administration->Setup->Server Configuration). Then specify number of helper as total number of batch threads for all batch servers serving your MRP batch group MINUS 1. Remember, that main batch thread also consumes one thread from your batch server group capacity. Also, if during MRP run, you are going to run some other batch processes on the same batch server(s), you may want to decrease the number even further, so maybe you should set number of helper-threads to total thread capacity minus 2 or 3.

Next important parameter to discuss is Item Distribution factor. What does it mean? On the very first stages of MRP (In Update
phase to be precise), the system allocates all items (all items in inventTable or all items in inventTable, which fit to the query specified in MRP startup dialog box) into chunks. Every chunk is a kind of unit of work, which is processed by one thread. During the phases of MRP, which are executed in parallel (Data regeneration, Coverage planning, Futures planning and Action Messages), every thread grabs chunk, process it, then grab another chunk, process it and so on, until chunks for given stage and BOM level are marked as processed.

Size of the chunk is calculated as Item_Distribution_Factor*(Number_Of_Helpers+1).

From one side, smaller chunks ensure smooth processing and even allocation of items between threads. The smaller chunk is, the less are chances for a thread to grab a chunk with high number of computational intensive items. Say, one item takes from 1 to 10 seconds to be coverage planed. If size of the chunk is only 1 item, then in the worst case, we most late thread will finish BOM level coverage only 10 seconds after the most early thread finished. If chunk size is 300 items, then in the worst case scenario, difference between time to process is 300*10Seconds-300*1Second==2700Seconds==45Minutes. It means that for this worst case scenario, there are good chances that most of helper threads will be doing nothing, waiting for 30-45 minutes for the last, unlucky thread to finish processing. It would increase planning time and it also would lead to non-optimal usage of hardware. (Since most of the threads would do nothing, while waiting for end of BOM Level). From other side, allocation of the chunk to a thread is a competitive process, which leads to temporary database locks. Several threads often try to allocate the same chunks in parallel; Only one of these threads succeed, while others repeat allocation process until they grab their own chunk. So, chunk allocation can become a bottleneck itself if number of chunks is too high and size of the chunk is too small. To my experience, reasonable size of the chunk is somewhere between 10 and 60. To find out the optimum chunk size and distribution factor, you can simply make several test with different distribution factors. Also, you can check ‘Track Item Process Duration’ checkbox in MRP Dialog parameters and then check typical item planning time in Master Planning->Inquiries->Unfinished Scheduling Processes->Inquiries->Item Process Duration. If item process duration varies a lot, you can benefit from smaller chunk size; if it does not, then probably increase of chunk size can be more beneficial.

Another potential way to improve performance (I never tried it though) is to put randomly-ordered items into a chunk. Now, when the system creates chunks, it simply iterates over inventTable ordered by itemId, so a chunk contain items with similar itemId. Since, usually, items with the same complexity of planning often has sequential itemIds, it often leads to uneven of distribution of items between chunks. Some chunks consist from items which are regular purchased items, while other chunks consist of complex BOM-items, which require complex resource planning on many work center groups. If you add special MRPOrder field to inventtable, fill it with random number during item creation and then sort by this field during chunk generation, you can have more even distribution of items between chunks.

Infant mortality issue

The frequent problem of parallel MRP run is early termination of helper threads. Say, we started our MPR as usual, but then in 15 minutes, we see that all our helper batch tasks terminated withthe strange message “Nothing to process”. (You can see this message, if you click Log button in Batch Task form before main thread of MRP-batch terminate). Then the only remaining main thread continues to run MRP in single-thread mode (very slowly).
Here is what is happening:

Main thread spawn helper threads and continues it work

Helper threads continuously check status of process waiting for it to advance to “Deleting and inserting data”

Main thread deletes net requirements and planned orders for plan being regenerated. (and it takes a lot of time usually)

Helper threads, after waiting for 15 minutes for status update, terminates with “Nothing to process” message

Main thread runs MRP in single thread mode.

First of all, the standard logic of MRP has obvious bug in helper thread behavior. While waiting for status update from main thread, the system keeps reading process status table (reqProcessList) continuously,without any pauses between executing SELECT statements. It is easy to figure out, that, say, 16 threads, issuing 70-100 statements in a second will send to DB server about 1500 statements in a second. This amount of queries can easily consume an average DB Server’s capacity, so it simply won’t have a time for useful work (like deletion of net requirements in main thread). To prevent the system from this behavior, you need to modify method run() of ReqProcessExternalThread class. It contains the ill-fated status waiting loop. You should find the end of while(true){} loop and
insert into it sleep(500); statement. This will add 0.5 seconds delay between re-reads of the process status, thus giving to your database server a time to breathe.

Another issue to fix is low timeout value for helper threads. If you have realistically large working set of net requirements, deletion of the plan would take much more time then 15 minutes, expected by developers of the functionality. To increase timeout, you should open class declaration part of the mentioned class (reqProcessExternalThread) and modify definition of WAITFORPROCESSSTATUSUPDATE macros. Try to change the value of macros from 15 to at least 60 or maybe even 90.

Slow end-of-level processing.

Often, you can see in the Unfinished Scheduling Process form, that during coverage phase, in the end of BOM Level he system works very slowly. It looks like it almost hanged for 30-40 minutes, but then, suddenly it advances to the new level. If you start AOS tracing on the batch server, you will see that all helper threads are doing nothing (they simply rereading process status waiting for advance to the next BOM level), while one unlucky thread is busy calculating coverage for portion of items. Also, in Unfinished Scheduling Process form, you can realize that the number of items on the level decreases ! 10 minutes ago you
had more then 15000 items on the level, now it is slightly less then 15000, and in a few minutes it will be less then 14500.

What’s happening ?

The main reason for the issue is incorrect BOM Level data in inventTable. When the system allocates items between chunks, every
chunk has attached BOM Level. But sometimes, during explosion of BOM for,say, 5th level, the system finds as a component of
exploded BOM the item, BOM Level of which (according to inventTable) is 2. Coverage for this item was already created
already. How does the system calculates coverage info for this item again ?

The line for the item in ReqProcessItemListLine (It is a list of lines in a chunk) is marked as ‘IsLevelChanged’

The system finds of create special ‘Spare’ chunk for items, which were pushed to the next level because of sudden level
change. This chunk’s line in reqProcessItemList is marked as ‘Spare’.

The system reassign the line for the item in ReqProcessItemListLine to Spare Chunk

During processing of next BOM Level, the system checks the item for being ‘pushed’ from level from level. If level for item was changed, the system deletes planned orders for item, previously created during planning on incorrect BOM Level.

The main performance issue of this approach is that the systemdoes not control the size of ‘Spare’ chunk at all. I witnessed
cases, when size of this chunk exceeded 1000 items. If planning of every item takes,say,4 seconds, then unlucky thread, which managed
to grab this chunk will spend more then 1 hour for processing of it. Since this chunk has highest number for a BOM level, it is usually
grabbed by some unlucky thread in the very end of BOM Level processing. Then one thread is processing this chunk for an hour, while all other threads are doing nothing, simply waiting for our unlucky thread.

The very first idea of how to fix the issue which came to the mind is to calculate BOM levels before every full regeneration planning. Unfortunately – it does not always help. Sometimes, we have incorrect BOMLevel info in inventTable only because we changed out BOM structure without recalculating BOM levels. But often, the reason for level change is that real life nesting of items in production BOMs does not fit to theoretical BOM structure described in master-data BOM. Say, we have a shining brass-head bolt, which is used only to screw a label with the name of our company to the final assembled good. Naturally, after calculation, it will get BOM Level of 1. Say, then, during production of a deeply nested sub-BOM of our finish good, someone decided to use the same bolt for screwing much smaller label to one of the sub-components. He dropped the line with standard bolt from production order’s BOM and added new line with the shining brass-head one.(Maybe it is not very good practice, but you simply can not update standard BOM structure for every possible small change, requested by a customer). If you recalculate BOM Levels, this bolt will still have BOMLevel 1. But if you try to run MRP, you would find out that it coverage actually is performed in level 5 or 6. By the way, one of the very last stages of MRP is BOMLevel update. During this stage, the system updates InventTable with actual BOM Level data, gathered during MRP processing, not with theoretical BOM Level from master-data BOM Structures.

So, the only way to resolve the issue, is to change behavior of the system to create many smaller Spare Chunks for a level, instead of one large Spare Chunk. To accomplish this, you need to modify method getSpareListNum() of ReqProcessItemList table:

The only strange thing in this piece of code is a usage of random number generator. I use it to decrease the number of potential conflicts, when several threads is trying to allocate new Spare Chunk, leading to duplicate key exception.